Closed beatrizsavinhas closed 1 month ago
Previously (in b8ef794ab), we had:
https://github.com/Clinical-Genomics/cg/blob/b8ef794ab1250c337101f7634d82468b803be009/cg/store/crud/read.py#L394-L408
https://github.com/Clinical-Genomics/cg/blob/b8ef794ab1250c337101f7634d82468b803be009/cg/meta/demultiplex/status_db_storage_functions.py#L127-L138
The filters were not updated when changing to the new database models.
Reads from lanes with low q30 values should not be included when calculating sample reads, so that only reads of good quality are used for running analyses.
Acceptance Criteria
Notes
The current logic for updating Sample.reads sums the reads from every lane, regardless of q30 values: https://github.com/Clinical-Genomics/cg/blob/7b1c4d1c81da518d56bc414c8e845fb340721556/cg/store/crud/update.py#L61-L68
When it comes to storing fastq files in housekeeper, though, only fastq files from lanes that pass the q30 threshold are stored: https://github.com/Clinical-Genomics/cg/blob/33e9f5e5ad0b2a1a6f0889407f6b259ca062cc5d/cg/services/illumina/post_processing/housekeeper_storage.py#L24-L52
Essentially, the count in Sample.reads does not correspond to the reads in the fastq files that we actually use for the analyses.
The previous logic, however, did take low q30 into account; see https://github.com/Clinical-Genomics/cg/issues/2197.
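To illustrate the intended behaviour, here is a minimal sketch of summing reads only from lanes that pass the q30 threshold. It is not the cg implementation: `LaneMetrics`, `sum_reads_passing_q30` and the `Q30_THRESHOLD` value are hypothetical names and numbers chosen for the example, and the real fix would presumably apply the same filter in the store query that feeds Sample.reads.

```python
from dataclasses import dataclass

# Assumption: an illustrative threshold only. The real value would come from
# the project's own configuration and may differ per sequencer or run type.
Q30_THRESHOLD = 75.0


@dataclass
class LaneMetrics:
    """Hypothetical stand-in for the per-lane metrics of one sample."""

    lane: int
    total_reads: int
    q30_percent: float


def sum_reads_passing_q30(
    lanes: list[LaneMetrics], q30_threshold: float = Q30_THRESHOLD
) -> int:
    """Sum reads over the lanes whose q30 is at or above the threshold."""
    return sum(
        lane.total_reads for lane in lanes if lane.q30_percent >= q30_threshold
    )


# Example data: lane 2 falls below the threshold and should be excluded.
lanes = [
    LaneMetrics(lane=1, total_reads=1000, q30_percent=90.0),
    LaneMetrics(lane=2, total_reads=500, q30_percent=60.0),
    LaneMetrics(lane=3, total_reads=2000, q30_percent=80.0),
]

reads_all_lanes = sum(lane.total_reads for lane in lanes)  # current behaviour
reads_passing_q30 = sum_reads_passing_q30(lanes)  # desired behaviour
```

With this filter, the read count would be based on the same lanes whose fastq files are stored in housekeeper, which is the consistency this issue asks for.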
Implementation plan