Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Revisit the filtering strategy for the UMI workflow #1149

Closed: ivadym closed this issue 10 months ago

ivadym commented 1 year ago

Need

Only a small number of variants is currently being uploaded to Scout for the BALSAMIC UMI workflow. We therefore need to improve the filtering so that customers receive a useful list of variants.

ivadym commented 1 year ago

Variant counts for some of the latest UMI cases:

Case 1 (downsampled to 40M):
- 53 raw variants
- 0 research variants
- 0 clinical variants
 
Case 2 (downsampled to 60M):
- 161 raw variants
- 20 research variants
- 19 clinical variants
 
Case 3 (downsampled to 80M):
- 287 raw variants
- 85 research variants
- 78 clinical variants
 
Case 4 (PALKTTR040):
- 3407 raw variants
- 167 research variants
- 126 clinical variants
 
Case 5 (PALKTTR040):
- 3132 raw variants
- 108 research variants
- 77 clinical variants
 
Case 6 (PANKTTR080):
- 404 raw variants
- 174 research variants
- 174 clinical variants
 
Case 7 (PANKTTR080):
- 543 raw variants
- 254 research variants
- 254 clinical variants
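
To make the counts above easier to compare across panels and downsampling depths, a minimal sketch can turn them into retention fractions (research and clinical variants as a share of raw variants). The numbers are copied from the list above; the `retention` helper and the percentage view are illustrative additions, not part of BALSAMIC.

```python
# Variant counts copied from the list above: (raw, research, clinical).
cases = {
    "Case 1 (40M)": (53, 0, 0),
    "Case 2 (60M)": (161, 20, 19),
    "Case 3 (80M)": (287, 85, 78),
    "Case 4 (PALKTTR040)": (3407, 167, 126),
    "Case 5 (PALKTTR040)": (3132, 108, 77),
    "Case 6 (PANKTTR080)": (404, 174, 174),
    "Case 7 (PANKTTR080)": (543, 254, 254),
}


def retention(raw: int, research: int, clinical: int) -> tuple[float, float]:
    """Return the fraction of raw variants kept in the research and clinical sets."""
    if raw == 0:
        return 0.0, 0.0
    return research / raw, clinical / raw


for name, counts in cases.items():
    research_frac, clinical_frac = retention(*counts)
    print(f"{name}: research {research_frac:.1%}, clinical {clinical_frac:.1%}")
```

With these numbers, the two PALKTTR040 cases keep under 5% of their raw variants in the research set, while the two PANKTTR080 cases keep over 40%.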

mathiasbio commented 1 year ago

This might be somewhat affected by the changes in https://github.com/Clinical-Genomics/BALSAMIC/pull/1176, which update the fastp rules and remove quality trimming from the UMI workflow. Previously, quality trimming was done before the UMI sequences were extracted, which could shrink the UMI families and result in fewer consensus reads in the final BAM. Maybe we can check these numbers again after the fastp changes 🤔
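
A toy illustration of one way trimming before UMI extraction could shrink UMI families. This is a sketch under stated assumptions, not how BALSAMIC or fastp actually extract UMIs: it assumes the UMI is simply the first `UMI_LENGTH` bases of each read and that a family needs `MIN_FAMILY_SIZE` reads to yield a consensus read. Both constants and the read sequences are hypothetical.

```python
from collections import Counter

UMI_LENGTH = 6       # hypothetical UMI length, for illustration only
MIN_FAMILY_SIZE = 3  # hypothetical minimum reads per consensus family


def extract_umi(read: str) -> str:
    """Take the first UMI_LENGTH bases of a read as its UMI (toy assumption)."""
    return read[:UMI_LENGTH]


def consensus_families(reads: list[str]) -> int:
    """Count UMI families large enough to produce a consensus read."""
    sizes = Counter(extract_umi(r) for r in reads)
    return sum(1 for size in sizes.values() if size >= MIN_FAMILY_SIZE)


# Four reads from one molecule: UMI "ACGTAC" followed by the insert.
reads = ["ACGTACGGATTC"] * 4

# Hypothetical front trimming that removed one low-quality leading base
# from two of the reads before UMI extraction, shifting their UMI.
trimmed_first = [r[1:] for r in reads[:2]] + reads[2:]

print(consensus_families(reads))          # 1: one family of 4 reads
print(consensus_families(trimmed_first))  # 0: family split 2 + 2, below threshold
```

In this toy case the same molecule goes from one usable family to none, which mirrors the concern that trimming ahead of UMI extraction leaves fewer consensus reads in the final BAM.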

ivadym commented 10 months ago

Closing, as more specific requests are already tracked in: https://github.com/Clinical-Genomics/BALSAMIC/issues/1336