BCCDC-PHL / FluViewer

Tool for generating influenza A virus genome sequences from FASTQ data
https://bccdc-phl.github.io/FluViewer/

Coverage drop-outs leading to samples not being complete enough for tree #3

Closed JamesZlosnik closed 11 months ago

JamesZlosnik commented 11 months ago

This issue was previously identified in fall 2022. Mapping of the reads to reference sequences suggested there should be sufficient coverage for some samples that failed. Investigation of the .sam file showed that these reads mapped, while the .bam file was missing (in some cases) many of the reads present in the .sam file. Further investigation showed that this is caused by the samtools view command. The initial version is:

terminal_command = f'samtools view -f 3 -F 2828 -q 30 -h {sam_out} | samtools sort -o {bam_out}'

https://github.com/BCCDC-PHL/FluViewer/blob/d110f970e849b0ecc27df2e00197a830f7c3aa10/fluviewer/fluviewer.py#L373C7-L373C7

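The -f 3 flag requires that both of the following SAM flag bits are set: read paired (1) and read mapped in proper pair (2). Any read that is paired but not flagged as a proper pair is therefore dropped.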

Switch this to -f 1 to fix the problem, which would give:
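To make the difference between the two settings concrete, here is a minimal Python sketch of how the -f (require) and -F (exclude) bit masks are applied to a read's FLAG value. The bit names follow the SAM specification; the helper names `decode` and `passes` and the example FLAG values are made up for illustration and are not part of FluViewer or samtools. The -q 30 mapping-quality filter is not modelled.

```python
# SAM FLAG bits, as named in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode(value):
    """Return the names of all flag bits set in value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def passes(flag, require, exclude):
    """Mimic 'samtools view -f require -F exclude' for a single FLAG value.

    A read is kept only if every bit in require is set and no bit in
    exclude is set. Mapping quality (-q) is ignored in this sketch.
    """
    return (flag & require) == require and (flag & exclude) == 0


# Illustrative FLAG values (assumptions, not taken from real data):
proper_pair_read = 0x1 | 0x2 | 0x20 | 0x40   # 99: paired, proper pair
improper_pair_read = 0x1 | 0x20 | 0x40       # 97: paired, NOT proper pair

print("-f 3 requires:", decode(3))
print("-f 1 requires:", decode(1))
print("-F 2828 excludes:", decode(2828))

for label, flag in [("proper pair", proper_pair_read),
                    ("improper pair", improper_pair_read)]:
    print(label,
          "| kept with -f 3:", passes(flag, 3, 2828),
          "| kept with -f 1:", passes(flag, 1, 2828))
```

In this sketch the improperly paired read is rejected by -f 3 but kept by -f 1, while -F 2828 still removes unmapped reads, reads with an unmapped mate, secondary and supplementary alignments, and reads that fail quality checks.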