Closed aersoares81 closed 2 years ago
Do we need fasta from the bam file? nf-core has a module for bam to fastq https://github.com/nf-core/modules/tree/master/modules/samtools/fastq which I've used in the genome properties workflow.
I'm afraid that if we keep PacBio reads as fastq it might cause problems downstream, as every base will probably get flagged as "!" (low quality), since PacBio doesn't compute Phred-like scores AFAIK.
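For context on the "!" concern: in the standard Phred+33 FASTQ encoding, each quality character maps to a score by subtracting 33 from its ASCII code, so "!" decodes to Q0, the lowest possible value. A minimal Python sketch of that decoding (the helper name `phred33_scores` is made up for illustration and is not part of the workflow):

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    # Each character's ASCII code minus 33 gives the Phred score.
    return [ord(ch) - 33 for ch in quality_string]


print(phred33_scores("!!!!"))  # [0, 0, 0, 0]
print(phred33_scores("I5!"))   # [40, 20, 0]
```

A read whose quality line is all "!" would therefore look like Q0 at every position to any tool that does read the quality column.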
Which tools are using the quality score downstream?
I know Inspector has a mapping step, and I believe it uses minimap2, but I don't know how minimap2 takes quality scores in fastq files into consideration. I thought it would be safer not to store information that is technically incorrect and might confuse any program we add downstream.
Inspector specifically runs minimap2 with the option to ignore base quality: https://github.com/ChongLab/Inspector/blob/089a740f7deaef17d7ddb7f352626fb1134d76f0/inspector.py#L94
minimap2 -Q
Yes, but I don't understand why we should keep these reads in fastq format since it does not provide any benefit that I can see, and uses more disk space.
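To illustrate the disk-space point: FASTA keeps only the header and sequence, so the "+" separator and the quality line are dropped from every record. Here is a rough Python sketch on a toy example, assuming plain four-line FASTQ records with no line wrapping. The function `fastq_to_fasta` is hypothetical and only for illustration; the workflow itself converts from the BAM file rather than from FASTQ text.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA, dropping quality lines."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        # Swap the leading '@' for '>' and keep only the sequence line.
        out.append(">" + header[1:])
        out.append(sequence)
    return "\n".join(out) + "\n" if out else ""


fastq = "@read1\nACGT\n+\n!!!!\n@read2\nGGCC\n+\n!!!!\n"
fasta = fastq_to_fasta(fastq)
print(fasta, end="")
print(len(fastq), len(fasta))  # 38 24
```

For uncompressed files the saving is a little under half, since the quality line is as long as the sequence line; compressed sizes will differ.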
This will get the HiFi reads from the original BAM file that's delivered and convert them into a fasta file for downstream analyses. I will write a test for it now.