StevenWingett / FastQ-Screen

Detecting contamination in NGS data and multi-species analysis
https://stevenwingett.github.io/FastQ-Screen/
GNU General Public License v3.0
64 stars, 15 forks

Minimap2 aligner: only .fa.gz/.fa files supported instead of .mmi index files #74

SergeWielhouwer closed this issue 1 month ago

SergeWielhouwer commented 1 month ago

Hi,

Thank you for developing FastQScreen.

I am currently testing FastQScreen v0.16.0 with minimap2 as the aligner, and I was wondering why, for this aligner, reference sequences in .fa/.fa.gz format seem to be required rather than index files (.mmi).

In the fastq_screen executable, the presence of a valid minimap2 index is only checked if the file name ends with .fa.gz.

Later on, however, the .fa extension is also checked.
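A more general extension check could look something like the following. This is a minimal sketch in Python (fastq_screen itself is written in Perl) of a case-insensitive file-name test that accepts .fa, .fasta and .fna, each optionally gzip-compressed, plus a pre-built .mmi index. The patterns and the `classify_reference` helper are hypothetical and are not part of fastq_screen.

```python
import re

# Hypothetical: FASTA-like extensions (case-insensitive), optionally
# gzip-compressed, e.g. genome.fa, genome.FASTA, genome.fna.gz.
FASTA_RE = re.compile(r"\.(fa|fasta|fna)(\.gz)?$", re.IGNORECASE)
# Hypothetical: a pre-built minimap2 index, e.g. genome.mmi
MMI_RE = re.compile(r"\.mmi$", re.IGNORECASE)


def classify_reference(path: str) -> str:
    """Return 'index', 'fasta' or 'unknown', judging by the file name only."""
    if MMI_RE.search(path):
        return "index"
    if FASTA_RE.search(path):
        return "fasta"
    return "unknown"


if __name__ == "__main__":
    for name in ["hg38.fa.gz", "TAIR10.FASTA", "ecoli.fna.gz", "mm39.mmi", "notes.txt"]:
        print(f"{name}: {classify_reference(name)}")
```

Matching on the file name keeps the check cheap, but any extension missing from the pattern would still be rejected. As far as I know, minimap2 itself recognises a pre-built index by the file contents rather than by the extension, so a content-based check is another option.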


Could you elaborate on why only .fa.gz/.fa are supported? There are many possible extensions for nucleotide FASTA data, such as .fasta and .fna (and their capitalised variants). Moreover, with a pre-built .mmi index for minimap2, the software would no longer need to index the reference on the fly, which could save a couple of minutes per run when analysing many samples against large mammalian or plant reference genomes. This would mirror the index requirement for the BWA/Bowtie2 aligners in this tool.
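One way to use a pre-built index when it exists, and fall back to the FASTA otherwise, is sketched below in Python. The helper names and the `genome.fa.gz` → `genome.mmi` naming convention are hypothetical assumptions, not fastq_screen behaviour. The two minimap2 facts relied on are that `minimap2 -d index.mmi ref.fa` builds an index and that minimap2 accepts either a FASTA file or a `.mmi` index as its target argument. The thread does not show which other options fastq_screen passes (preset, output format), so they are omitted.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical FASTA-extension pattern (case-insensitive, optional .gz).
FASTA_RE = re.compile(r"\.(fa|fasta|fna)(\.gz)?$", re.IGNORECASE)


def minimap2_target(reference: str) -> str:
    """Return the path minimap2 should use as its target for this reference."""
    ref = Path(reference)
    if ref.suffix.lower() == ".mmi":
        return str(ref)  # already a pre-built index: use it as-is
    # Hypothetical convention: genome.fa.gz -> genome.mmi in the same directory
    candidate = ref.with_name(FASTA_RE.sub("", ref.name) + ".mmi")
    if candidate.is_file():
        return str(candidate)
    # No index found: pass the FASTA and let minimap2 index it on the fly
    return str(ref)


def build_index_cmd(fasta: str, mmi: str) -> list[str]:
    """One-off pre-indexing step: minimap2 -d <index.mmi> <reference.fa>."""
    return ["minimap2", "-d", mmi, fasta]


def map_cmd(reference: str, reads: str, threads: int = 4) -> list[str]:
    """Mapping command; minimap2 accepts a FASTA or a .mmi index as target."""
    return ["minimap2", "-t", str(threads), minimap2_target(reference), reads]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fasta = Path(d) / "genome.fa.gz"
        fasta.touch()
        print(map_cmd(str(fasta), "sample.fastq.gz"))  # uses the FASTA
        (Path(d) / "genome.mmi").touch()
        print(map_cmd(str(fasta), "sample.fastq.gz"))  # now uses genome.mmi
```

One caveat is worth checking in the minimap2 documentation: indexing parameters such as `-k` and `-w` are fixed when the `.mmi` is built, so the index should be created with the same preset (`-x`) that is later used for mapping.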

Best,

Serge

StevenWingett commented 1 month ago

Hi,

Yes, it would be good to allow these extensions and pre-indexed files. When I next get the chance to update the software, I could look into this. Of course, if you can program in Perl and can match the style of the existing script, you are welcome to open a pull request.

Many thanks.