DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

hisat2_read_statistics.py fails to infer FASTQ input #255

Open egaffo opened 4 years ago

egaffo commented 4 years ago

In HISAT2 v2.2.0, hisat2_read_statistics.py infers the input read file format from the filename extension: after removing any compression extension, it checks for either "fq" or "fastq" and, if neither is present, it falls back to parsing the lines as if they were in FASTA format. I think it should get the input format from the calling script instead of inferring it, since the resulting wrong read statistics can cause downstream programs (e.g. hisat2-align-s) to fail, as happened to me.
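To illustrate the behaviour described above, here is a minimal sketch of extension-based guessing. It is a hypothetical reconstruction, not the actual code of hisat2_read_statistics.py: the function name `guess_format_from_name` and the set of compression extensions are assumptions made for the example.

```python
import os

# Assumed set of compression suffixes; the real script may recognise others.
COMPRESSION_EXTS = {"gz", "bz2"}


def guess_format_from_name(path):
    """Return 'fastq' or 'fasta' by looking at the filename extension only."""
    parts = os.path.basename(path).split(".")
    ext = parts[-1].lower()
    # Drop one trailing compression extension, if any.
    if ext in COMPRESSION_EXTS and len(parts) > 2:
        ext = parts[-2].lower()
    # Anything that is not "fq" or "fastq" is silently treated as FASTA.
    return "fastq" if ext in ("fq", "fastq") else "fasta"


if __name__ == "__main__":
    for name in ("reads.fq.gz", "SRR445016_1.fq.P.qtrim.gz"):
        print(name, "->", guess_format_from_name(name))
```

With this logic, a name ending in `.fq.gz` is recognised as FASTQ, while a name such as `SRR445016_1.fq.P.qtrim.gz` is classified as FASTA, because the last extension before `.gz` is `qtrim`.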

Note that some read preprocessing tools, such as Trimmomatic, can output filenames that do not end in *.fq.gz or similar (e.g. SRR445016_1.fq.P.qtrim.gz), and you may pass those files to HISAT2. However,

  1. in the hisat2 help no constraint is specified for read filenames,
  2. the hisat2 default is to read FASTQ (-q option), so why does hisat2_read_statistics.py guess the file format? and, most importantly,
  3. no error or warning is given by hisat2_read_statistics.py, so you end up with an unexplained "--read-lengths arg must be at least 20" error when hisat2-align-s executes.
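One possible alternative to guessing from the filename, besides passing the format down from the calling script, is to look at the file content. The sketch below is only an illustration under stated assumptions, not a proposed patch: `sniff_format` is a hypothetical helper, it assumes records start with `@` (FASTQ) or `>` (FASTA), and it handles only gzip compression, which it detects from the magic bytes rather than from the extension. It raises an error instead of silently falling back to FASTA.

```python
import gzip


def sniff_format(path):
    """Guess 'fastq' or 'fasta' from the first non-empty line of the file."""
    # Detect gzip by its two magic bytes, independent of the filename.
    with open(path, "rb") as fh:
        magic = fh.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open

    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            break  # first record line is neither FASTQ nor FASTA
    # Fail loudly rather than produce wrong read statistics.
    raise ValueError(f"cannot determine read format of {path!r}")


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, sniff_format(p))
```

An explicit error at this point would surface the problem where it originates, instead of later as an unrelated-looking message from hisat2-align-s.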

Hope it helps

parkchanhee commented 4 years ago

@egaffo

Thank you for your suggestion. I know there are some bugs and room for improvement in the script. We will consider your suggestions for a future release.