USDA-ARS-GBRU / itsxpress

Software to trim the ITS region of FASTQ sequences for amplicon sequencing analysis

Improper handling when no fungal sequences are found in a processed file #8

Closed arivers closed 5 years ago

arivers commented 5 years ago

https://forum.qiime2.org/t/trimming-fungal-demultiplexed-paired-end-sequences-before-dada2/6644/11

arivers commented 5 years ago

This bug revealed several issues.

  1. HMMER domtable files were being parsed incorrectly. The score for the whole HMM profile was being used rather than the score for that specific alignment of the profile. In some cases a profile aligned twice, and itsxpress would take the first alignment rather than the highest-scoring one.

  2. There was a second error in how the highest-scoring segment was being selected. The score was being treated as a string rather than a floating point value, so itsxpress would occasionally select a lower-scoring segment, which sometimes had the wrong position (e.g. the string '6.3' > '20.1', while the float 6.3 < 20.1). The Qiime2 authors were wise to require static typing...

  3. The error that Qiime was producing about "line 17" is actually the result of itsxpress writing empty sequences because it was using bad position information. A check was added to skip any sequence with a length of less than 1, even though that should no longer happen.
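
The three points above can be illustrated with a minimal Python sketch. This is not the itsxpress code: `DomainHit`, `best_hit`, `trimmed`, and the example values are hypothetical names and data, chosen only to show how comparing scores as text picks the wrong alignment and how a length check avoids writing an empty sequence.

```python
from typing import NamedTuple, Optional


class DomainHit(NamedTuple):
    """One hypothetical per-alignment row (not the itsxpress data model)."""
    profile: str
    score: str  # kept as text, as it would be when read from a file
    start: int
    end: int


def best_hit(hits):
    """Return the hit with the highest score, compared numerically."""
    return max(hits, key=lambda h: float(h.score))


def trimmed(seq: str, hit: DomainHit) -> Optional[str]:
    """Return the subsequence for a hit, or None if it would be empty."""
    sub = seq[hit.start:hit.end]
    return sub if len(sub) >= 1 else None


# The same profile aligned twice (made-up values).
hits = [
    DomainHit("profile_a", "6.3", 40, 40),   # weak hit with bad positions
    DomainHit("profile_a", "20.1", 5, 25),   # strong hit
]

# Comparing as text ranks "6.3" above "20.1" because "6" sorts after "2".
wrong = max(hits, key=lambda h: h.score)

# Converting to float first selects the genuinely higher score.
right = best_hit(hits)

seq = "ACGT" * 10
# The weak hit yields an empty slice, so it is skipped rather than written.
skipped = trimmed(seq, wrong)
kept = trimmed(seq, right)
```

Here `wrong` is the 6.3 hit and `right` is the 20.1 hit; `trimmed` returns `None` for the former, so nothing empty would be written to the output file.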

arivers commented 5 years ago

Closed with the release of v1.7.2

thermokarst commented 5 years ago

Awesome, @arivers! BTW, that "line 17 business" was specifically referring to line 17 of that user's fastq.gz file. That validator tries to point out the exact location it ran into problems (YMMV). Thanks!

arivers commented 5 years ago

Thanks @thermokarst. I think that both of the qiime users, omera.WIS and einamart, were getting the line 17 error using different files. That's why I thought it was characteristic of the error.