Closed Andreas-Bio closed 2 months ago
I'm not sure of the issue offhand. Could you include the log file that was generated, along with the version number, or the commit hash if you pulled from source? If you have paired-end data, you could try that too; most users work with paired-end data, so that mode has been better field-tested.
The .log does not show anything suspicious. ITSxpress.log
It works with other samples, so it is not a general issue. I haven't found out yet why it sometimes fails and sometimes gives perfect results. I checked the sequences and the labels, and they seem to be okay. My version is 1.8.0, installed via conda. The .hmm files have the same byte size as the .hmm files in the ITSx directory.
It may be a parsing issue. I see you linked combined_seq_412.fastq.gz
I can run it and take a look tomorrow.
Okay, I ran it and looked at the results. ITSxpress is designed to trim complete ITS regions (ITS1, ITS2, or the complete ITS region containing ITS1, the 5.8S, and ITS2). It requires that both the beginning and the end of the region be present to output the ITS. HMMER locates the edges of the ITS2 by matching 20-30 bp of the flanking 5.8S or LSU sequence. In this library, no ends of the ITS2 at the junction with the LSU were detected.
So this isn't really an error but a result of the reads being too short to detect an end to the ITS2. I would guess ITSx has a mode that only trims the front of the ITS reads. The main use case for ITSxpress has been as input for calling amplicon sequence variants with DADA2. That requires that the full amplicon be present for accurate variant calling.
Thank you for your time! If it was designed this way, ITSxpress is not compatible with all primers. There are no LSU edges detected because the primers from https://www.nature.com/articles/s41598-018-26648-2 are very close to ITS2, leaving no LSU overhang after primer removal (which is commonly done as the first step in amplicon studies). I believe this should be communicated more clearly, as it is definitely a very important difference from ITSx. An easy fix would be to add the primers back after quality filtering, giving them a perfect score (as they are removed by ITSxpress anyway). I am sorry I cannot be more positive; your support was outstanding.
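To make the suggested workaround concrete, here is a minimal sketch of re-attaching a primer to the 3' end of a read with a perfect quality score. It is not part of ITSxpress or ITSx. The function name `append_primer` and the primer string are hypothetical placeholders, and the sketch assumes Phred+33 quality encoding, where `I` corresponds to Q40. It also assumes a single, non-degenerate primer sequence, which may not hold for real primer sets.

```python
# Hypothetical helper: not an ITSxpress or ITSx function.
# Assumes Phred+33 encoding, where "I" encodes a quality score of 40.
PERFECT_QUAL_CHAR = "I"


def append_primer(seq: str, qual: str, primer: str) -> tuple[str, str]:
    """Return the read with `primer` added at the 3' end.

    The added bases receive the highest quality character so that
    later quality filtering does not penalise them.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    new_seq = seq + primer
    new_qual = qual + PERFECT_QUAL_CHAR * len(primer)
    return new_seq, new_qual


# Placeholder primer for illustration only; substitute the real sequence.
example_seq, example_qual = append_primer("ACGTTGCA", "FFFFFFFF", "ACGTACGT")
```

The sequence and quality strings stay the same length, which FASTQ parsers require.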
Interesting. There's no better way to learn about all the unique ways that universal primer sets are configured than to write a tool to trim them. I'll think about how to support this. I could add a flag to ignore end trimming, with a warning not to do it normally. The reads in your example do not contain the primer sequence, and the sequence is degenerate, so stitching it back on seems pretty tricky. I'd have to think about how to handle it on primer sets with '--reversed-primers'. Do you have input on the change?
If I understood that right, ITSxpress only extracts whole ITS regions. I am not sure how you enforce that rule, but ITSx has a flag called --only_full {T or F} that could be used for that. Maybe it would be possible to implement this flag in ITSxpress with minimal effort, because the two tools seem to be very similar? If the flag is TRUE by default, the behaviour of ITSxpress would not change for long-term users. On the other hand, if the primer is too close to ITS2, or if you sequence a very noisy amplicon (high GC) and partial sequences are desired, the user could set the flag to FALSE.
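As a rough illustration of the proposed behaviour, the sketch below shows how such a flag could switch between full-region and partial extraction. It is only an assumption about the semantics, not the actual logic of ITSx or ITSxpress. The function `extract_region` and its `only_full` parameter are hypothetical, and `start` and `end` stand for boundary positions reported by an HMM search (`None` when a boundary was not detected).

```python
from typing import Optional


def extract_region(
    seq: str,
    start: Optional[int],
    end: Optional[int],
    only_full: bool = True,
) -> Optional[str]:
    """Slice a region out of `seq` from detected boundary positions.

    With only_full=True (the default), a read is kept only when both
    boundaries were found. With only_full=False, a single missing
    boundary falls back to the corresponding end of the read.
    """
    if start is not None and end is not None:
        return seq[start:end]
    if only_full or (start is None and end is None):
        # Nothing usable: either partial output is disabled
        # or no boundary was detected at all.
        return None
    lo = 0 if start is None else start
    hi = len(seq) if end is None else end
    return seq[lo:hi]


read = "AAACCCGGG"
strict = extract_region(read, 3, None)                    # end missing, read dropped
partial = extract_region(read, 3, None, only_full=False)  # keeps bases from 3 onward
```

Keeping the default at `True` means existing pipelines would see no change unless the user opts in to partial output.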
I ran this today and confirmed that ITSxpress v2.1.1 returns the same results. At this point I don't have plans to add support for primer sets that partially overlap the ITS region and partially overlap the conserved SSU or LSU. I'll try to monitor the need, though, and may put it on the roadmap in the future.
combined_seq_412.fastq.gz
/home/ubuntu/miniconda3/bin/itsxpress --fastq /home/ubuntu/combined_seq_412.fastq.gz --single_end --outfile /home/ubuntu/ITSxpress/combined_seq_ITS2_T_412.fastq.gz --region ITS2 --taxa Tracheophyta --cluster_id 1 --threads 10
produces this result: combined_seq_ITS2_T_412.fastq.gz, but ITSx produces this result:
/home/ubuntu/ITSxpress/ITSx_1.1.2/ITSx -i /home/ubuntu/combined_seq_412.fasta -o /home/ubuntu/ITSxpress/412 --save_regions ITS2 --minlen 60 --not_found F --graphical F --cpu 28 --complement F -t T --reset T
ITSx412.tar.gz