USDA-ARS-GBRU / itsxpress

Software to trim the ITS region of FASTQ sequences for amplicon sequencing analysis

fasta file being passed to vsearch #31

Closed austenapigo closed 1 month ago

austenapigo commented 1 year ago

Hi, I'm using ITSxpress for the first time and I'm running into an error that states "FASTQ input is only allowed with the fastx_uniques command". I'm supplying a single .fastq file with reads that have already been merged, but the log shows the sequences being written to a temporary FASTA file by vsearch before dereplication.

itsxpress --fastq 01_merged_set_1.fastq --single_end --region ITS2 --taxa Fungi --log 02_itsxpress.txt --outfile 01_merged_its2_set_1.fastq
1.0
2022-11-17 13:59:13,847: INFO     Verifying the input sequences.
2022-11-17 14:01:31,655: INFO     Sequences are assumed to be single-end.
2022-11-17 14:01:31,657: INFO     Temporary directory is: /tmp/itsxpress_rc5x9i21
2022-11-17 14:01:31,657: INFO     Unique sequences are being written to a temporary FASTA file with Vsearch.
2022-11-17 14:01:31,675: INFO     vsearch v2.22.1_linux_x86_64, 94.2GB RAM, 12 cores
https://github.com/torognes/vsearch

Fatal error: FASTQ input is only allowed with the fastx_uniques command

2022-11-17 14:01:31,675: ERROR    Could not perform dereplication with Vsearch. Error from Vsearch was:
 vsearch v2.22.1_linux_x86_64, 94.2GB RAM, 12 cores
https://github.com/torognes/vsearch

Fatal error: FASTQ input is only allowed with the fastx_uniques command
Traceback (most recent call last):
  File "/home/austenapigo/miniconda3/lib/python3.9/site-packages/itsxpress/main.py", line 512, in deduplicate
    p2.check_returncode()
  File "/home/austenapigo/miniconda3/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['vsearch', '--derep_fulllength', '01_merged_set_1.fastq', '--output', '/tmp/itsxpress_rc5x9i21/rep.fa', '--uc', '/tmp/itsxpress_rc5x9i21/uc.txt', '--strand', 'both', '--threads', '1']' returned non-zero exit status 1.
2022-11-17 14:01:31,678: ERROR    ITSxpress terminated with errors. See the log file for details.
2022-11-17 14:01:31,679: ERROR    Command '['vsearch', '--derep_fulllength', '01_merged_set_1.fastq', '--output', '/tmp/itsxpress_rc5x9i21/rep.fa', '--uc', '/tmp/itsxpress_rc5x9i21/uc.txt', '--strand', 'both', '--threads', '1']' returned non-zero exit status 1.
seina001 commented 1 year ago

Hi Austen, how did you install ITSxpress? Vsearch should be version 2.7.0, I believe.

Can you tell me which version of ITSxpress you are using?

Would you be willing to share with me the 01_merged_set_1.fastq file? seinarsson@ufl.edu

It seems like your file isn't being routed correctly through ITSxpress or this may be a Vsearch dependency issue. I'll need to take a closer look.

Thanks,

Sveinn

austenapigo commented 1 year ago

Hi Sveinn,

I installed ITSxpress with conda install itsxpress

I am using Vsearch version 2.22.1, which I believe is the most recent version.

I will email my file to you shortly.

Thanks for looking into this, Austen

seina001 commented 1 year ago

This may be a two-part issue. It's running correctly on my end with Vsearch version 2.15.2. The derep command has been modified in recent versions of Vsearch, so we'll have to adjust for that in version 2 of ITSxpress, which will be pushed out soon.

For now, you can try the following in your ITSxpress conda environment:

conda remove vsearch
conda install -c bioconda vsearch==2.15.2

However, the output indicates that it can't find start and stop sites in any of the sequences.

Can you tell me what filtering you've already done on your file?

austenapigo commented 1 year ago

Thanks, Sveinn, I'll give it a try! The reads have been merged with Usearch. The library is mixed in composition, with fungal ITS and bacterial 16S. I am hoping to use ITSxpress to separate them because they were pooled with the same set of barcode combinations. I'm finding that bacterial reads make up most of the library, so that probably explains why ITSxpress can't find ITS start and stop sites.

kek12e commented 1 year ago

@seina001 FYI, I just downloaded itsxpress today using conda and had this same issue. However, running conda remove vsearch and then conda install -c bioconda vsearch==2.15.2 from within my itsxpress environment, as suggested above, did not work because removing vsearch also removes itsxpress (and other things). Instead, I got this to work by making a new itsxpress environment and specifying the vsearch version when installing:

conda install itsxpress vsearch==2.15.2

Additionally, before this vsearch error I kept getting an error with BBmerge and had to update it within my itsxpress environment using conda install -c bioconda bbmap. I found this solution on the qiime2 forum: https://forum.qiime2.org/t/q2-itsxpress-bbmerge-error/18682

anslan commented 1 year ago

This may be a two-part issue. It's running correctly on my end with Vsearch version 2.15.2. The derep command has been modified in recent versions of Vsearch, so we'll have to adjust for that in version 2 of ITSxpress, which will be pushed out soon.

... or in main.py, on L504 you could just change "--derep_fulllength" to "--fastx_uniques" and on L506 "--output" to "--fastaout" to be compatible with recent vsearch versions.
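
The flag swap suggested above could also be made conditional on the installed vsearch version, so that both old and new installs keep working. The following is only a sketch under stated assumptions, not the code ITSxpress ships: the function names and the `FASTX_UNIQUES_MIN_VERSION` cutoff are hypothetical. This thread only establishes that 2.15.2 accepts FASTQ with `--derep_fulllength` and that 2.22.1 does not, so the exact cutoff should be checked against the vsearch release notes. The remaining arguments are copied from the traceback earlier in the thread.

```python
import re

# Hypothetical cutoff (assumption): the thread only shows that 2.15.2 works with
# --derep_fulllength and 2.22.1 does not. Verify against the vsearch release notes.
FASTX_UNIQUES_MIN_VERSION = (2, 20, 0)


def parse_vsearch_version(banner):
    """Extract (major, minor, patch) from a banner such as
    'vsearch v2.22.1_linux_x86_64, 94.2GB RAM, 12 cores'."""
    match = re.search(r"vsearch v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"Could not find a vsearch version in: {banner!r}")
    return tuple(int(part) for part in match.groups())


def build_derep_command(fastq, tempdir, version, threads=1):
    """Return the vsearch dereplication command as an argument list."""
    if version >= FASTX_UNIQUES_MIN_VERSION:
        # Newer vsearch: FASTQ input must go through --fastx_uniques / --fastaout.
        command, out_flag = "--fastx_uniques", "--fastaout"
    else:
        # Older vsearch: the command shown in the traceback.
        command, out_flag = "--derep_fulllength", "--output"
    return [
        "vsearch", command, fastq,
        out_flag, f"{tempdir}/rep.fa",
        "--uc", f"{tempdir}/uc.txt",
        "--strand", "both",
        "--threads", str(threads),
    ]


banner = "vsearch v2.22.1_linux_x86_64, 94.2GB RAM, 12 cores"
new_cmd = build_derep_command(
    "01_merged_set_1.fastq", "/tmp/itsxpress_example", parse_vsearch_version(banner)
)
old_cmd = build_derep_command(
    "01_merged_set_1.fastq", "/tmp/itsxpress_example", (2, 15, 2)
)
print(" ".join(new_cmd))
print(" ".join(old_cmd))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.7.0" would sort after "2.22.1". The banner would in practice come from running `vsearch --version`; it is hard-coded here so the sketch runs without vsearch installed.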

Robvh-git commented 12 months ago

I used itsxpress for the first time today and got the same error, but only when the --cluster_id parameter is set to 1.0 (the default). If I set --cluster_id to 0.99, it works. @seina001, any idea why that may happen? Does it have to do with an older version of vsearch not supporting ASVs?

The solution by @kek12e indeed solves it.
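
One plausible explanation for the --cluster_id observation, not confirmed by the maintainers in this thread, is that ITSxpress only runs the dereplication step (the `deduplicate` call in the traceback) when the identity is exactly 1.0, and otherwise clusters reads at the given threshold with a different vsearch command that still accepts FASTQ. The sketch below illustrates that assumed routing. The function name `choose_vsearch_step` is hypothetical, and while `--cluster_size` and `--id` are real vsearch options, their use by ITSxpress here is an assumption.

```python
import math


def choose_vsearch_step(cluster_id):
    """Pick the vsearch step for a given --cluster_id (assumed routing)."""
    if math.isclose(cluster_id, 1.0, rel_tol=1e-5):
        # Exact matching: dereplication, the step that fails in the traceback.
        return ["--derep_fulllength"]
    # Below 1.0: identity-threshold clustering, which never reaches the failing command.
    return ["--cluster_size", "--id", str(cluster_id)]


default_step = choose_vsearch_step(1.0)
relaxed_step = choose_vsearch_step(0.99)
print(default_step)
print(relaxed_step)
```

If this routing is right, setting --cluster_id to 0.99 sidesteps the error rather than fixing it, and the output is clustered at 99% identity instead of being exact unique sequences.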

arivers commented 1 month ago

Resolved in Version 2.