Closed science-chump closed 7 months ago
What primer set are you using?
On Mon, Nov 22, 2021, 9:44 AM srs1085 @.***> wrote:
Hey all, I am having trouble getting ITSxpress to work on my files. The sequences were amplified using ITS4-Fun and 5.8S primers to capture the ITS2 region.
The merging step seems to be working okay, because I am getting data on the percentage of merged reads, read lengths, and so on. However, my sequences seem to be erroring out, and I am getting a message that says no ITS start or stop sites were detected. This line repeats many, many times.
Any insight would be greatly appreciated!
Here is the code I am using:
```
conda activate trim_3p
INDIR=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_F
INDIR2=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_R
OUTDIR=ITSxpress_f
OUTDIR2=ITSexpress_r
mkdir $OUTDIR
mkdir $OUTDIR2
for i in $INDIR/*R1*
do (
    FILE=${i##*/}
    BEFFILE=${FILE%R1*}
    AFTFILE=${FILE##*R1}
    R1=$FILE
    R2=${BEFFILE}R2${AFTFILE}
    echo $R1
    if [ -f $OUTDIR2/$R2 ]
    then
        continue
    fi
    srun ~/.local/bin/itsxpress \
        --fastq $INDIR/$R1 --fastq2 $INDIR2/$R2 \
        --outfile $OUTDIR/$R1 --outfile2 $OUTDIR2/$R2 \
        --region ITS2 --taxa 'Fungi' --cluster_id 1 \
        --reversed_primers \
        --threads 16 \
        --log itsxpress.log
)
done
```
ITS4fun and 5.8S from Taylor et al. 2016
Could you attach the log file?
It is not letting me upload the .txt file but I will post some of it in here. Apologies if there is a better way to format all of this for the forum, I am a newbie.
It starts with this:
11-21 20:20 root INFO Verifying the input sequences.
11-21 20:20 root INFO Sequences are paired-end in two files. They will be merged using BBmerge.
11-21 20:20 root INFO java -ea -Xmx1000m -Xms1000m -Djava.library.path=/mnt/home/ernakovich/srs1085/.conda/envs/trim_3p/opt/bbmap-38.93-0/jni/ -cp /mnt/home/ernakovich/srs1085/.conda/envs/trim_3p/opt/bbmap-38.93-0/current/ jgi.BBMerge in=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_R/1d_9-8_ITS_S14_L002_R2_001.fastq.gz in2=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_F/1d_9-8_ITS_S14_L002_R1_001.fastq.gz out=/tmp/itsxpress_7zx0pbm5/seq.fq.gz t=16 maxmismatches=40 maxratio=0.3
Executing jgi.BBMerge [in=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_R/1d_9-8_ITS_S14_L002_R2_001.fastq.gz, in2=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_F/1d_9-8_ITS_S14_L002_R1_001.fastq.gz, out=/tmp/itsxpress_7zx0pbm5/seq.fq.gz, t=16, maxmismatches=40, maxratio=0.3]
Version 38.93
Set threads to 16
Writing mergable reads merged.
Started output threads.
Total time: 6.156 seconds.
Pairs: 250261
Joined: 205989 82.310%
Ambiguous: 32945 13.164%
No Solution: 11327 4.526%
Too Short: 0 0.000%
Avg Insert: 340.0
Standard Deviation: 29.0
Mode: 317
Insert range: 104 - 443
90th percentile: 385
75th percentile: 372
50th percentile: 323
25th percentile: 317
10th percentile: 315
11-21 20:20 root INFO Temporary directory is: /tmp/itsxpress_7zx0pbm5
11-21 20:20 root INFO Unique sequences are being written to a temporary FASTA file with Vsearch.
11-21 20:20 root INFO WARNING: The derep_fulllength command does not support multithreading.
Only 1 thread used.
vsearch v2.18.0_linux_x86_64, 995.5GB RAM, 64 cores
https://github.com/torognes/vsearch
Dereplicating file /tmp/itsxpress_7zx0pbm5/seq.fq.gz 100%
70045619 nt in 205989 seqs, min 104, max 443, avg 340
Sorting 100%
73002 unique sequences, avg cluster 2.8, median 1, max 8869
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
11-21 20:20 root INFO Searching for ITS start and stop sites using HMMSearch. This step takes a while.
11-21 20:22 root INFO Parsing HMM results.
11-21 20:22 root INFO Writing out sequences
Next, this line repeats for at least a few hundred lines, with only the sequence ID changing:
11-21 20:22 root DEBUG No ITS stop or start sites were identified for sequence A01346:32:HFMCNDRXY:2:2101:2139:1000, skipping.
Then ends with this:
11-21 20:23 root INFO ITSxpress ran in 00:03:17
Those lines are informational, not errors. After the merged, dereplicated seed sequence is created, ITSxpress searches it for the start and stop sites. Sometimes a merged sequence does not contain the full region because of quality issues. If a site is missing from that seed sequence, you will get a warning for every other sequence in the dereplicated cluster. How many reads are you getting out in the end? Is it a reasonable number? Some loss is normal.
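To put numbers on the question above, one rough option is to count the skip messages in the log and the reads in an output file. This is only a sketch: the function names are made up for illustration, the log message text is taken from the excerpt posted earlier in this thread, and the read count assumes standard four-line FASTQ records.

```bash
# Hypothetical helpers, not part of ITSxpress.

count_skipped() {
    # $1: ITSxpress log file.
    # Counts lines reporting a sequence with no ITS start/stop site
    # (message text copied from the log excerpt in this thread).
    grep -c 'No ITS stop or start sites were identified' "$1"
}

count_fastq_reads() {
    # $1: FASTQ file, gzipped or plain.
    # Assumes 4 lines per record; gzip -cdf passes plain text through unchanged.
    gzip -cdf "$1" | awk 'END { print int(NR / 4) }'
}
```

For example, `count_fastq_reads "$OUTDIR/$R1"` could be compared with the `Joined` count that BBMerge reports in the log; a result of 0 would confirm that nothing survived the trimming step.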
The output files are all blank after they go through ITSxpress. No sequences are retained for any of the samples, and the files are all 1 KB in size.
I don't know if this should matter or not, but I used cutadapt and DADA2 filtering before putting the samples through ITSxpress.
That may be the issue. The normal procedure is to remove adapters from your paired-end reads, then run ITSxpress. The output of ITSxpress goes into DADA2. I'm not sure what you mean by using DADA2 first. DADA2 primarily creates the ASVs and an ASV count table.
My mistake, the pipeline I am using is written for DADA2, but I am actually inserting ITSxpress into it. The only steps that occur before ITSxpress are removal of primers/adapters with cutadapt and removal of sequences with low-quality bases.
I have done a little bit of troubleshooting and have confirmed that my installation of ITSxpress was successful and that the BASH syntax is working correctly to locate my files.
I am stumped as to why ITSxpress is unable to locate ITS start or stop sites since they are all amplified with ITS4-FUN and 5.8S primers.
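For anyone wanting to double-check the file-pairing logic independently of ITSxpress, here is a minimal sketch that applies the same R1-to-R2 renaming idea as the loop posted earlier and reports any forward file whose mate is missing. The function name `check_pairs` is hypothetical, and it assumes file names contain `R1` exactly where the mate has `R2`.

```bash
# Hypothetical helper: verify that every R1 file has a matching R2 file.
check_pairs() {
    # $1: directory holding the R1 files, $2: directory holding the R2 files
    cp_missing=0
    for cp_r1 in "$1"/*R1*; do
        [ -e "$cp_r1" ] || continue                    # glob matched nothing
        cp_file=${cp_r1##*/}                           # strip the directory
        cp_r2=${cp_file%R1*}R2${cp_file##*R1}          # swap the last R1 for R2
        if [ ! -f "$2/$cp_r2" ]; then
            echo "missing mate: $2/$cp_r2"
            cp_missing=$((cp_missing + 1))
        fi
    done
    echo "missing=$cp_missing"
}
```

Calling `check_pairs "$INDIR" "$INDIR2"` should end with `missing=0` if every forward file has a reverse mate.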
I helped another user with this issue last week, and it turned out that his read 2 qualities were very low, which was what was driving it. It may be worth looking at that and, if necessary, running only your forward reads through.
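A quick way to compare read 1 and read 2 quality without extra tools is to average the Phred scores in each file. The sketch below is an illustration only: `mean_quality` is a made-up name, and it assumes Phred+33 encoding and four-line FASTQ records. A dedicated QC tool would give a fuller picture (per-position quality, for instance).

```bash
# Hypothetical helper: mean Phred quality over all bases in a FASTQ file.
mean_quality() {
    # $1: FASTQ file, gzipped or plain (assumes Phred+33, 4 lines per record)
    gzip -cdf "$1" | awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 0 {
            # Every fourth line is the quality string.
            for (j = 1; j <= length($0); j++) {
                total += ord[substr($0, j, 1)] - 33
                n++
            }
        }
        END {
            if (n > 0) printf "%.1f\n", total / n
            else print "NA"
        }'
}
```

Running `mean_quality` on a matching R1/R2 pair and comparing the two numbers gives a first indication of whether the reverse reads are noticeably worse. If they are, the ITSxpress help output should list the option for running on forward reads alone.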
Note that this message can mean that the start site, the stop site, or both are missing for a particular merged read. It does not necessarily mean that both were missing. You can save the temp dir with the `--keeptemp` flag, then look at the intermediate file `tblout.txt` to figure that out.
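As a rough way to read that intermediate file, the sketch below tallies hits per HMM in a HMMER tabular output file. Several things here are assumptions rather than facts about ITSxpress: the function name `hits_per_model` is invented, the model name is assumed to sit in column 3 (the query-name column of HMMER's `--tblout` format; in `--domtblout` format it is column 4), and the model names themselves depend on the HMM files used.

```bash
# Hypothetical helper: count hits per model in a HMMER tabular output file.
hits_per_model() {
    # $1: HMMER tabular output file
    # $2: 1-based column holding the model (query) name; assumed default 3
    awk -v col="${2:-3}" '
        !/^#/ && NF >= col { count[$col]++ }           # skip comment lines
        END { for (m in count) print m "\t" count[m] }
    ' "$1" | sort
}
```

If one of the two boundary models shows far fewer hits than the other, that suggests which end of the region is missing from the merged reads.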