lrippel opened this issue 6 years ago
Can you share a link to a minimal example file that I can use to reproduce the error you get?
Did you solve the issue? I got the same error.
I also have the same problem. I was trying to use the online version, and I do not get any results either.
Job ID: 2d4ec0a8-a06c-eafd-6eef-20996247739a Successfully completed.
The results are available at the following link:
vsearch v2.4.3_osx_x86_64, 64.0GB RAM, 8 cores https://github.com/torognes/vsearch
Reading file ../tmp/wt2/wt2.fasta 100%
142256330 nt in 6227113 seqs, min 16, max 28, avg 23
Masking 100%
Sorting by length 100%
Counting unique k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 537350 Size min 1, max 352382, avg 11.6
Singletons: 375756, 6.0% of seqs, 69.9% of clusters
Traceback (most recent call last):
File "run_vsearch_clust_fast.py", line 217, in
@forrestzhang There is a known bug in processing FASTA files with tally when they have already been cleaned of 3p adapters. I will try to fix this in the pipeline as soon as I can. Meanwhile, you can tally your input file yourself before uploading it to mirnovo:
Download tally:
- Install from source: http://wwwdev.ebi.ac.uk/enright-dev/kraken/reaper/src/reaper-latest.tgz or
- use the precompiled Linux binary: http://wwwdev.ebi.ac.uk/enright-dev/kraken/reaper/binaries/reaper-13-100/linux/
- Run in a bash terminal:
file=wt2.fa.gz; tally -i $file --fasta-in -o $file.tallied.gz -tri 20 -l 16 -u 28 --fasta-out -format '>trn_t%T_i%I_x%C%n%R%n'
- Upload wt2.fa.gz.tallied.gz to mirnovo.
Additionally, since you want to analyse samples from a plant species, I would advise you to try three model options:
- species-specific (osa)
- UNIVERSAL plants and
- UNIVERSAL animals
The UNIVERSAL animals model may capture more canonical miRNAs than the plant-specific models. You can then take the union or a consensus of the predictions from these models.
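Combining the predictions of the three models is plain set arithmetic once each model's output has been reduced to a set of identifiers. A minimal Python 3 sketch, assuming a hypothetical dict from model name to predicted miRNA IDs (parsing mirnovo's actual result files is not shown); "consensus" here means "predicted by at least two models", which is only one possible definition.

```python
from collections import Counter


def combine_predictions(predictions_by_model, min_votes=2):
    """Return (union, consensus) of miRNA predictions from several models.

    predictions_by_model maps a model name to an iterable of prediction IDs
    (hypothetical input; real mirnovo output would need to be parsed first).
    The consensus keeps predictions made by at least `min_votes` models.
    """
    # Deduplicate within each model so every model votes at most once.
    per_model = [set(preds) for preds in predictions_by_model.values()]
    union = set().union(*per_model)
    votes = Counter(pred for preds in per_model for pred in preds)
    consensus = {pred for pred, n in votes.items() if n >= min_votes}
    return union, consensus


if __name__ == "__main__":
    # Toy IDs standing in for the predictions of each model.
    predictions = {
        "osa": ["mir_A", "mir_B"],
        "universal_plants": ["mir_A", "mir_C"],
        "universal_animals": ["mir_A", "mir_C", "mir_D"],
    }
    union, consensus = combine_predictions(predictions)
    print(sorted(union))      # ['mir_A', 'mir_B', 'mir_C', 'mir_D']
    print(sorted(consensus))  # ['mir_A', 'mir_C']
```

Setting `min_votes` to the number of models gives the strict intersection instead.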
Thanks for your advice. However, tally has a problem with the sequence names:
~/Software/mirnovo_pkg_linux_v1.0/bin/reaper-16-098/src/tally -i WT2_clean_total.fa.gz -o WT2_clean_total.fa.tallied.gz -tri 20 -l 16 -u 28 --fasta-out -format '>trn_t%T_i%I_x%C%n%R%n'
[tally] parse error at line 1 (remaining format string [@%I%#%R%n+%#%Q%n], buffer [>HISEQ:916:CCP03ANXX:3:1101:1384:19911:N:0:AACAACCA])
[tally] data log size 28 (0.25G) hash log size 25 (0.50G) unit size 16
[tally] parse error at line 1 (remaining format string [@%I%#%R%n+%#%Q%n], buffer [>HISEQ:916:CCP03ANXX:3:1101:1384:19911:N:0:AACAACCA])
discarded_unmatched=0 discarded_alien=0 discarded_length=0 discarded_trint=0 nt_in=0 nt_out=0 passed_unique=0 passed_total=0 num_records=0
[memusage] 805306368 bytes
--------Fasta file------
>HISEQ:916:CCP03ANXX:3:1101:18421:21101:N:0:AACAACCA
ACGAACGAGACCTCAGC
>HISEQ:916:CCP03ANXX:3:1101:18323:21851:N:0:AACAACCA
ATCACGAGAGGAACCG
>HISEQ:916:CCP03ANXX:3:1101:18260:22241:N:0:AACAACCA
GTGGAGCGATTTGTCTGGTTAATTCCGTTAAC
>HISEQ:916:CCP03ANXX:3:1101:18355:22281:N:0:AACAACCA
CCCAAGATGAGTGCTCTCTC
>HISEQ:916:CCP03ANXX:3:1101:18302:22371:N:0:AACAACCA
CAGCCGACTCAGAACTGGTA
>HISEQ:916:CCP03ANXX:3:1101:18724:20131:N:0:AACAACCA
NCGAACAGCCGACTCAGAACTG
>HISEQ:916:CCP03ANXX:3:1101:18738:20941:N:0:AACAACCA
AATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCC
>HISEQ:916:CCP03ANXX:3:1101:18528:21621:N:0:AACAACCA
GGAATTTCCGGTGGAGCGGTGAAATGCATTG
You omitted the --fasta-in argument (see the command in my previous reply). This option tells tally to parse a FASTA input file instead of a FASTQ file, which is the default.
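The parse error above shows tally matching the FASTQ format string `[@%I%#%R%n+%#%Q%n]` against a `>` header, i.e. it was reading FASTA as FASTQ. A minimal Python 3 sketch of guarding against this by sniffing the first record before building the command; it assumes the first non-empty line identifies the format and reuses only the tally flags already quoted in this thread (`sniff_read_format` and `tally_args` are hypothetical helper names).

```python
import gzip


def sniff_read_format(path):
    """Guess whether a reads file (plain or .gz) is FASTA or FASTQ.

    Only the first non-empty line is inspected: FASTA records start with '>'
    and FASTQ records start with '@'.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            raise ValueError("unrecognised first record: %r" % line[:40])
    raise ValueError("no records found in %s" % path)


def tally_args(in_path, out_path):
    """Build the tally command from this thread, adding --fasta-in for FASTA."""
    args = ["tally", "-i", in_path, "-o", out_path,
            "-tri", "20", "-l", "16", "-u", "28",
            "--fasta-out", "-format", ">trn_t%T_i%I_x%C%n%R%n"]
    if sniff_read_format(in_path) == "fasta":
        # FASTQ is tally's default input format, so FASTA needs this flag.
        args.insert(3, "--fasta-in")
    return args
```

The list can be passed to `subprocess.run(..., check=True)`; because no shell is involved, the `-format` string needs no extra quoting.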
@forrestzhang
I have now integrated a fix into the mirnovo web server for the issue described above: https://github.com/dvitsios/mirnovo/issues/3#issuecomment-449373618
In short, mirnovo can now also process FASTA files that have already been cleaned of their 3p adapters. Until now, the pipeline was primarily focused on FASTQ files (either raw or cleaned) or FASTA files with their 3p adapters still included.
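Tallying, as in the workaround earlier in this thread, boils down to collapsing identical reads into unique sequences with a count. A minimal Python 3 sketch of that idea, assuming `-l 16 -u 28` act as minimum and maximum read lengths (matching the min 16 / max 28 in the vsearch logs). It does not reproduce tally's `-tri` filter or its header format (the `>seqN_xCOUNT` header is hypothetical), and it is not the actual mirnovo fix.

```python
from collections import Counter


def read_fasta_sequences(lines):
    """Yield the sequence of each FASTA record, joining wrapped lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if parts:
                yield "".join(parts)
                parts = []
        else:
            parts.append(line.upper())
    if parts:
        yield "".join(parts)


def collapse_reads(lines, min_len=16, max_len=28):
    """Collapse identical reads into unique sequences with counts.

    Reads outside [min_len, max_len] are dropped. Records are returned most
    abundant first; the '>seqN_xCOUNT' header is a hypothetical format, not
    the one tally writes.
    """
    counts = Counter(seq for seq in read_fasta_sequences(lines)
                     if min_len <= len(seq) <= max_len)
    # Sort by descending count, then by sequence, so output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [">seq%d_x%d\n%s" % (i, count, seq)
            for i, (seq, count) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    demo = [">r1", "ACGAACGAGACCTCAGC", ">r2", "ACGAACGAGACCTCAGC",
            ">r3", "ATCACGAGAGGAACCG", ">r4", "ACGT"]
    print("\n".join(collapse_reads(demo)))
```

For a real file, pass an open (or `gzip.open(..., "rt")`) handle as `lines`; the whole count table is held in memory, which is fine for millions of short reads but is a design simplification compared with tally's hashing.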
Hi All,
I'm running mirnovo standalone:
perl mirnovo.pl -i /raid/projects/scratch/BioTrans/Pdensiflora/miRNA/reformat_to_phred64/QA/fa/Bud_Tree_C64.clipped.fa.gz -g NA --disable-genome -t universal_plants -o BudCfa
And I'm getting this error:
Reading file ../tmp/BudCfa-EKwNInLH/Bud_Tree_C64.clipped.fa 100%
15577335 nt in 729882 seqs, min 16, max 28, avg 21
Masking 100%
Sorting by length 100%
Counting unique k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 90723 Size min 1, max 13361, avg 8.0
Singletons: 38396, 5.3% of seqs, 42.3% of clusters
Traceback (most recent call last):
File "run_vsearch_clust_fast.py", line 217, in
find_Nread_clusters(dir, min_read_N, min_num_of_variants)
File "run_vsearch_clust_fast.py", line 98, in find_Nread_clusters
depth = vals[1]
IndexError: list index out of range
Traceback (most recent call last):
File "mirnovo_analysis.py", line 211, in
cluster_reads(input_reads_fasta, usearch_perc_id, min_numb_reads, usearch_dir, min_num_of_variants, job_id)
File "mirnovo_analysis.py", line 61, in cluster_reads
subprocess.check_call(cmd, shell=True)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'python -u run_vsearch_clust_fast.py ../tmp/BudCfa-EKwNInLH/Bud_Tree_C64.clipped.fa.gz 0.9 5 ../tmp/BudCfa-EKwNInLH/1/usearch_out 1 1 16 28' returned non-zero exit status 1
All job progress has been saved to: ../tmp/BudCfa-EKwNInLH/bsub.log file.
Results can be found at: ../tmp/BudCfa-EKwNInLH/All-Results/
Can someone explain why this happens and how to work around it?
Thanks!
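Python raises `IndexError: list index out of range` at `depth = vals[1]` when the list produced by splitting a line has only one element, i.e. the expected delimiter was not found. The source of run_vsearch_clust_fast.py is not shown in this thread, so the following Python 3 sketch is a hypothetical illustration only: it assumes read depth is parsed from a collapsed-read header field such as `_x<count>` (the example header `>trn_t1_i5_x42` is assumed, not verified tally output) and fails with a clear message when the field is missing. If that assumption holds, the error would be consistent with the known bug earlier in this thread about FASTA files already cleaned of 3p adapters, and tallying the input first would be the workaround to try.

```python
def parse_depth(header, sep="_x"):
    """Return the read count stored after the last `sep` in a FASTA header.

    Hypothetical convention for illustration: '>trn_t1_i5_x42' -> 42.
    """
    vals = header.lstrip(">").rsplit(sep, 1)
    if len(vals) < 2:
        # A raw header such as '>HISEQ:916:...' contains no count field, so
        # vals has one element and vals[1] would raise IndexError.
        raise ValueError("no read count in header %r; collapse the reads "
                         "(e.g. with tally) before clustering" % header)
    return int(vals[1])


if __name__ == "__main__":
    print(parse_depth(">trn_t1_i5_x42"))
    try:
        parse_depth(">HISEQ:916:CCP03ANXX:3:1101:1384:19911:N:0:AACAACCA")
    except ValueError as err:
        print(err)
```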