bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

java.lang.RuntimeException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 #39

Closed. xiekunwhy closed this issue 2 years ago.

xiekunwhy commented 2 years ago

Hi,

I got the errors below, and RNA-Bloom then hung indefinitely. What's wrong?

Exception in thread "Thread-16" Exception in thread "Thread-23" Exception in thread "Thread-21" Exception in thread "Thread-22" Exception in thread "Thread-20" Exception in thread "Thread-19" Exception in thread "Thread-17" Exception in thread "Thread-18" java.lang.RuntimeException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at rnabloom.RNABloom$FastqToGraphWorker.run(RNABloom.java:617)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
    at java.base/java.lang.String.charAt(String.java:693)
    at rnabloom.bloom.hash.NTHash.NTPC64(NTHash.java:471)
    at rnabloom.bloom.hash.NTHash.NTMC64(NTHash.java:702)
    at rnabloom.bloom.hash.CanonicalPairedNTHashIterator.next(CanonicalPairedNTHashIterator.java:41)
    at rnabloom.RNABloom$FastqToGraphWorker.run(RNABloom.java:574)
    ... 1 more

Best, Kun

kmnip commented 2 years ago

Can you please report the version and the exact command you used?

xiekunwhy commented 2 years ago

Hi,

I used the latest conda version; the v1.4.3 JAR has the same problem.

I assembled 30 samples separately; only 3 of them finished normally, and the other 27 hit the errors above.

Here are the log files: rna-bloom.zip

Best, Kun

kmnip commented 2 years ago

Thanks. I am posting part of your command here for future reference:

rnabloom -sensitive true -bound 500000 -f true -left ZD_E28_3_1.fq.gz -right ZD_E28_3_2.fq.gz -revcomp-right -t 8 -outdir ZD_E28_3

There are several things wrong with your command:

  1. -sensitive does not take any arguments, so -sensitive true would not work.
  2. -f does not take any arguments, so -f true would not work. Typically, you don't need this option unless you are re-running a command and want to overwrite all existing files.
  3. -bound 500000 is far too large. I would not expect the fragment size of your paired-end reads to be 500 kbp; for a typical Illumina RNA-seq sample it is usually well under 1,000 bp. Setting such a high bound will result in an extremely long runtime for stage 2.
  4. Since you installed from conda, please include the -ntcard option so that appropriate Bloom filter sizes are calculated automatically. In one of the log files, I noticed that stage 1 was re-run automatically because the Bloom filter false positive rate was very high.
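The four points above amount to simple checks on a command line. As an illustration only, here is a minimal Python sketch of a hypothetical helper, check_rnabloom_command, that flags them. Its rules are restated from this reply (no value after -sensitive or -f, a -bound far above roughly 1,000, a missing -ntcard); they are not read from RNA-Bloom's actual argument parser, and the 1,000 cutoff is an assumption based on "well under 1,000".

```python
import shlex

# Hypothetical checker. The rules restate the maintainer's four points in this
# thread; they are NOT read from RNA-Bloom's own argument parser.
FLAGS_WITHOUT_VALUES = {"-sensitive", "-f"}
# Assumption: typical Illumina RNA-seq fragment sizes are "well under 1,000" bp.
MAX_REASONABLE_BOUND = 1000


def check_rnabloom_command(command: str) -> list[str]:
    """Return human-readable warnings for an rnabloom command line."""
    tokens = shlex.split(command)
    warnings = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        # A bare word right after a value-less flag is a stray argument.
        if token in FLAGS_WITHOUT_VALUES and following is not None and not following.startswith("-"):
            warnings.append(f"{token} takes no argument, but was given {following!r}")
        # A bound far above typical fragment sizes makes stage 2 very slow.
        if token == "-bound" and following is not None and following.isdigit():
            if int(following) > MAX_REASONABLE_BOUND:
                warnings.append(f"-bound {following} is far larger than a typical fragment size")
    # Recommended in this thread for conda installs, to size Bloom filters automatically.
    if "-ntcard" not in tokens:
        warnings.append("consider adding -ntcard so Bloom filter sizes are calculated automatically")
    return warnings


original = ("rnabloom -sensitive true -bound 500000 -f true -left ZD_E28_3_1.fq.gz "
            "-right ZD_E28_3_2.fq.gz -revcomp-right -t 8 -outdir ZD_E28_3")
corrected = ("rnabloom -ntcard -sensitive -bound 500 -f -left ZD_E28_3_1.fq.gz "
             "-right ZD_E28_3_2.fq.gz -revcomp-right -t 8 -outdir ZD_E28_3")

for warning in check_rnabloom_command(original):
    print(warning)
print(check_rnabloom_command(corrected))  # prints [] for the corrected command
```

Run against the original command, the sketch reports four warnings (one per point); the corrected command produces none.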

Taking all this together, this modified command should work:

rnabloom -ntcard -sensitive -bound 500 -f -left ZD_E28_3_1.fq.gz -right ZD_E28_3_2.fq.gz -revcomp-right -t 8 -outdir ZD_E28_3

xiekunwhy commented 2 years ago

Thank you for pointing that out.

Sorry, the .sh files above are from the second run, while the log files are from the first run. Here is the command line for the first run:

rnabloom -left ZD_E28_3_1.fq.gz -right ZD_E28_3_2.fq.gz -revcomp-right -t 8 -outdir ZD_E28_3

All parameters were left at their defaults except the input and output parameters.

I changed the parameters because I wanted RNA-Bloom to run to completion, but it failed again.

Best, Kun

kmnip commented 2 years ago

What is the error this time? Also, can you please remove the output directory ZD_E28_3 if it already exists and try again?

kmnip commented 2 years ago

Also, please use the -ntcard option as mentioned previously.

xiekunwhy commented 2 years ago

Hi,

RNA-Bloom finished normally this time with the following command:

rnabloom -left ZD_E28_3_1.fq.gz -right ZD_E28_3_2.fq.gz -revcomp-right -t 20 -outdir ZD_E28_3 -ntcard

However, it produced too many transcripts: 126,327 in the rnabloom.transcripts.nr.fa file. This species has about 20,000 genes, so 126,327 transcripts is far too many. Is there a way, or are there parameters, to reduce the number of transcripts?

Best, Kun
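To double-check a transcript count like the 126,327 reported above, you can count the FASTA header lines in the output file. The following is a minimal Python sketch, not part of RNA-Bloom; the helper names are hypothetical, and it assumes an uncompressed FASTA file in which every record starts with a single '>' header line.

```python
def count_fasta_records(lines) -> int:
    """Count FASTA records; each record begins with a header line starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path: str) -> int:
    """Open a plain-text (uncompressed) FASTA file and count its records."""
    with open(path) as handle:
        return count_fasta_records(handle)


def transcripts_per_gene(n_transcripts: int, n_genes: int) -> float:
    """Average number of assembled transcripts per expected gene."""
    return n_transcripts / n_genes


# Tiny in-memory example: two records, the first wrapped over two sequence lines.
example = [">t1\n", "ACGTACGT\n", "ACGT\n", ">t2\n", "GGCC\n"]
print(count_fasta_records(example))  # 2
print(round(transcripts_per_gene(126327, 20000), 1))  # 6.3
```

On the real output, a call such as count_fasta_file("ZD_E28_3/rnabloom.transcripts.nr.fa") should reproduce the reported count, assuming the file sits inside the -outdir directory. With 126,327 transcripts and about 20,000 genes, that is roughly 6.3 transcripts per gene.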

kmnip commented 2 years ago

You may consider using EvidentialGene: http://arthropods.eugenes.org/EvidentialGene/about/EvidentialGene_trassembly_pipe.html

xiekunwhy commented 2 years ago

EvidentialGene is not so easy to use.