jlim125 commented 3 years ago

Hello,

I was running RNA-Bloom to assemble nanopore PCR cDNA sequencing data. I used the default options as suggested, but Stage 2 (Correct long reads for "rnabloom", Parsing 'xxx.fasta'...) never finishes (> 7 days). Could you give me any advice on this? I have attached the log below. Thanks.

```
RNA-Bloom v1.3.1
args: [-long, /xxx.fasta, -ntcard, -t, 8, -outdir, nanopore_4]

name: rnabloom
outdir: nanopore_4
WARNING: Output directory does not exist!
Created output directory at nanopore_4

K-mer counting with ntCard...
Running command: ntcard -t 8 -k 17 -c 65535 -p nanopore_4/rnabloom @nanopore_4/rnabloom.ntcard.readslist.txt...
Parsing histogram file nanopore_4/rnabloom_k17.hist...
Unique k-mers (k=17): 1,572,283,379
Min k-mer coverage threshold: 2
K-mer counting completed in 2m 23s

Bloom filters     Memory (GB)
de Bruijn graph:  3.4745061
k-mer counting:   8.826317
Total:            12.300823

Stage 1: Construct graph from reads (k=17)
[1] Parsing /xxx.fasta...
[1] Parsed 12,632,257 sequences.
Parsed 12,632,257 reads in total.
DBG Bloom filter FPR: 0.9975167 %
Counting Bloom filter FPR: 1.0151776 %
Stage 1 completed in 23m 46s

Stage 2: Correct long reads for "rnabloom"
Parsing /xxx.fasta...
```

kmnip commented 3 years ago

Hi @jlim125

Is the command still running? Are you able to check the CPU usage from top?

Do you see any output fasta files, and are they empty?

Can you try again with 2 million reads?

Ka Ming

jlim125 commented 3 years ago

Hi Ka Ming,

Thanks for your response. The run had already been killed, but I checked the CPU usage beforehand and it was very low (~1%). I found empty fasta files (rnabloom.longreads.corrected.long.fa, rnabloom.longreads.corrected.repeats.fa and rnabloom.longreads.corrected.short.fa).

Do you recommend randomly selecting 2M reads and trying again?

Best, Jaechul

kmnip commented 3 years ago

If those fasta files are empty, then nothing was actually done during the error correction step. It basically got stuck as soon as that stage started.

Yes, please try again with fewer reads and see if you are able to run until the end without errors.
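A minimal sketch of one way to draw a random subset of reads from a FASTA file, using reservoir sampling so the input is read only once. This is not part of RNA-Bloom; the function names (`read_fasta`, `reservoir_sample`, `subsample_fasta`) and the file paths are hypothetical, and the sampled records are held in memory.

```python
import random


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


def subsample_fasta(in_path, out_path, k, seed=None):
    """Write a random subset of k reads to out_path; return the number written."""
    with open(in_path) as fin:
        sample = reservoir_sample(read_fasta(fin), k, seed)
    with open(out_path, "w") as fout:
        for header, seq in sample:
            fout.write(f"{header}\n{seq}\n")
    return len(sample)
```

For example, `subsample_fasta("reads.fasta", "reads_2M.fasta", 2_000_000, seed=42)` would write 2 million randomly chosen reads (both paths are placeholders). Fixing the seed makes the subset reproducible across attempts.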

jlim125 commented 3 years ago

Ok, thanks. I tried it again with 2M reads (randomly selected), but it's the same. I see empty fasta files..

kmnip commented 3 years ago

Technically, those files are empty initially. Did you see it running in top this time?

What is your Java version? What was the command you used? Did you set the max Java heap size?
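Since the corrected-read files start out empty, a single look at them does not distinguish a stalled run from one that has not written anything yet; comparing their sizes at two points in time does. A small sketch of that check follows. The file names are the ones reported in this thread, but their location directly inside the output directory and the helper names (`output_sizes`, `has_progress`) are assumptions.

```python
import os

# File names as reported in this thread; "rnabloom" is the run name from the log.
CORRECTED_FILES = [
    "rnabloom.longreads.corrected.long.fa",
    "rnabloom.longreads.corrected.repeats.fa",
    "rnabloom.longreads.corrected.short.fa",
]


def output_sizes(outdir, names=CORRECTED_FILES):
    """Return {file name: size in bytes, or None if the file is missing}."""
    sizes = {}
    for name in names:
        # Assumption: the files sit directly in the output directory.
        path = os.path.join(outdir, name)
        sizes[name] = os.path.getsize(path) if os.path.exists(path) else None
    return sizes


def has_progress(before, after):
    """Return True if any file grew between two snapshots."""
    return any((after.get(n) or 0) > (before.get(n) or 0) for n in before)
```

Taking one snapshot with `output_sizes("nanopore_2M_re")`, waiting a while, and taking a second gives two dictionaries to pass to `has_progress`; if it returns `False` after several hours with near-zero CPU usage, the run is most likely stuck rather than slow.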

jlim125 commented 3 years ago

Yes, the CPU usage is ~2% in top and it has been running for >12 hours.

And to answer your questions:

java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

java -jar ~/bin/RNA-Bloom_v1.3.1/RNA-Bloom.jar -long /xxxt_2M.fasta -ntcard -t 4 -outdir nanopore_2M_re

I didn't set the max Java heap size.

Thank you.

kmnip commented 3 years ago

In that case, can you please try assembling our small simulated dataset? www.bcgsc.ca/downloads/supplementary/rnabloom/nanosim_mouse_rna.fa.gz

It has only ~400K reads.

jlim125 commented 3 years ago

I downloaded and used nanosim_mouse_rna.fa.gz, but it has been running for >18 hours with empty fasta files. It is stuck at the same step (Stage 2).

kmnip commented 3 years ago

Can you set Java's max heap size to 2g for this dataset?

java -Xmx2g -jar RNA-Bloom.jar ...

jlim125 commented 3 years ago

Hi, I tried that, but there is still no progress after >22 hours. The fasta files are empty and it's stuck at Stage 2.

kmnip commented 3 years ago

This is such a head-scratcher..

Can you send me an email at kmnip@bcgsc.ca? I will send you a pre-release version to test.

Thanks

ChristophBoerlin commented 2 years ago

Hi, are there any updates on this? I'm having the same issue.

kmnip commented 2 years ago

A new version that includes the fix will be released later this month. I will let you know about it then.

kmnip commented 2 years ago

A new version has been released. Sorry for the delay. Please open a new issue if you still have problems. Thank you!

bcgsc / RNA-Bloom

Running time #15
