Kinggerm / GetOrganelle

Organelle Genome Assembly Toolkit (Chloroplast/Mitochondrial/ITS)
GNU General Public License v3.0

Memory Error #107

Closed sabrtoothcat closed 2 years ago

sabrtoothcat commented 2 years ago

Hi, I have the same problem as described in a previous issue: GetOrganelle raises a memory error after Round 3 (see below). I have increased the available memory to 128 GB, but I still get the same error. I have noticed that memory usage increases after each round. How much memory is required to run GetOrganelle? Please help :)

get_organelle_from_reads.py -1 forward_paired.fq.gz -2 reverse_paired.fq.gz -o mitochondria_output6 -F embplant_mt -R 50 -k 21,45,65,85,105 --verbose --keep-temp


2021-10-21 13:36:09,253 - INFO: Checking seed reads and parameters ...
2021-10-21 13:36:09,253 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2021-10-21 13:36:09,253 - INFO: If the result graph is not a circular organelle genome,
2021-10-21 13:36:09,253 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2021-10-21 13:36:24,922 - INFO: Pre-assembling mapped reads ...
2021-10-21 13:36:24,960 - INFO: spades.py -t 24 -s mitochondria_output6/seed/embplant_mt.initial.fq -k 45 --only-assembler -o /mnt/lustre/users/clee/outputs/mitochondria_output6/seed/embplant_mt.initial.fq.spad$
2021-10-21 13:36:55,235 - INFO: /apps/chpc/bio/python/3.7.4_gcc610/lib/python3.7/site-packages/GetOrganelle-1.7.5-py3.7.egg/EGG-INFO/scripts/slim_graph.py --verbose --log -t 24 --wrapper /mnt/lustre/users/clee$
2021-10-21 13:41:30,369 - INFO: Pre-assembling mapped reads finished.
2021-10-21 13:41:30,370 - INFO: Estimated embplant_mt-hitting base-coverage = 193.33
2021-10-21 13:41:30,813 - INFO: Estimated word size(s): 77
2021-10-21 13:41:30,813 - INFO: Setting '-w 77'
2021-10-21 13:41:30,814 - INFO: Setting '--max-extending-len inf'
2021-10-21 13:41:31,618 - INFO: Checking seed reads and parameters finished.

2021-10-21 13:41:31,618 - INFO: Making read index ...
2021-10-21 14:04:13,054 - INFO: Mem 21.338 G, 138858829 candidates in all 150000000 reads
2021-10-21 14:04:13,422 - INFO: Pre-grouping reads ...
2021-10-21 14:04:13,423 - INFO: Setting '--pre-w 77'
2021-10-21 14:04:26,812 - INFO: Mem 19.97 G, 200000/1625482 used/duplicated
2021-10-21 14:05:17,510 - INFO: Mem 20.7 G, 6422 groups made.
2021-10-21 14:05:54,666 - INFO: Making read index finished.

2021-10-21 14:05:54,668 - INFO: Extending ...
2021-10-21 14:05:54,668 - INFO: Adding initial words ...
2021-10-21 14:06:08,000 - INFO: AW 12735358
2021-10-21 14:22:00,384 - INFO: Round 1: 138858829/138858829 AI 4184650 AW 78464614 Mem 12.582
2021-10-21 14:42:17,492 - INFO: Round 2: 138858829/138858829 AI 15724477 AW 307824044 Mem 47.905
2021-10-21 15:04:42,641 - INFO: Round 3: 138858829/138858829 AI 27006080 AW 555018070 Mem 87.664
2021-10-21 15:16:40,002 - ERROR: Traceback (most recent call last):
  File "/apps/chpc/bio/python/3.7.4_gcc610/lib/python3.7/site-packages/GetOrganelle-1.7.5-py3.7.egg/EGG-INFO/scripts/get_organelle_from_reads.py", line 4016, in main
    echo_step=echo_step, log_handler=log_handler)
  File "/apps/chpc/bio/python/3.7.4_gcc610/lib/python3.7/site-packages/GetOrganelle-1.7.5-py3.7.egg/EGG-INFO/scripts/get_organelle_from_reads.py", line 2474, in extending_no_lim
    accepted_words.add(this_c_seq[temp_length - i:seq_len - i])
MemoryError
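To see how quickly memory grows with the number of accepted words, the three "Round" lines from the log can be parsed directly. The following is only a rough sketch: it assumes that the `Mem` value is in gigabytes of 10^9 bytes (the log prints a `G` suffix on other lines but not on the round lines), and the linear extrapolation to the next round is a naive guess, not something GetOrganelle reports. The function names are made up for this illustration.

```python
import re

# The three extending-round lines, copied from the log in this issue.
LOG = """\
2021-10-21 14:22:00,384 - INFO: Round 1: 138858829/138858829 AI 4184650 AW 78464614 Mem 12.582
2021-10-21 14:42:17,492 - INFO: Round 2: 138858829/138858829 AI 15724477 AW 307824044 Mem 47.905
2021-10-21 15:04:42,641 - INFO: Round 3: 138858829/138858829 AI 27006080 AW 555018070 Mem 87.664
"""

ROUND_RE = re.compile(r"Round (\d+): \d+/\d+ AI (\d+) AW (\d+) Mem ([\d.]+)")


def parse_rounds(log_text):
    """Extract round number, AI, AW and Mem from each 'Round N' log line."""
    return [
        {
            "round": int(m.group(1)),
            "ai": int(m.group(2)),
            "aw": int(m.group(3)),
            "mem_gb": float(m.group(4)),  # assumption: unit is GB (1e9 bytes)
        }
        for m in ROUND_RE.finditer(log_text)
    ]


def bytes_per_word(rounds):
    """Approximate memory cost per accepted word (AW) in each round."""
    return [r["mem_gb"] * 1e9 / r["aw"] for r in rounds]


def naive_next_round_mem(rounds):
    """Naive guess: the next round grows by as much as the last one did."""
    last, prev = rounds[-1]["mem_gb"], rounds[-2]["mem_gb"]
    return last + (last - prev)


if __name__ == "__main__":
    rounds = parse_rounds(LOG)
    for r, cost in zip(rounds, bytes_per_word(rounds)):
        print(f"Round {r['round']}: {r['aw']} words, about {cost:.0f} bytes per word")
    print(f"Naive estimate for the next round: {naive_next_round_mem(rounds):.1f} G")
```

Under these assumptions, each accepted word costs roughly 160 bytes in every round, so memory tracks the AW count closely, and a fourth round growing like the third would land right around the 128 GB limit.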

Kinggerm commented 2 years ago

It's hard to predict the memory usage for an unknown sample. For embplant_pt, it has generally fallen between 4 and 16 GB. We haven't run many embplant_mt trials, though, because embplant_mt is generally repeat-rich and cannot be completed with Illumina data alone, so it is harder to say how much memory it will consume.

You can increase -w together with --out-per-round to save memory.
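As a rough illustration of why a larger word size can help, consider how many words a single read contributes. The traceback shows words being added to a set as substrings of each read, and a read of length L contains L - w + 1 substrings of length w. The sketch below is a simplified model, not GetOrganelle's actual implementation, and the 150 bp read length is only an assumed example (the read length is not stated in this issue).

```python
def words_per_read(read_len, word_size):
    """Number of substrings of length word_size in one read (0 if too short)."""
    return max(read_len - word_size + 1, 0)


def distinct_words(reads, word_size):
    """Collect every distinct substring of length word_size from the reads."""
    words = set()
    for seq in reads:
        for i in range(len(seq) - word_size + 1):
            words.add(seq[i:i + word_size])
    return words


if __name__ == "__main__":
    read_len = 150  # assumed read length, for illustration only
    for w in (77, 91, 105):
        print(f"w={w}: {words_per_read(read_len, w)} words per {read_len} bp read")
```

With the assumed 150 bp reads, going from `-w 77` to `-w 105` reduces the words per read from 74 to 46. Longer words are also more specific, so fewer off-target reads are likely to be recruited in each round, although the actual saving depends on the sample.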

Kinggerm commented 2 years ago

Also, please reply to and close the previous issue if your problem has been solved.

sabrtoothcat commented 2 years ago

I will give it a try, thank you.

sabrtoothcat commented 2 years ago

What do you mean by it being unachievable with Illumina data only? I also have data from a MinION. Is it possible to combine both data sets in GetOrganelle and run it that way?

Kinggerm commented 2 years ago

Once you have a complete target assembly graph (obtained from GetOrganelle), you can map the long-read sequencing data to the assembly graph and then complete the structure resolution using Traversome, a tool that is still under development.

sabrtoothcat commented 2 years ago

Cool, I will have a look into it. I was planning to verify the assembled genome with long sequence reads using minimap2. Does Traversome do something similar?
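For reference, a minimal sketch of that verification step from Python might look like the following. It assumes `minimap2` is installed and on the PATH and uses the `map-ont` preset for Nanopore reads; the file names and the helper function names are hypothetical placeholders, not part of GetOrganelle.

```python
import shutil
import subprocess


def build_minimap2_cmd(assembly_fasta, long_reads, threads=4):
    """Build a minimap2 command that maps Nanopore reads and emits SAM."""
    return [
        "minimap2",
        "-a",              # output alignments in SAM format
        "-x", "map-ont",   # preset for Oxford Nanopore reads
        "-t", str(threads),
        assembly_fasta,
        long_reads,
    ]


def run_mapping(assembly_fasta, long_reads, out_sam, threads=4):
    """Run minimap2 and write its SAM output to out_sam."""
    if shutil.which("minimap2") is None:
        raise RuntimeError("minimap2 was not found on PATH")
    with open(out_sam, "w") as handle:
        subprocess.run(
            build_minimap2_cmd(assembly_fasta, long_reads, threads),
            stdout=handle,
            check=True,
        )


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_minimap2_cmd("assembled_mt.fasta", "minion_reads.fq.gz")))
```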

Also, GetOrganelle ran successfully to the end. Thanks again for your help!

Kinggerm commented 2 years ago

Traversome is a structure analyzer or proportion estimator rather than a mapper.

Good to hear it works.