Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

OverflowError: cannot serialize a string larger than 2 GiB #69

Open arnkress opened 4 years ago

arnkress commented 4 years ago

Hi, I am assembling a ~3.4 Gb genome and am getting errors at the 02.cns_align step.

Here is the error from the main log:

[ERROR] 2020-06-04 07:20:00,950 get_cns failed: please check the following logs:
[ERROR] 2020-06-04 07:20:00,951 /data/NextDenovo/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns00/nextDenovo.sh.e
[ERROR] 2020-06-04 07:20:00,951 /data/NextDenovo/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns10/nextDenovo.sh.e

And in the file 01_rundir/02.cns_align/01.get_cns.sh.work/get_cns00/nextDenovo.sh.e:

hostname
cd /data/NextDenovo/01_rundir/02.cns_align/01.get_cns.sh.work/get_cns00
time python /soft/ngs/NextDenovo/lib/nextcorrect.py -f /data/NextDenovo/01_rundir/02.cns_align//01.get_cns.input.idxs -i /data/NextDenovo/01_rundir/01.raw_align/03.sort_align.sh.work/sort_align00/input.seed.001.sorted.ovl -p 15 -b -max_lq_length 10000 -o cns.fasta;
[INFO] 2020-06-03 16:05:51,889 Corrected step options:
[INFO] 2020-06-03 16:05:51,889 Namespace(blacklist=False, dbuf=False, fast=False, idxs='/data/NextDenovo/01_rundir/02.cns_align//01.get_cns.input.idxs', max_cov_aln=130, max_lq_length=10000, min_cov_base=4, min_cov_seed=10, min_error_corrected_ratio=0.8, min_len_aln=500, min_len_seed=10000, out='cns.fasta', ovl='/data/NextDenovo/01_rundir/01.raw_align/03.sort_align.sh.work/sort_align00/input.seed.001.sorted.ovl', process=15, split=False)
[INFO] 2020-06-03 16:05:52,134 Start a cns worker in 14791 from parent 14777
[INFO] 2020-06-03 16:05:52,134 Start a cns worker in 14793 from parent 14777
[INFO] 2020-06-03 16:05:52,135 Start a cns worker in 14795 from parent 14777
[INFO] 2020-06-03 16:05:52,135 Start a cns worker in 14797 from parent 14777
[INFO] 2020-06-03 16:05:52,136 Start a cns worker in 14799 from parent 14777
[INFO] 2020-06-03 16:05:52,137 Start a cns worker in 14801 from parent 14777
[INFO] 2020-06-03 16:05:52,138 Start a cns worker in 14803 from parent 14777
[INFO] 2020-06-03 16:05:52,138 Start a cns worker in 14805 from parent 14777
[INFO] 2020-06-03 16:05:52,139 Start a cns worker in 14807 from parent 14777
[INFO] 2020-06-03 16:05:52,140 Start a cns worker in 14809 from parent 14777
[INFO] 2020-06-03 16:05:52,140 Start a cns worker in 14811 from parent 14777
[INFO] 2020-06-03 16:05:52,141 Start a cns worker in 14813 from parent 14777
[INFO] 2020-06-03 16:05:52,142 Start a cns worker in 14815 from parent 14777
[INFO] 2020-06-03 16:05:52,143 Start a cns worker in 14817 from parent 14777
[INFO] 2020-06-03 16:05:52,143 Start a cns worker in 14820 from parent 14777
Traceback (most recent call last):
  File "/soft/ngs/NextDenovo/lib/nextcorrect.py", line 304, in <module>
    main(args)
  File "/soft/ngs/NextDenovo/lib/nextcorrect.py", line 226, in main
    worker, read_seq_data(args, corrected_seeds), chunksize=1):
  File "/usr/local/python/miniconda/envs/nextdenovo/lib/python2.7/multiprocessing/pool.py", line 673, in next
    raise value
OverflowError: cannot serialize a string larger than 2 GiB
Command exited with non-zero status 1
42484.83user 4473.74system 2:18:10elapsed 566%CPU (0avgtext+0avgdata 42352176maxresident)k
6227512inputs+864584outputs (43major+1645951614minor)pagefaults 0swaps

It seems to be caused by the 2 GiB serialization limit of the multiprocessing/pickle libraries under Python 2.7. Is there any workaround?
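For context, here is a minimal sketch of where the limit comes from (this is illustrative code, not an excerpt from nextcorrect.py, and build_consensus is a hypothetical stand-in): multiprocessing.Pool moves every argument and every result between the parent and the workers via pickle, and Python 2.7's cPickle refuses to serialize a single string larger than 2 GiB, which is exactly the error raised from pool.py in the traceback above. Pickle protocol 4 (Python 3.4+) was designed to handle very large objects.

```python
# Illustrative sketch, not NextDenovo code: every object that crosses a
# multiprocessing.Pool boundary is serialized with pickle, so a worker
# whose return value is a single >2 GiB string fails on Python 2.7 with
# "cannot serialize a string larger than 2 GiB".
import multiprocessing


def build_consensus(seed_id):
    # Placeholder for a corrected-read string; the return value is
    # pickled in the worker and unpickled in the parent process.
    return "A" * 1000  # small here; a >2**31-byte string would trigger the error on Python 2


if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    for seq in pool.imap(build_consensus, range(8), chunksize=1):
        print(len(seq))
    pool.close()
    pool.join()
```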

Thanks

moold commented 4 years ago

Hi, there are a few options. If you are familiar with Python, you can refer here, or add some exit code at line 143 in the nextcorrect.py file so that it stops once it has generated a lot of data, and then continue to run the remaining data manually over multiple passes. Alternatively, you can increase the value of seed_cutfiles (this requires re-running the whole pipeline), or wait for the next version, which is compatible with Python 3 (I will release it in a few weeks).
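For anyone hitting this before the Python 3 release, the following is a hedged, generic sketch of the usual way around this class of error, not the actual nextcorrect.py logic (the worker and build_consensus names are hypothetical): keep multi-GiB strings out of the pool entirely by having each worker write its output to its own file and return only the path, which the parent then concatenates.

```python
# Generic workaround sketch, not NextDenovo code: instead of returning a
# huge consensus string through the Pool (which pickles it), each worker
# writes its result to a temporary file and returns only the file path;
# the parent then merges the parts into the final output.
import multiprocessing
import os
import tempfile


def build_consensus(seed_id):
    return "ACGT" * 250  # placeholder for a potentially multi-GiB string


def worker(seed_id):
    seq = build_consensus(seed_id)
    fd, path = tempfile.mkstemp(prefix="cns_%d_" % seed_id, suffix=".fasta")
    with os.fdopen(fd, "w") as out:
        out.write(">seed_%d\n%s\n" % (seed_id, seq))
    return path  # a short path string is always safe to pickle


if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    with open("cns.fasta", "w") as final:
        for part in pool.imap(worker, range(8), chunksize=1):
            with open(part) as fh:
                final.write(fh.read())
            os.remove(part)
    pool.close()
    pool.join()
```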