AntonBankevich / LJA


Started gap closing procedure Child process crashed #38

Open dbajpp0 opened 1 year ago

dbajpp0 commented 1 year ago

I ran LJA on the HiFi reads from two cells (24 and 22 Gb) on a single AMD EPYC 7351P 16-core processor, but it crashed at the gap closing step after about 30 hours of running. The machine has about 128 GB of RAM; peak usage was about 100 GB, and at the moment of the crash it was using about 55 GB. The DNA for both libraries comes from a pool of 20 tiny individuals (> 1 mm each). hifiasm, by contrast, finished the analysis. I built the binary from git 5 days ago on Ubuntu 20.04.5 (kernel 5.4.0) with gcc 9.4.0 and cmake 3.16.3. Any tips to avoid the crash, or to restart a new run from the point where it crashed? Thanks in advance, Joan

Log output:
lja -o tethys_lajolla_2runs --reads m64086e_230106_194450.hifi_reads.fastq.gz --reads m64086e_230111_202349.hifi_reads.fastq.gz -t 28 --diploid
00:00:00 4Mb INFO: Hello! You are running La Jolla Assembler (LJA), a tool for genome assembly from PacBio HiFi reads
00:00:00 5Mb INFO: LJA pipeline started
00:00:00 5Mb INFO: Performing initial correction with k = 501
00:00:00 0Mb INFO: Reading reads
00:00:00 0Mb INFO: Extracting minimizers
00:14:18 28.8Gb INFO: Finished read processing
00:14:18 28.8Gb INFO: 36073726 hashs collected. Starting sorting.
00:14:20 29.9Gb INFO: Finished sorting. Total distinct minimizers: 9104109
00:14:20 29.9Gb INFO: Starting construction of sparse de Bruijn graph
00:14:32 32.1Gb INFO: Vertex map constructed.
00:14:32 32.1Gb INFO: Filling edge sequences.
00:40:24 47.4Gb INFO: Finished sparse de Bruijn graph construction.
00:40:26 47.4Gb INFO: Collecting tips
00:40:31 49.7Gb INFO: Added 2420389 artificial minimizers from tips.
00:40:31 49.7Gb INFO: Collected 30484083 old edges.
00:40:34 49.7Gb INFO: New minimizers added to sparse graph.
00:40:34 49.7Gb INFO: Refilling graph with old edges.
00:55:41 49.7Gb INFO: Filling graph with new edges.
00:57:01 49.7Gb INFO: Finished fixing sparse de Bruijn graph.
00:57:56 50.9Gb INFO: Starting to extract disjointigs.
00:58:41 50.9Gb INFO: Finished extracting 13298115 disjointigs of total size 16243978545
01:03:36 0Mb INFO: Loading disjointigs from file "tethys_lajolla_2runs/k501/disjointigs.fasta"
01:08:34 42.8Gb INFO: Filling bloom filter with k+1-mers.
01:41:44 42.9Gb INFO: Filled 40377766360 bits out of 306611933760
01:41:44 42.9Gb INFO: Finished filling bloom filter. Selecting junctions.
02:08:39 46.7Gb INFO: Collected 24431404 junctions.
02:10:36 46.7Gb INFO: Starting DBG construction.
02:11:00 46.7Gb INFO: Vertices created.
02:18:25 46.7Gb INFO: Filled dbg edges. Adding hanging vertices
02:18:33 46.7Gb INFO: Added 274 hanging vertices
02:18:33 46.7Gb INFO: Merging unbranching paths
02:18:59 46.7Gb INFO: Ended merging edges. Resulting size 22382185
02:22:14 46.7Gb INFO: Cleaning edge coverages
02:22:26 46.7Gb INFO: Collecting alignments of sequences to the graph
02:36:24 63.1Gb INFO: Alignment collection finished. Total length of alignments is 578668637
02:37:15 63.1Gb INFO: Precorrecting reads
02:38:00 63.1Gb INFO: Applying corrections to reads
02:38:21 63.1Gb INFO: Applied correction to 1404153 reads
02:38:21 63.1Gb INFO: Corrected simple errors in 1404153 reads
02:38:21 63.1Gb INFO: Applying changes to the graph
03:13:12 89.5Gb INFO: Collecting and storing read suffixes
03:43:45 89.5Gb INFO: Correcting dinucleotide errors in reads
03:53:05 89.5Gb INFO: Applying corrections to reads
03:58:42 89.5Gb INFO: Applied correction to 1112277 reads
03:58:42 89.5Gb INFO: Corrected 1112277 dinucleotide sequences
03:58:42 89.5Gb INFO: Marking reliable edges
03:59:21 89.5Gb INFO: Marked 3271893 edges in 915644 paths as reliable
03:59:23 89.5Gb INFO: Correcting low covered regions in reads with K = 800
04:31:25 89.5Gb INFO: Applying corrections to reads
04:55:52 89.5Gb INFO: Applied correction to 2332968 reads
04:55:52 89.5Gb INFO: Corrected low covered regions in 2332968 reads with K = 800
04:55:52 89.5Gb INFO: Applying changes to the graph
07:57:06 94.7Gb INFO: Marking reliable edges
07:57:26 94.7Gb INFO: Marked 996751 edges in 462739 paths as reliable
07:57:26 94.7Gb INFO: Correcting low covered regions in reads with K = 2000
09:28:24 94.7Gb INFO: Applying corrections to reads
09:41:48 94.7Gb INFO: Applied correction to 358442 reads
09:41:48 94.7Gb INFO: Corrected low covered regions in 358442 reads with K = 2000
09:41:48 94.7Gb INFO: Applying changes to the graph
13:19:40 96.7Gb INFO: Correcting dinucleotide errors in reads
13:38:37 96.7Gb INFO: Applying corrections to reads
13:40:21 96.7Gb INFO: Applied correction to 33011 reads
13:40:21 96.7Gb INFO: Corrected 33011 dinucleotide sequences
13:40:21 96.7Gb INFO: Marking reliable edges
13:40:41 96.7Gb INFO: Marked 864704 edges in 421537 paths as reliable
13:40:42 96.7Gb INFO: Correcting low covered regions in reads
18:54:15 101.8Gb INFO: Applying corrections to reads
20:54:46 101.8Gb INFO: Applied correction to 1103409 reads
20:54:48 101.8Gb INFO: Corrected low covered regions in 1516510 reads
20:54:49 101.8Gb INFO: Marking reliable edges
20:55:22 101.8Gb INFO: Marked 715265 edges in 339678 paths as reliable
20:55:22 101.8Gb INFO: Correcting low covered regions in reads with K = 3500
22:33:29 101.8Gb INFO: Applying corrections to reads
22:41:14 101.8Gb INFO: Applied correction to 95477 reads
22:41:14 101.8Gb INFO: Corrected low covered regions in 95477 reads with K = 3500
22:41:14 101.8Gb INFO: Applying changes to the graph
26:02:43 101.8Gb INFO: Printing reads to fasta file "tethys_lajolla_2runs/k501/corrected.fasta"
26:22:36 5Mb INFO: Initial correction results with k = 501 printed to "tethys_lajolla_2runs/k501/corrected.fasta"
26:22:36 5Mb INFO: Performing second phase of error correction using k = 5001
26:22:36 0Mb INFO: Reading reads
26:22:36 0Mb INFO: Extracting minimizers
26:28:02 20.3Gb INFO: Finished read processing
26:28:02 20.3Gb INFO: 71226089 hashs collected. Starting sorting.
26:28:05 22.3Gb INFO: Finished sorting. Total distinct minimizers: 27526106
26:28:05 22.3Gb INFO: Starting construction of sparse de Bruijn graph
26:28:51 32.7Gb INFO: Vertex map constructed.
26:28:51 32.7Gb INFO: Filling edge sequences.
26:42:05 78.9Gb INFO: Finished sparse de Bruijn graph construction.
26:42:14 78.9Gb INFO: Collecting tips
26:42:27 82Gb INFO: Added 1815686 artificial minimizers from tips.
26:42:27 82Gb INFO: Collected 54232201 old edges.
26:42:31 82.4Gb INFO: New minimizers added to sparse graph.
26:42:31 82.4Gb INFO: Refilling graph with old edges.
26:55:32 82.4Gb INFO: Filling graph with new edges.
26:55:59 83.2Gb INFO: Finished fixing sparse de Bruijn graph.
26:58:44 86.3Gb INFO: Starting to extract disjointigs.
26:59:41 86.3Gb INFO: Finished extracting 2351791 disjointigs of total size 18852733867
27:04:43 0Mb INFO: Loading disjointigs from file "tethys_lajolla_2runs/k5001/disjointigs.fasta"
27:10:10 32.5Gb INFO: Filling bloom filter with k+1-mers.
27:34:14 32.5Gb INFO: Filled 32816977551 bits out of 226925666432
27:34:14 32.5Gb INFO: Finished filling bloom filter. Selecting junctions.
27:54:01 33Gb INFO: Collected 5459018 junctions.
27:54:30 33Gb INFO: Starting DBG construction.
27:54:35 33Gb INFO: Vertices created.
27:59:21 33Gb INFO: Filled dbg edges. Adding hanging vertices
27:59:23 33Gb INFO: Added 450 hanging vertices
27:59:23 33Gb INFO: Merging unbranching paths
27:59:37 33Gb INFO: Ended merging edges. Resulting size 2770710
28:01:49 33Gb INFO: Cleaning edge coverages
28:01:50 33Gb INFO: Collecting alignments of sequences to the graph
28:01:50 33Gb INFO: Storing suffixes of read paths of length up to 10000000
28:06:52 44.3Gb INFO: Alignment collection finished. Total length of alignments is 10881215
28:06:53 44.3Gb INFO: Correcting dinucleotide errors in reads
28:06:55 44.3Gb INFO: Applying corrections to reads
28:06:55 44.3Gb INFO: Applied correction to 134 reads
28:06:55 44.3Gb INFO: Corrected 134 dinucleotide sequences
28:06:55 44.3Gb INFO: Marking reliable edges
28:06:59 44.3Gb INFO: Marked 410100 edges in 114968 paths as reliable
28:06:59 44.3Gb INFO: Correcting low covered regions in reads
28:34:32 48.5Gb INFO: Applying corrections to reads
28:34:34 48.5Gb INFO: Applied correction to 507122 reads
28:34:34 48.5Gb INFO: Corrected low covered regions in 507608 reads
28:34:34 48.5Gb INFO: Collapsing bulges
28:34:35 48.5Gb INFO: Applying corrections to reads
28:34:35 48.5Gb INFO: Applied correction to 8052 reads
28:34:35 48.5Gb INFO: Collapsed bulges in 17964 reads
28:34:35 48.5Gb INFO: Applying changes to the graph
28:55:11 55.7Gb INFO: Running second round of error correction
28:55:11 55.7Gb INFO: Correcting dinucleotide errors in reads
28:55:12 55.7Gb INFO: Applying corrections to reads
28:55:12 55.7Gb INFO: Applied correction to 19 reads
28:55:12 55.7Gb INFO: Corrected 19 dinucleotide sequences
28:55:12 55.7Gb INFO: Correcting dinucleotide errors in reads
28:55:14 55.7Gb INFO: Applying corrections to reads
28:55:14 55.7Gb INFO: Applied correction to 2 reads
28:55:14 55.7Gb INFO: Corrected 2 dinucleotide sequences
28:55:14 55.7Gb INFO: Marking reliable edges
28:55:16 55.7Gb INFO: Marked 91028 edges in 43137 paths as reliable
28:55:16 55.7Gb INFO: Correcting low covered regions in reads
29:02:42 55.7Gb INFO: Applying corrections to reads
29:02:43 55.7Gb INFO: Applied correction to 73052 reads
29:02:43 55.7Gb INFO: Corrected low covered regions in 73372 reads
29:02:43 55.7Gb INFO: Correcting dinucleotide errors in reads
29:02:45 55.7Gb INFO: Applying corrections to reads
29:02:45 55.7Gb INFO: Applied correction to 2 reads
29:02:45 55.7Gb INFO: Corrected 2 dinucleotide sequences
29:02:45 55.7Gb INFO: Remarking reliable edges
29:02:52 55.7Gb INFO: Correcting tips using reliable edge marks
29:11:15 55.7Gb INFO: Applying corrections to reads
29:11:15 55.7Gb INFO: Applied correction to 316740 reads
29:11:15 55.7Gb INFO: Collapsing bulges
29:11:16 55.7Gb INFO: Applying corrections to reads
29:11:16 55.7Gb INFO: Applied correction to 549 reads
29:11:16 55.7Gb INFO: Collapsed bulges in 16192 reads
29:11:16 55.7Gb INFO: Applying changes to the graph
29:28:21 55.7Gb INFO: Started gap closing procedure
Child process crashed

dbajpp0 commented 1 year ago

The last three lines of the log file are:
29:28:21 55.7Gb INFO: Started gap closing procedure
29:28:22 55.7Gb TRACE: Collecting k-mers from tips
29:28:44 96.7Gb TRACE: Sorting k-mers from tips
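In these last three lines, reported memory jumps from 55.7 GB to 96.7 GB in about 20 seconds, on a machine with about 128 GB of RAM. To check where memory climbs in a full log, the following minimal Python sketch scans lines in the format seen in this thread and reports the peak and the largest step-to-step increase. The line format is inferred from the excerpts above, not from any LJA specification, and `parse_log` and `largest_jump` are hypothetical helpers, not part of LJA.

```python
import re

# Matches lines like "29:28:44 96.7Gb TRACE: Sorting k-mers from tips".
# Assumption: format inferred from the log excerpts in this thread only.
LOG_LINE = re.compile(r"^(\d+:\d{2}:\d{2})\s+([\d.]+)(Mb|Gb)\s+(\w+):\s+(.*)$")


def parse_log(lines):
    """Return (elapsed, memory_gb, level, message) tuples for recognized lines."""
    entries = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines such as "Child process crashed"
        elapsed, value, unit, level, message = m.groups()
        # Approximate conversion; assumes 1 Gb = 1024 Mb in these logs.
        gb = float(value) / 1024 if unit == "Mb" else float(value)
        entries.append((elapsed, gb, level, message))
    return entries


def largest_jump(entries):
    """Return (delta_gb, previous_entry, current_entry) for the biggest increase."""
    best = None
    for prev, cur in zip(entries, entries[1:]):
        delta = cur[1] - prev[1]
        if best is None or delta > best[0]:
            best = (delta, prev, cur)
    return best


if __name__ == "__main__":
    tail = [
        "29:28:21 55.7Gb INFO: Started gap closing procedure",
        "29:28:22 55.7Gb TRACE: Collecting k-mers from tips",
        "29:28:44 96.7Gb TRACE: Sorting k-mers from tips",
    ]
    entries = parse_log(tail)
    delta, prev, cur = largest_jump(entries)
    print(f"Peak: {max(e[1] for e in entries):.1f} GB")
    print(f"Largest jump: +{delta:.1f} GB between '{prev[3]}' and '{cur[3]}'")
```

To run it on a full log, pass the file's lines instead of `tail`, e.g. `parse_log(open(path))`.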

haiyun-fan commented 1 year ago

I have encountered the same problem. Have you resolved it?

cyycyj commented 1 year ago

Dear developer, I have also encountered this issue:

24:31:58 73.3Gb INFO: Finished alignment.
24:31:58 73.3Gb INFO: Printing alignments to "/data/02_way03_lja/01_assembly/output/uncompressing/alignments.txt"
24:32:47 73.8Gb INFO: Reading and processing initial reads from ["P3.hifireads.fasta"]
26:19:39 75.2Gb INFO: Uncompressing homopolymers in contigs
Child process crashed