Open MDSharma opened 5 years ago
Looks weird. How large is the read file (e.g. number of reads)? The "-u" looks a bit high, it's mentioned "Set the number to 8 if more than 100G is available"..
Hello,
There are a total of 1,460,925,364 reads (R1 plus R2)..
I did see the comment about -u and increased it to use more threads as I had 2TB ram available and with 8 threads it was taking forever :-)
The connecting stage is a single thread operation, I have multi-thread idea in the to-do list, but it is not coming soon. The good news is that there is an algorithm inside the connecting stage in which 50% of kmers are enough for it to complete (the actual percentage is data dependent).
Large data usually makes it slow, because more time will be spent on accessing memory, cache miss...
Let me provide some testing parameters,
for our server, it takes 1531 sec to load 780273398 kmers, and the connecting stage takes 16513 sec.
Thanks for the estimates. Could you share which CPU / platform your servers are on? Based on your numbers, loading would have taken ~13798 seconds (using the approx figures from you) and only took 10115 seconds (a bit faster on this dataset).
However, for connection stage, extrapolating to my kmer numbers ... it should take roughly 8 hours for 1,460,925,364 reads / 41 hours for 7,032,587,296 kmers to be processed. Running on Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz it still seems to be going on past 71 hours now and is only at 9%.
Just trying to think about the practicality of using Shannon with this particular assembly as the servers have a max wall-time.
Any thoughts much appreciated (and I do understand that complexities around the data metrics / specifics may mean that it is what it is.. ).
our processor type is AMD Opteron(tm) Processor 6380; it is a slow one, but I think L1, L2 caches matter more when it needs to build and access elements in a dictionary of size 7,032,587,296 (potentially more due to load factor where each elements takes 60 bytes). Can you look at the log file inside your specified output directory, called "log_shannonC"? It logs all the progress bar. you can post here, but I suspect it is very large.
Some thought: If intervals between printed progress bar have big delay somewhere, it might be the case that hash dictionary is full and it is under remapping; in this case it is likely that there are many collisions in the hash dictionary, which slows down everything. Increase initial hash dictionary size might help.
hash map size initialization should be handled automatically, the default load factor is 0.8. Decreasing it will reserve more memory. To do it,
In the commend line, type "./Shannon_RNASeq_Cpp shannon --help | less" you can see an option "--load_factor arg (=0.80000)" the actual reserved dictionary size = (number distinct kmer in the file your_dir/jf_stat reported by jellyfish stat) / (your load factor).
My experience is that keeping increasing dictionary size does not always have better performance, the operating system would spend more time on managing.
our processor type is AMD Opteron(tm) Processor 6380; it is a slow one, but I think L1, L2 caches matter more when it needs to build and access elements in a dictionary of size 7,032,587,296 (potentially more due to load factor where each elements takes 60 bytes). Can you look at the log file inside your specified output directory, called "log_shannonC"? It logs all the progress bar. you can post here, but I suspect it is very large.
Some thought: If intervals between printed progress bar have big delay somewhere, it might be the case that hash dictionary is full and it is under remapping; in this case it is likely that there are many collisions in the hash dictionary, which slows down everything. Increase initial hash dictionary size might help.
SHC Log File: [Sun Mar 31 03:23:41 2019
]
[print_and_log_kmer_strand_setting][INFO] kmer_length is : 25
[print_and_log_kmer_strand_setting][INFO] double stranded : Yes
[print_and_log_read_length_setting][INFO] has_single : No
[print_and_log_read_length_setting][INFO] has_pair : Yes
[print_and_log_read_length_setting][INFO] pair 1 length : 150
[print_and_log_read_length_setting][INFO] pair 2 length : 150
[print_and_log_input_path_setting][INFO] read pair1 input from : /file_R1_clean_paired.fa
[print_and_log_input_path_setting][INFO] read pair2 input from : /file_R2_clean_paired.fa
[print_and_log_output_setting][INFO] output save to : /shannon/assembly/kmer
[print_and_log_output_setting][INFO] output_seq_min_len : 200
[print_and_log_partition_setting][INFO] Duplicate correction ***********
[print_and_log_partition_setting][INFO] load_factor : 0.800000
[print_and_log_partition_setting][INFO] rmer_length : 15
[print_and_log_partition_setting][INFO] min_count : 3
[print_and_log_partition_setting][INFO] min_len : 75
[print_and_log_partition_setting][INFO] threshold : 0.500000
[print_and_log_partition_setting][INFO] is_use_set : No
[print_and_log_partition_setting][INFO] num_sort_thread : 72
[print_and_log_partition_setting][INFO]
[print_and_log_partition_setting][INFO] Metis setup ***********
[print_and_log_partition_setting][INFO] use_multiple_partition : Yes
[print_and_log_partition_setting][INFO] partition_size : 500
[print_and_log_partition_setting][INFO] non_partition_size : 500
[print_and_log_partition_setting][INFO] penalty ` : 5
[print_and_log_partition_setting][INFO] overload : 2.000000
[print_and_log_partition_setting][INFO]
[print_and_log_partition_setting][INFO] Contig graph setup ***********
[print_and_log_partition_setting][INFO] num_feature : 3
[print_and_log_partition_setting][INFO] is_assign_best : Yes
[print_and_log_partition_setting][INFO] read_sampler_k : 200
[print_and_log_partition_setting][INFO]
[print_and_log_multi_graph_setting][INFO] Multi-graph ***********
[print_and_log_multi_graph_setting][INFO] num_parallel : 72
[print_and_log_multi_graph_setting][INFO] avail mem (byte : 0
[print_and_log_mb_setting][INFO] Multi-bridge ***********
[print_and_log_mb_setting][INFO] max_hop_path : 30
[print_and_log_sf_setting][INFO] Sparse flow ***********
[print_and_log_sf_setting][INFO] sparse flow multiple test: 8
[print_important_notice][INFO]
** [IMPORTANT MESSAGE]
[print_important_notice][INFO]
** Use existing jellyfish dict at path
/shannon/assembly/algo_input/kmer.dict
delete this file if input reads have changed.
Use jellyfish jf file at path
/shannon/assembly/algo_input/kmer.jf
delete this file if input reads have changed.
[print_important_notice][INFO] ** [IMPORTANT MESSAGE ENDS]
[test_all][INFO] finish run jellyfish
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 0 sec, mem 0 GB
[print_important_notice][INFO]
** [IMPORTANT MESSAGE]
[print_important_notice][INFO]
** LOADING: KMER dictionary can store count 0-4294967295, uses EXACT COUNT
based on Jellyfish stats
uniq_num 2411706138
dist_num 3516299200
total_num 145076211725
max_count 32598394
[print_important_notice][INFO] ** [IMPORTANT MESSAGE ENDS]
[log_stop_timer][INFO] ** using 2 hours <=> 167 minutes <=> 10045 sec, mem 0 GB
[build_dict_from_kmer_file][INFO] Start load Kmer
...
....
[build_dict_from_kmer_file_helper][INFO] processed 2667305120 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 1 GB
[build_dict_from_kmer_file][INFO] finish kmer load 2737620000
[log_stop_timer][INFO] ** using 2 hours <=> 168 minutes <=> 10115 sec, mem 221 GB
[find_contig_external][INFO] Start find contig
[log_stop_timer][INFO] ** using 4 hours <=> 254 minutes <=> 15270 sec, mem 2 GB
[find_contig_external][INFO] [ 1%] 70336787 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 6 hours <=> 399 minutes <=> 23955 sec, mem 2 GB
[find_contig_external][INFO] [ 2%] 140662732 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 7 hours <=> 457 minutes <=> 27460 sec, mem 1 GB
[find_contig_external][INFO] [ 3%] 210988944 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 7 hours <=> 456 minutes <=> 27374 sec, mem 0 GB
[find_contig_external][INFO] [ 4%] 281315045 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 8 hours <=> 493 minutes <=> 29601 sec, mem 0 GB
[find_contig_external][INFO] [ 5%] 351642001 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 8 hours <=> 533 minutes <=> 32011 sec, mem 0 GB
[find_contig_external][INFO] [ 6%] 421970723 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 8 hours <=> 523 minutes <=> 31415 sec, mem 1 GB
[find_contig_external][INFO] [ 7%] 492297915 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 9 hours <=> 582 minutes <=> 34966 sec, mem 0 GB
[find_contig_external][INFO] [ 8%] 562623905 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 10 hours <=> 613 minutes <=> 36833 sec, mem 0 GB
[find_contig_external][INFO] [ 9%] 632950475 kmer out of 2737620000 is processed
[log_stop_timer][INFO] ** using 10 hours <=> 648 minutes <=> 38883 sec, mem 0 GB
[find_contig_external][INFO] [ 10%] 703279209 kmer out of 2737620000 is processed
The time interval used for kmer loading is more helpful to find out a possible severe collision or rehashing.
Since memory architecture and element fetching speed are machine dependent, maybe it is good if you can run some small dataset, to get a normal kmer loading/accessing speed (just use log file to compute). Then these measures can be used as references to compare with this large dataset.
[build_dict_from_kmer_file][INFO] Start load Kmer
[build_dict_from_kmer_file_helper][INFO] processed 0 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 0 sec, mem 0 GB
[build_dict_from_kmer_file_helper][INFO] processed 70325984 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 54 sec, mem 7 GB
[build_dict_from_kmer_file_helper][INFO] processed 140651968 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 53 sec, mem 6 GB
[build_dict_from_kmer_file_helper][INFO] processed 210977952 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 55 sec, mem 6 GB
[build_dict_from_kmer_file_helper][INFO] processed 281303936 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 58 sec, mem 6 GB
[build_dict_from_kmer_file_helper][INFO] processed 351629920 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 0 minutes <=> 59 sec, mem 6 GB
[build_dict_from_kmer_file_helper][INFO] processed 421955904 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 60 sec, mem 5 GB
[build_dict_from_kmer_file_helper][INFO] processed 492281888 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 63 sec, mem 5 GB
[build_dict_from_kmer_file_helper][INFO] processed 562607872 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 63 sec, mem 4 GB
[build_dict_from_kmer_file_helper][INFO] processed 632933856 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 65 sec, mem 4 GB
[build_dict_from_kmer_file_helper][INFO] processed 703259840 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 66 sec, mem 4 GB
[build_dict_from_kmer_file_helper][INFO] processed 773585824 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 68 sec, mem 4 GB
[build_dict_from_kmer_file_helper][INFO] processed 843911808 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 69 sec, mem 4 GB
[build_dict_from_kmer_file_helper][INFO] processed 914237792 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 71 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 984563776 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 72 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 1054889760 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 74 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 1125215744 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 75 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 1195541728 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 75 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1265867712 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 77 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 1336193696 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 79 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1406519680 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 80 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1476845664 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 80 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1547171648 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 82 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 1617497632 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 83 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1687823616 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 83 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1758149600 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 86 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1828475584 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 85 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1898801568 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 86 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1969127552 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 89 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2039453536 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 87 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2109779520 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 90 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2180105504 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 89 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2250431488 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 91 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2320757472 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 91 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2391083456 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 92 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2461409440 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 93 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2531735424 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 95 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2602061408 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 94 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2672387392 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 97 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2742713376 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 96 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2813039360 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 98 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2883365344 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 98 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2953691328 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 100 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3024017312 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 101 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3094343296 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 101 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3164669280 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 102 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3234995264 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 102 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3305321248 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 104 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 3375647232 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 103 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3445973216 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 106 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3516299200 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 104 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3586625184 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 107 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3656951168 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 106 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3727277152 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 111 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3797603136 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 110 sec, mem 3 GB
[build_dict_from_kmer_file_helper][INFO] processed 3867929120 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 112 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 3938255104 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 109 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 4008581088 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 113 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 4078907072 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 110 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 4149233056 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 111 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 4219559040 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 112 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 4289885024 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 112 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 65243712 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 115 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 135569696 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 112 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 205895680 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 116 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 276221664 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 115 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 346547648 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 115 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 416873632 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 118 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 487199616 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 115 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 557525600 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 118 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 627851584 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 119 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 698177568 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 118 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 768503552 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 120 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 838829536 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 120 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 909155520 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 1 minutes <=> 118 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 979481504 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 122 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1049807488 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 122 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1120133472 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 120 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1190459456 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 123 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1260785440 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 122 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1331111424 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 121 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1401437408 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 123 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1471763392 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 124 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1542089376 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 124 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1612415360 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 123 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1682741344 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 126 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1753067328 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 124 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1823393312 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 124 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 1893719296 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 125 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 1964045280 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2034371264 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 126 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2104697248 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 125 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2175023232 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 126 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2245349216 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2315675200 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2386001184 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 125 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2456327168 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2526653152 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 1 GB
[build_dict_from_kmer_file_helper][INFO] processed 2596979136 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 2 GB
[build_dict_from_kmer_file_helper][INFO] processed 2667305120 kmers out of 200813408
[log_stop_timer][INFO] ** using 0 hours <=> 2 minutes <=> 127 sec, mem 1 GB
[build_dict_from_kmer_file][INFO] finish kmer load 2737620000
I don't have a small dataset but could try to subsample this data when I get a chance. Any suggestions you can give on increasing the memory utilisation etc would be well received.
In your log file, it is likely that there is a collision issue, for example, initially the loading speed is 1302333 kmer/sec, and in the end it is around 553747kmer/sec. The hashmap we use comes from https://github.com/greg7mdp/sparsepp. It uses open-address to deal with collision.
There are two approaches for trying to solve the issue:
then, compile according to wiki installation instruction.
On my part, I found a new hash implementation by the same author as sparsepp. I will test it maybe before midst of April.
Thank you for the suggestions. I tried the --load_factor as 0.2 and whilst the memory usage increased, there was no visible speed gain (I terminated the process after ~18 hours). Re trying with a much smaller value of 0.05 to see if it makes a dent.. if not, I might re-try in April when you have implemented the different hash.
Cheers, MD
I have replaced dense_hash_map with a new one, parallel hash map. However, after testing two large datasets with the new hashmap, the performance on loading and connecting kmers are worse, like 1564sec vs. 2922sec on loading, 2500sec vs. 2959sec on connecting. I don't think it is the right path to solve the problem, a better approach would be to use a multi-thread kmer connecting algorithm, or do assemblies on a set of small data and merge the results.
Hi Folks,
Just checking in to see if extremely long runtimes are expected in the connection stage, I currently have a process that has been running for over 3 days! At this rate, it will be weeks before I will see an assembly.
Is there a way to multithread this step to make it faster? It seems to be running on a single core at the moment.
parameters used for the run were: -u 72 -m 700G -g 200 -t 72