dzerbino / velvet

Short read de novo assembler using de Bruijn graphs, as published in: D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18: 821-829
https://europepmc.org/article/pmc/2336801
GNU General Public License v2.0

Memory scaling #45

Closed gmfricke closed 6 years ago

gmfricke commented 6 years ago

Hello,

We are using VelvetOptimiser to search for the right k-mer length. We have 132 individuals with some 3 million reads each.

Does the memory usage of velvet increase linearly with the number of individuals (assuming similar numbers of reads per individual)?

The proximate error we are seeing is: "Velvetg on hash value: 73 finished. velvetg: Can't calloc 18446744072184269754 KmerOccurences totalling 18446743982192639896 bytes: Cannot allocate memory"

We are working on a 3TB RAM machine. Can you advise us on how large a dataset we should be able to process? If it takes 10 days of compute time we would rather not use too much trial and error.

Thank you so much!

Matthew

Context:

Velveth start hash values: 65
Velveth end hash value: 75
Velveth hash step value: 2
Velvetg minimum coverage cutoff to use: 0

Read tracking for final assembly on.
Sep 26 10:27:58 Beginning velveth runs.
Logfile name: 26-09-2018-10-27-58_Logfile.txt
Sep 26 10:27:58 Running velveth with hash value: 65.
Sep 26 10:28:00 Running velveth with hash value: 67.
Sep 26 10:28:02 Running velveth with hash value: 69.
Sep 26 10:28:04 Running velveth with hash value: 71.
Sep 26 10:28:06 Running velveth with hash value: 73.
Sep 26 10:28:08 Running velveth with hash value: 75.
Sep 27 02:16:08 Velveth with hash value 65 finished.
Sep 27 11:07:46 Velveth with hash value 71 finished.
Sep 27 12:48:02 Velveth with hash value 67 finished.
Sep 27 14:21:04 Velveth with hash value 75 finished.
Sep 27 14:25:31 Velveth with hash value 73 finished.
Sep 27 14:25:49 Velveth with hash value 69 finished.
Sep 27 14:25:49 Finished velveth runs.
Sep 27 14:25:49 Beginning vanilla velvetg runs.
Sep 27 14:25:49 Running vanilla velvetg on hash value: 65
Sep 27 14:25:51 Running vanilla velvetg on hash value: 67
Sep 27 14:25:53 Running vanilla velvetg on hash value: 69
Sep 27 14:25:55 Running vanilla velvetg on hash value: 71
Sep 27 14:25:57 Running vanilla velvetg on hash value: 73
Sep 27 14:25:59 Running vanilla velvetg on hash value: 75
Sep 27 15:36:25 Velvetg on hash value: 71 finished.
velvetg: Can't calloc 18446744071675041097 KmerOccurences totalling 18446743951638920476 bytes: Cannot allocate memory
Sep 27 18:32:57 Velvetg on hash value: 75 finished.
velvetg: Can't calloc 18446744072078346239 KmerOccurences totalling 18446743975837228996 bytes: Cannot allocate memory
Sep 27 18:47:43 Velvetg on hash value: 69 finished.
velvetg: Can't calloc 18446744071842486963 KmerOccurences totalling 18446743961685672436 bytes: Cannot allocate memory
Sep 27 18:51:20 Velvetg on hash value: 73 finished.
velvetg: Can't calloc 18446744072184269754 KmerOccurences totalling 18446743982192639896 bytes: Cannot allocate memory
Sep 27 19:37:30 Velvetg on hash value: 67 finished.

dzerbino commented 6 years ago

Hello @gmfricke,

We would indeed expect memory consumption to be linear.

The error messages you provided suggest an integer overflow, which produced these ridiculous numbers.

You should try recompiling with the option: make 'VBIGASSEMBLY=1'

HTH,

Daniel
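For readers who want to see why the reported sizes point to an overflow, here is a minimal Python sketch that reinterprets the numbers from the hash-value-73 error message as signed 64-bit integers. The helper name `as_signed64` is made up for this illustration, and the idea that the count started life as a wrapped 32-bit counter is an assumption drawn from the arithmetic alone, not something checked against the Velvet source.

```python
# Values copied from the "hash value: 73" error message in this thread.
reported_count = 18446744072184269754   # number of KmerOccurences requested
reported_bytes = 18446743982192639896   # total bytes requested


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


signed_count = as_signed64(reported_count)
signed_bytes = as_signed64(reported_bytes)

# A negative count that fits in 32 bits is what a sign-extended int32 looks like.
fits_int32 = -(1 << 31) <= signed_count < 0

# The byte total is an exact multiple of the count, so both numbers come from
# the same (negative) element count multiplied by a fixed per-entry size.
bytes_per_entry, remainder = divmod(signed_bytes, signed_count)

# ASSUMPTION: if the count is a 32-bit counter that wrapped once, the
# intended value would be the signed count plus 2**32.
assumed_true_count = signed_count + (1 << 32)

print("count as signed 64-bit:", signed_count)
print("fits in a signed 32-bit integer:", fits_int32)
print("bytes per entry:", bytes_per_entry, "remainder:", remainder)
print("count if a 32-bit counter wrapped once (assumption):", assumed_true_count)
```

The same check applied to the other three error messages in the log should also give negative counts, which is consistent with a counter that is too narrow for this dataset rather than with the machine running out of RAM.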

gmfricke commented 6 years ago

Thank you so much for the fast reply. That is helpful.