dzerbino / velvet

Short read de novo assembler using de Bruijn graphs, as published in: D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18: 821-829
https://europepmc.org/article/pmc/2336801
GNU General Public License v2.0

Could you help me? I'm confused by Velvet, which needs a large amount of physical memory. #53

Open jzhou65 opened 3 years ago

jzhou65 commented 3 years ago

$ velvetg output -exp_cov auto
[0.000001] Reading read set file output/Sequences;
[43.817688] 88015946 sequences found
[110.015839] Done
[218.490050] Reading pre-graph file output/PreGraph
[218.490744] Graph has 19624790 nodes and 88015946 sequences
[233.340736] Scanning pre-graph file output/PreGraph for k-mers
[236.788591] 123729624 kmers found
[250.566077] Sorting kmer occurence table ...
[335.348217] Sorting done.
[335.348450] Computing acceleration table...
[336.150998] Computing offsets...
[337.488936] Ghost Threading through reads 0 / 88015946

This step requires too much physical memory. Is that expected? My raw data is 28 GB.

dzerbino commented 3 years ago

Dear @jzhou65 ,

It's hard to tell without more information, but Velvet can be rather greedy with memory. How much memory do you have?

Best regards,

Daniel

jzhou65 commented 3 years ago

Hi Daniel, I only ran the following two commands:

velveth output/ 17 \
    -shortPaired -fastq -separate HJY_2020_Clean_Data1.fq HJY_2020_Clean_Data2.fq

velvetg output -exp_cov auto -cov_cutoff auto \
    -shortMatePaired3 yes -shortMatePaired4 yes \
    -clean yes -scaffolding yes -amos_file yes

The server has 384 GB of physical memory, and I added another 384 GB of virtual memory as swap, but the second command seems to require even more, so I don't know how to proceed. Thanks a lot.

dzerbino commented 3 years ago

Dear @jzhou65 ,

Could you please tell me more about your data? What is the size of your genome and the coverage?

Regards,

Daniel
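
For anyone following the thread, the two quantities asked about above can be estimated with a few lines of Python. This is only a rough sketch, not part of Velvet: the function names are made up for illustration, and the read length (100 bp) and genome size (500 Mb) in the example are placeholder assumptions, since neither is stated in this issue. The read count and k = 17 come from the log and the velveth command above. The k-mer coverage relation Ck = C * (L - k + 1) / L is the one given in the Velvet manual.

```python
def fastq_stats(lines):
    """Count reads and bases from an iterable of FASTQ lines.

    Assumes the simple 4-lines-per-record layout (no wrapped sequences).
    """
    reads = 0
    bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


def nucleotide_coverage(num_reads, mean_read_length, genome_size):
    """Average number of reads covering each base: C = N * L / G."""
    return num_reads * mean_read_length / genome_size


def kmer_coverage(coverage, mean_read_length, k):
    """k-mer coverage as defined in the Velvet manual: Ck = C * (L - k + 1) / L."""
    return coverage * (mean_read_length - k + 1) / mean_read_length


if __name__ == "__main__":
    num_reads = 88_015_946     # "sequences found" in the velvetg log
    k = 17                     # hash length passed to velveth
    read_length = 100          # ASSUMPTION: not stated in the thread
    genome_size = 500_000_000  # ASSUMPTION: placeholder, not stated in the thread

    c = nucleotide_coverage(num_reads, read_length, genome_size)
    ck = kmer_coverage(c, read_length, k)
    print(f"nucleotide coverage: {c:.1f}x, k-mer coverage: {ck:.1f}x")
```

To get the real read count and mean read length, `fastq_stats` can be run over each input file, for example `with open("HJY_2020_Clean_Data1.fq") as fh: reads, bases = fastq_stats(fh)`, and `bases / reads` used as the mean read length.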