NCBI-Hackathons / NovoGraph

NovoGraph: building whole genome graphs from long-read-based de novo assemblies
MIT License

CRAM2VCF.cpp needs a lot of memory #23

Open · TorHou opened this issue 4 years ago

TorHou commented 4 years ago

When running launch_CRAM2VCF_C++.pl, I noticed that running 10 processes in parallel (while(scalar(keys %still_running) >= 10)) uses a lot of memory: the job was killed even with 1024 GB of RAM.

So I had to change it to while(scalar(keys %still_running) >= 2) in the code, which limits it to at most two processes at a time.
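A rough back-of-the-envelope check (an inference, not a measurement reported here): if at most 10 concurrent CRAM2VCF processes pushed the job past 1024 GB, each one used on average more than about 102 GB at the moment it was killed. The hypothetical helper below turns a memory budget and a measured per-process peak (for example the "Maximum resident set size" printed by GNU /usr/bin/time -v) into a concurrency limit. The function name, the headroom percentage, and the assumption that every process peaks at the same size are all illustrative, not part of NovoGraph.

```cpp
#include <cstdint>

// Hypothetical helper (not part of NovoGraph): how many processes with a given
// peak memory fit into a memory budget. Both sizes must use the same unit (e.g. GB).
// headroom_pct reserves part of the budget for the OS and other processes.
// Returns 0 if not even one process fits or the input is invalid.
std::uint64_t max_parallel_for_budget(std::uint64_t budget,
                                      std::uint64_t per_process_peak,
                                      std::uint64_t headroom_pct = 10) {
    if (per_process_peak == 0 || headroom_pct >= 100) {
        return 0;  // refuse invalid input instead of dividing by zero
    }
    // Memory actually available to the worker processes.
    const std::uint64_t usable = budget * (100 - headroom_pct) / 100;
    // Integer division rounds down, so the summed peaks never exceed 'usable'.
    return usable / per_process_peak;
}
```

For instance, with an assumed 400 GB peak per process and the default 10% headroom on a 1024 GB node, max_parallel_for_budget(1024, 400) returns 2; the 400 GB figure is made up for the example.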

evanbiederstedt commented 4 years ago

Hi @TorHou

Yes, you're referring to lines: https://github.com/NCBI-Hackathons/NovoGraph/blob/master/scripts/launch_CRAM2VCF_C%2B%2B.pl#L102-L116

There's certainly no reason why this couldn't be a parameter in the script.
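A minimal sketch of the same throttling pattern with the limit passed in as a parameter, written in C++ with POSIX fork/execvp/waitpid rather than in Perl. Like the %still_running loop, it blocks in waitpid before starting a new job until fewer than max_parallel children are alive. The names run_bounded and LaunchStats, and describing each job as an argv vector, are assumptions for illustration, not the actual interface of launch_CRAM2VCF_C++.pl.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <set>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct LaunchStats {
    int succeeded = 0;
    int failed = 0;
    std::size_t peak_running = 0;  // highest number of children alive at once
};

// Hypothetical launcher: run every job (an argv vector) with at most
// max_parallel children alive, mirroring while(scalar(keys %still_running) >= N).
LaunchStats run_bounded(const std::vector<std::vector<std::string>>& jobs,
                        std::size_t max_parallel) {
    if (max_parallel == 0) max_parallel = 1;
    LaunchStats stats;
    std::set<pid_t> still_running;  // plays the role of %still_running

    // Block until some child exits and record the outcome if it is one of ours.
    auto reap_one = [&]() {
        int status = 0;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno != EINTR) {
                // No children left to wait for: count the rest as failed.
                stats.failed += static_cast<int>(still_running.size());
                still_running.clear();
            }
            return;
        }
        if (still_running.erase(pid) == 0) return;  // not started by us
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) ++stats.succeeded;
        else ++stats.failed;
    };

    for (const auto& job : jobs) {
        if (job.empty()) { ++stats.failed; continue; }
        // Throttle: wait while the limit is reached.
        while (still_running.size() >= max_parallel) reap_one();

        std::vector<char*> argv;
        for (const auto& arg : job) argv.push_back(const_cast<char*>(arg.c_str()));
        argv.push_back(nullptr);

        pid_t pid = fork();
        if (pid == 0) {
            execvp(argv[0], argv.data());
            _exit(127);  // only reached if exec failed
        }
        if (pid < 0) { ++stats.failed; continue; }
        still_running.insert(pid);
        assert(still_running.size() <= max_parallel);
        if (still_running.size() > stats.peak_running) stats.peak_running = still_running.size();
    }
    while (!still_running.empty()) reap_one();  // wait for the stragglers
    return stats;
}
```

A caller would read max_parallel from a command-line option (a hypothetical --max-parallel flag, say) instead of hard-coding 10 or 2, so users on smaller machines can lower it without editing the script.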

It might be a good idea to look at why CRAM2VCF.cpp needs so much memory and whether this can be changed.

This isn't too surprising, if I recall correctly. I agree, though: the question to my mind is how easily things could be improved in terms of memory usage, and whether it's worth the investment. (Not the most helpful comment, I know, but I'll have to take a closer look at the *.cpp code.)