Open · plattsad opened this issue 3 years ago
The time and memory per genome should decrease for each successive genome, for whatever that's worth.
In general I've mostly tested this with alignments between human genomes, where the memory usage is high but still much better than what you're describing (~1 hour / a couple hundred gigabytes for nearly 100 human chr1's). I'd like to make it more efficient, but don't have any immediate plans.
What kind of data are you running on? If you have a bunch of very diverse genomes, it will not only take a ton of RAM, but your output graph will be so fragmented that I'm not sure what it could be used for.
Thanks Glenn. Yup, maybe I was just being too optimistic here; even the MAF export is pretty fragmented. This is an alignment across plants with an MRCA of maybe 100 MY, so probably too much divergence to be useful. I killed the process as it passed 330 GB and will look to focus more on individual lineages.
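For the lineage-focused approach, one possibility is to extract a single clade into a smaller HAL file and convert that on its own. This is only a sketch: it assumes halExtract from the HAL toolkit is available, that its --root option selects the subtree to keep, and that the ancestor name AncCladeA below is a placeholder for whatever the clade's root genome is actually called in the alignment.

# extract one subtree into its own (smaller) HAL file
halExtract ./P32Out.hal ./cladeA.hal --root AncCladeA
# convert just that clade, using the same options as the full run
./hal2vg ./cladeA.hal --hdf5InMemory --chop 10000 --noAncestors --progress > cladeA.pg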
Hi all,
Is there a way to cut back the memory usage of hal2vg a bit? I have a 9.7 Gbase HAL file from a 32-way alignment of small-to-moderate (150-800 Mbase) genomes. Memory usage during the conversion gradually increases; after 4 hours I'm up to the pinching of the 4th leaf-node genome and heading past 280 GB of real RAM used. I'd hoped this would complete on a 400 GB server, but now I'm having doubts as to whether it will complete on a 1 TB server.
The command line is below. I've tried different chop values (32, 10000) to see if this changes the memory usage; it doesn't seem to.
./hal2vg ./P32Out.hal --hdf5InMemory --chop 10000 --noAncestors --progress > p32.pg
Thanks!
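To check whether the --chop setting (or a smaller input) actually changes peak memory, one option is to wrap the run with the standard GNU time utility. This is a generic sketch, not anything built into hal2vg, and assumes /usr/bin/time -v is available on the server.

# report peak RSS for a run; time's report (and --progress output) land in chop32.log
/usr/bin/time -v ./hal2vg ./P32Out.hal --hdf5InMemory --chop 32 --noAncestors --progress > p32_chop32.pg 2> chop32.log
# peak RAM used for this run, in kilobytes
grep "Maximum resident set size" chop32.log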