Memory consumption reaches 126GB with provided Covid genomes in samples25.fasta

waltergallegog commented 2 years ago

Hello, I'm trying to run LinearTurboFold with the provided file samples25.fasta as input. However, the process is killed by the kernel once memory is exhausted. System: Linux machine, Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 128 GB of memory.

From what I can see in the paper: LinearTurboFold takes about 13h and 43 GB on the 25 genomes.

The command I'm using is simply:

./linearturbofold -i data/sars-cov-2_data/samples25.fasta -o data/sars-cov-2_data/results/

Is it normal that the memory consumption reaches 126 GB?

Here is a graph of the memory and CPU consumption. After around 49 minutes, the limit is reached: [graph: memLTF25]

I ran LinearTurboFold again, using a .fasta file with only the first 2 sequences. This time the limit was reached faster: [graph: memLTF2]

Here are files with CPU and MEM data in case they can be useful. memLTF25.txt memLTF2.txt
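For anyone who wants to collect a similar memory trace, here is a minimal sketch. It is not the script used for the files above. It assumes a Linux system with /proc, and the function names rss_kb and log_rss are hypothetical helpers, not part of LinearTurboFold.

```shell
# Print the resident set size (in kB) of process $1, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Append "elapsed_seconds rss_kb" lines to file $2 every $3 seconds (default 10)
# for as long as process $1 is alive and reports a resident set size.
log_rss() {
    pid=$1
    out=$2
    interval=${3:-10}
    start=$(date +%s)
    while rss=$(rss_kb "$pid" 2>/dev/null) && [ -n "$rss" ]; do
        echo "$(( $(date +%s) - start )) $rss" >> "$out"
        sleep "$interval"
    done
}
```

A possible way to use it is to start the program in the background and pass its PID, for example: ./linearturbofold -i data/sars-cov-2_data/samples25.fasta -o data/sars-cov-2_data/results/ & log_rss $! mem.txt 10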

Perhaps I'm doing something wrong, as the memory consumption seems too high. Any help is highly appreciated. Thanks, Walter

LinearFold commented 2 years ago

Hello Walter,

Thanks for running our project. In our paper, we used a beam size of 100 in LinearAlignment for all experiments, including SARS-CoV-2. Currently, the default setting is an infinite beam size, so you can try the following command:

./linearturbofold -i data/sars-cov-2_data/samples25.fasta -o data/sars-cov-2_data/results/ --b1 100

Please let me know if you have other questions.

Best, Sizhen

waltergallegog commented 2 years ago

Hello, got it, I will try with 100. I was under the impression that it was already 100, judging from the README:

--b1 The beam size for LinearAlignment (default 100, set 0 for infinite beam).
--b2 The beam size for LinearPartition (default 100, set 0 for infinite beam).

Should I also set b2, or is the b2 default already 100?

Thanks, Walter

sizhen commented 2 years ago

Sorry for misleading you.

The default value of b2 is 100. BTW, I just set the default value of b1 to 100 in the project. You can pull the project again to simplify the command if you want.

Thanks, Sizhen

waltergallegog commented 2 years ago

No problem at all, and thanks for the quick feedback. I have just checked, and indeed with --b1 100 I'm able to compute the alignment with 2 sequences. I'll try the full 25 genomes as well.
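When retrying a long run like the 25-genome one, one option is to cap the memory available to the process so that it fails on its own rather than exhausting the machine. The following is only a sketch under assumptions: it relies on the shell's ulimit -v (virtual memory limit, in kB), and run_capped is a hypothetical helper name, not something provided by LinearTurboFold.

```shell
# Run a command with its virtual memory capped at $1 kB.
# The limit is set inside a subshell, so the calling shell is not affected.
run_capped() {
    (
        ulimit -v "$1" || exit 1
        shift
        "$@"
    )
}
```

For example, with a cap of roughly 100 GB and both beam sizes given explicitly: run_capped 100000000 ./linearturbofold -i data/sars-cov-2_data/samples25.fasta -o data/sars-cov-2_data/results/ --b1 100 --b2 100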
