Genomics-HSE / VGsim

VGsim is a fast and scalable simulator of viral genealogies during global pandemic.
GNU General Public License v3.0

Simulation gets killed #47

Open Captain-Blackstone opened 2 years ago

Captain-Blackstone commented 2 years ago

I've tried to run this on two computers: one had an enormous amount of free RAM (26 GB not used by other processes), the other only a moderate amount (6 GB). On the second one the simulation gets killed almost instantly. On the first one it runs all the way through, but then gets killed at the end, when writing data to the files. This is very surprising to me, since it reports that only 7139 samples were taken. I attach the code for the simulation. Two questions: 1) Which parameters of the simulation get it killed on the low-memory computer? 2) Why does a simulation with only 7139 samples get killed when writing data to files?

import VGsim

default_popsizes = [268555392, 3859820011, 141557911, 687201725, 217154786, 369140457, 438663234, 364736341]
number_of_sites = 1
number_of_susceptible_groups = 1
number_of_populations = 8
simulator = VGsim.Simulator(number_of_sites,
                            number_of_populations,
                            number_of_susceptible_groups,
                            seed=12235)

for i, popsize in enumerate(default_popsizes):
    simulator.set_population_size(popsize, i)

simulator.set_transmission_rate(0.25)
simulator.set_recovery_rate(0.009995268138801262)
simulator.set_sampling_rate(0.00001)

mutation_rate = 3e-06
substitution_weights = [1, 1, 1, 1]  # ATCG
simulator.set_mutation_rate(mutation_rate, substitution_weights)

simulator.set_migration_probability(10/365)

simulator.simulate(180000000)
simulator.genealogy()

simulator.output_migrations("migrations")
simulator.output_newick("tree")

P.S. I think I've noticed that, regardless of the sample size, the simulation gets killed at the stage of writing the output to files if it ran long enough. So even a very long simulation with a very small sampling rate (and therefore very few samples) gets killed. I've been told this is not expected behavior.

niktoris1 commented 2 years ago

Tried to reproduce the problem on a Mac and succeeded. The workaround, for now, is to specify the output file path through the file_path parameter of the output functions (just added, not yet in the documentation). You can do it like this. Putting the output somewhere outside the project path seemingly does the trick.

simulator.output_migrations("migrations", file_path='/Users/LAB-SCG-125/Documents/vgtest')
simulator.output_newick("tree", file_path='/Users/LAB-SCG-125/Documents/vgtest')

Although the reason for such behavior is unclear for now, we will try to fix it.

ev-br commented 2 years ago

Can someone run this under memory_profiler to check where the memory bottleneck is?
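If installing a profiler is inconvenient, a rough first pass is possible with the standard library alone. The sketch below is only an illustration, not part of VGsim: `peak_rss_mb` and `run_stage` are hypothetical helper names, and the numbers come from the operating system's process-wide peak resident set size (POSIX only), so they include memory allocated inside compiled extensions. Because the peak never decreases, the stage where the printed value jumps, or the stage that never prints at all before the process is killed, is the likely culprit.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_stage(name, func, *args, **kwargs):
    """Run one stage, then print the process-wide peak RSS reached so far."""
    result = func(*args, **kwargs)
    # flush=True so the line is written out before a later stage can be killed.
    print(f"{name}: peak RSS so far {peak_rss_mb():.1f} MiB", flush=True)
    return result


# Intended use with the script from the first comment (not executed here):
# run_stage("simulate", simulator.simulate, 180000000)
# run_stage("genealogy", simulator.genealogy)
# run_stage("output_migrations", simulator.output_migrations, "migrations")
# run_stage("output_newick", simulator.output_newick, "tree")
```

This only localizes the problem to a stage; a line-by-line profiler is still needed to see which allocation inside that stage is responsible.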

Captain-Blackstone commented 2 years ago

The workaround did not work for me, unfortunately.

niktoris1 commented 2 years ago

@Captain-Blackstone, same error?

Captain-Blackstone commented 2 years ago

@niktoris1 Yes. To be precise, it gets killed at the same stage.
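To make "gets killed" more precise in a report, it can help to record which signal ended the process. The sketch below is an assumption-laden illustration: `describe_exit` and `run_and_report` are hypothetical helpers, and `simulation.py` stands for a file containing the script from the first comment. It launches a command as a child process and reports how it ended. On POSIX, a SIGKILL would be consistent with the Linux out-of-memory killer, but it does not prove it; the kernel log would be needed to confirm.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a return code of -N means the child was ended by signal N.
        sig = signal.Signals(-returncode)
        return f"terminated by {sig.name}"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run a command to completion and describe how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)


# Intended use (simulation.py is a hypothetical file name, not executed here):
# print(run_and_report([sys.executable, "simulation.py"]))
```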