CompEvol / beast2

Bayesian Evolutionary Analysis by Sampling Trees
www.beast2.org
GNU Lesser General Public License v2.1
240 stars 84 forks source link

Trie in State can lead to memory leak #985

Open rbouckaert opened 3 years ago

rbouckaert commented 3 years ago

The trie in the State is a data structure that stores paths through the calculation graph based on which set of StateNodes are changed by operators. Some models have operators that select random subsets of StateNodes, leading to uncontrolled growth of the trie. A crude solution used in StartBeast2 is to mark all StateNodes (that are gene trees) as dirty, so only a single entry in the trie is created, at the cost of having the state and calculation nodes doing more work checking whether things should be recalculated.

Possible solutions:

The first two require no developer knowledge, but are possibly less efficient than the last solution. Irrespective of implementing the 3rd solution, the 2nd seems a good way to limit the amount of memory gobbled up by the trie.

tgvaughan commented 3 years ago

Hi Remco, just wanted to chime in with a vote for any solution which avoids affecting analyses which don't include such operators. It'd be a shame to slow things down even marginally for the sake of this edge case; your trie works really well everywhere else!

rbouckaert commented 3 years ago

Hi Tim, I agree. From what I've seen, most operators apply to a single StateNode, and there will only be a single path associated with such single StateNode update. This means there are typically only a limited number of paths. I set the upper limit to 1000 for now. Do you think there are analyses that require more than a 1000 unique calculation node paths? What makes a reasonable upper bound?