Open rbouckaert opened 3 years ago
Hi Remco, just wanted to chime in with a vote for any solution which avoids affecting analyses which don't include such operators. It'd be a shame to slow things down even marginally for the sake of this edge case; your trie works really well everywhere else!
Hi Tim, I agree. From what I've seen, most operators apply to a single StateNode
, and there will only be a single path associated with such single StateNode
update. This means there are typically only a limited number of paths. I set the upper limit to 1000 for now. Do you think there are analyses that require more than a 1000 unique calculation node paths? What makes a reasonable upper bound?
The trie in the
State
is a data structure that stores paths through the calculation graph based on which set ofStateNode
s are changed by operators. Some models have operators that select random subsets ofStateNode
s, leading to uncontrolled growth of the trie. A crude solution used in StartBeast2 is to mark allStateNode
s (that are gene trees) as dirty, so only a single entry in the trie is created, at the cost of having the state and calculation nodes doing more work checking whether things should be recalculated.Possible solutions:
The first two require no developer knowledge, but are possibly less efficient than the last solution. Irrespective of implementing the 3rd solution, the 2nd seems a good way to limit the amount of memory gobbled up by the trie.