CovertLab / arrow

Stochastic simulations in Python
MIT License

Arrow hangs wcEcoli complexation with certain sim seeds #48

Closed 1fish2 closed 2 years ago

1fish2 commented 2 years ago

Some wcEcoli sims can hang during complexation.

See https://github.com/CovertLab/wcEcoli/issues/1229 including @tahorst's boiled down test case.

prismofeverything commented 2 years ago

Hey @1fish2! I can't see the test case or the linked issue (private repo, ha), but this has come up before, if I recall, during flagellar complexation. Gillespie is prone to explode under certain conditions if the exponent term in the choice calculation is too large. The solution is to find the offending reaction and decompose the stoichiometry into an equivalent problem with more steps. (I think the flagella had something like 170 identical subunits, which is what was causing the problem; breaking it into two or more equivalent reactions fixed it.)

Beyond that, adding something to actually catch this error when or before it happens would be helpful. I thought we did that at some point, but maybe not; it's been a while.
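To illustrate the kind of blowup described above, here is a minimal Python sketch. It assumes the propensity includes a combinatorial term of the form C(n, k) for a reaction that consumes k identical subunits out of n available, and that this term is held in a double-precision float. The subunit count of 10,000 and the helper `fits_in_float64` are hypothetical; none of this is arrow's actual C code.

```python
import math


def fits_in_float64(value: int) -> bool:
    """Return True if the integer converts to a finite double (hypothetical helper)."""
    try:
        float(value)
    except OverflowError:
        return False
    return True


# Assumed numbers for illustration: 10,000 free subunits, 170 per complex.
subunits = 10_000

# One-step reaction: choose all 170 identical subunits at once.
one_step = math.comb(subunits, 170)

# Decomposed reaction: build half-complexes of 85 subunits first,
# then join two halves in a second, much smaller reaction.
half_step = math.comb(subunits, 85)

print(fits_in_float64(one_step))   # False: the term overflows a double
print(fits_in_float64(half_step))  # True: each smaller step stays representable
```

Under these assumptions, splitting the stoichiometry keeps each step's combinatorial term within range, which matches the workaround of breaking one large reaction into several equivalent ones.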

1fish2 commented 2 years ago

I copied @tahorst's test case into this repo, making it a minimal unit test.

This needs debugging. The cause might be the Gillespie algorithm blowup.

1fish2 commented 2 years ago

Note: make clean compile prints an unexpected warning:

building 'arrow.arrowhead' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include

Note: test_flagella also fails: arrow/arrow.py:176: SimulationFailure.

tahorst commented 2 years ago

I have doubts this is due to overflow (or at least the overflow issue identified in #39), since the symptoms are different from what happened there. Also, in wcEcoli we run arrow twice (once on all possible molecules and a second time on a reduced set), and this issue happens with the reduced set after the larger set has already completed successfully, so propensities should be the same as or smaller than in the first run. From #39, it seems the overflow would happen in the propensity calculations, which should be the same regardless of arrow's random seed, but this issue only pops up with certain random states. Or is it possible the overflow is seed dependent?

If it is overflow and/or negative counts, then maybe the algorithm keeps selecting the negative counts and drives them even more negative in an infinite loop that would slowly eat up memory as more events are recorded.

prismofeverything commented 2 years ago

Ah yeah, negative counts are the other failure mode. I thought we dealt with this before, but maybe there is still some lurking corner case that's getting triggered here. I've been able to debug these issues with simple print statements in the C before, but you may have more success with gdb in this case (still the GOAT): https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf

1fish2 commented 2 years ago

As Travis found, the bug occurred when the random value point == 0: the selection loop would then pick the first reaction even if its propensity was 0.

PR #49 includes a unit test, the bug fix, and additional robustness checks.
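For readers without access to the C source, the point == 0 failure can be sketched in Python. This is an illustrative reconstruction under assumptions, not the code from PR #49: the function names, the loop shape, and the choice to skip zero-propensity reactions are all hypothetical. `point` stands for a uniform random sample scaled by the total propensity.

```python
def choose_buggy(propensities, point):
    """Walk the cumulative sum until it reaches `point` (assumed original logic)."""
    choice = 0
    progress = 0.0
    # With point == 0 and propensities[0] == 0, the test 0 + 0 < 0 is False,
    # so the loop exits immediately and reaction 0 is selected.
    while progress + propensities[choice] < point:
        progress += propensities[choice]
        choice += 1
    return choice


def choose_fixed(propensities, point):
    """Same walk, but never return a reaction whose propensity is zero."""
    progress = 0.0
    last_positive = None
    for index, propensity in enumerate(propensities):
        if propensity <= 0.0:
            continue  # a reaction that cannot fire is never a candidate
        progress += propensity
        last_positive = index
        if point < progress:
            return index
    if last_positive is None:
        raise ValueError("no reaction has a positive propensity")
    # Guard against point == total caused by floating-point rounding.
    return last_positive


propensities = [0.0, 2.0, 3.0]
print(choose_buggy(propensities, 0.0))  # 0: a reaction with zero propensity
print(choose_fixed(propensities, 0.0))  # 1: the first reaction that can fire
```

Selecting a reaction with zero propensity means firing a reaction whose reactants are unavailable, which is consistent with the negative-count failure mode discussed earlier in the thread. Since an exact 0 is a rare sample, this would also fit the observation that only certain seeds trigger the hang.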