Closed. 1fish2 closed this issue 2 years ago.
Hey @1fish2! I can't see the test case or the linked issue (private repo, ha), but this has come up before, during flagellar complexation if I recall. Gillespie is prone to explode under certain conditions if the exponent term in the choice calculation is too large. The solution is to find the offending reaction and decompose the stoichiometry into an equivalent problem with more steps (I think the flagella had something like 170 identical subunits, which is what was causing the problem; breaking it into two or more equivalent reactions fixed it).
Beyond that, adding something to actually catch this error when or before it happens would be helpful. I thought we did that at some point but maybe not; it's been a while.
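A rough sketch of what such a check could look like, in Python rather than the C extension. It assumes the propensity is the rate times a product of "count choose stoichiometry" terms, which is the usual Gillespie combinatorics but is not confirmed here to be what arrow computes. The names `log_propensity` and `would_overflow` are made up for illustration. Working in log space means the check itself cannot overflow.

```python
import math
import sys

# Natural log of the largest finite double; anything above this overflows.
LOG_MAX_DOUBLE = math.log(sys.float_info.max)


def log_combinations(count, stoich):
    """Log of (count choose stoich), computed with lgamma so it never overflows."""
    if stoich < 0 or count < stoich:
        return float("-inf")  # zero ways to pick the reactants
    return (math.lgamma(count + 1)
            - math.lgamma(stoich + 1)
            - math.lgamma(count - stoich + 1))


def log_propensity(rate, counts, stoichs):
    """Log of rate * prod(C(count, stoich)) over the reactants (assumed form)."""
    if rate <= 0:
        return float("-inf")
    return math.log(rate) + sum(
        log_combinations(count, stoich) for count, stoich in zip(counts, stoichs)
    )


def would_overflow(rate, counts, stoichs):
    """True if the propensity would not fit in a double."""
    return log_propensity(rate, counts, stoichs) > LOG_MAX_DOUBLE


# Hypothetical numbers: one reactant with 10000 copies.
print(would_overflow(1.0, [10000], [170]))  # a single 170-subunit step
print(would_overflow(1.0, [10000], [85]))   # one of two 85-subunit steps
```

With these made-up counts, the 170-subunit term exceeds the double range while an 85-subunit term does not, which is consistent with splitting the reaction into more steps helping.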
I copied @tahorst's test case into this repo, making it a minimal unit test.
This needs debugging. The cause might be the Gillespie algorithm blowup.
Note: make clean compile prints an unexpected warning:
building 'arrow.arrowhead' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include
Note: test_flagella also fails: arrow/arrow.py:176: SimulationFailure.
I have doubts this is due to overflow (or at least the overflow issue identified in #39), since the symptoms are different from what happened there. Also, in wcEcoli we run arrow twice (once on all possible molecules and a second time on a reduced set), and this issue happens with the reduced set after the larger set has already completed successfully, so propensities should be the same or smaller than the first time it was run. From #39, it seems the overflow happens in the propensity calculations, which should be the same regardless of the random seed for arrow, but this issue only pops up with certain random states. Or is it possible the overflow is seed dependent?
If it is overflow and/or negative counts, then maybe the algorithm keeps selecting the negative counts and drives them even more negative in an infinite loop that would slowly eat up memory as more events are recorded.
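One way to turn that kind of runaway into a fast failure is a guard on the event loop. This is only a sketch in Python, not arrow's C code: `apply_events` and its `max_events` cap are hypothetical, and the stoichiometry is assumed to be a list of per-reaction change vectors.

```python
def apply_events(counts, stoichiometry, events, max_events=10000):
    """Apply reaction events in order, failing fast instead of looping forever.

    counts:        initial molecule counts, one per species
    stoichiometry: stoichiometry[r][s] is the change in species s when r fires
    events:        iterable of reaction indexes (may be unbounded)
    """
    state = list(counts)
    for step, reaction in enumerate(events):
        if step >= max_events:
            raise RuntimeError(f"gave up after {max_events} events")
        for species, change in enumerate(stoichiometry[reaction]):
            state[species] += change
        negative = [s for s, count in enumerate(state) if count < 0]
        if negative:
            raise ValueError(
                f"reaction {reaction} drove species {negative} negative at step {step}"
            )
    return state


# One reaction converting species 0 into species 1, fired twice.
print(apply_events([2, 0], [[-1, 1]], [0, 0]))
```

Firing the same reaction a third time in this example would raise instead of silently recording more events.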
Ah yeah, negative counts are the other failure mode. I thought we dealt with this before, but maybe there is still some lurking corner case that's getting triggered here. I've been able to debug these issues with simple print statements in the C before, but you may have more success with gdb in this case (still the GOAT): https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf
As Travis found, the bug occurred when the random value point == 0: the loop would then select the first reaction even if its propensity was 0.
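To illustrate the failure, here is a minimal Python sketch of the selection step. It is not the actual C loop from arrow. The buggy version assumes the scan compares with `<=`, which is one way to get the behavior described, and the fixed version is just one possible remedy (skip zero-propensity reactions and compare strictly). Both function names are made up.

```python
def choose_reaction_buggy(propensities, point):
    """Linear scan with an inclusive comparison (assumed form of the bug)."""
    running = 0.0
    for index, propensity in enumerate(propensities):
        running += propensity
        if point <= running:  # point == 0 matches even when running == 0
            return index
    return len(propensities) - 1


def choose_reaction(propensities, point):
    """Pick the reaction whose propensity interval contains point.

    point is expected to lie in [0, sum(propensities)). Reactions with
    zero propensity are never eligible.
    """
    if sum(propensities) <= 0:
        raise ValueError("no reaction can fire")
    running = 0.0
    last_positive = None
    for index, propensity in enumerate(propensities):
        if propensity <= 0:
            continue
        running += propensity
        last_positive = index
        if point < running:
            return index
    # Rounding can leave point at or just past the total; fall back to
    # the last reaction that is actually able to fire.
    return last_positive


propensities = [0.0, 2.0, 3.0]
print(choose_reaction_buggy(propensities, 0.0))  # picks reaction 0, which cannot fire
print(choose_reaction(propensities, 0.0))        # picks reaction 1
```

A draw of exactly 0 is rare, which would fit the observation that the problem only shows up with certain random states.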
PR #49 includes a unit test, the bug fix, and additional robustness checks.
Some wcEcoli sims can hang during complexation.
See https://github.com/CovertLab/wcEcoli/issues/1229 including @tahorst's boiled down test case.