tahorst opened this issue 4 years ago
Yeah, it failed to allocate memory, which means the Python process hit some kind of memory limit. Was this on Sherlock? Do you know what the memory limit for processes is there?
Is this an intermittent issue, or does it happen the same way each time?
> Was this on Sherlock?

Yes, the Jenkins builds are on Sherlock.

> Do you know what the memory limit for processes is there?

It's currently 48 GB shared between all running jobs.

> Is this an intermittent issue, or does it happen the same way each time?

This is the first failure I've seen, so I was hoping you could investigate to confirm it's reproducible and identify the problem.
Sure, I'll run it outside of Sherlock and see if it fails the same way. Based on the error, though, this is a memory failure, and arrow just happened to be the code trying to allocate memory when the limit was hit. If the 48 GB are shared, then I expect this error to be transient, unless it is hitting a separate per-process limit, in which case it should fail at the same place each time until that limit is raised.
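One quick way to tell the two cases apart is to check whether the process carries its own per-process memory limit. Below is a minimal sketch using Python's standard `resource` module (Unix only). It assumes any per-process limit is applied as an rlimit; a batch scheduler that enforces memory through cgroups would not show up here. `describe_limit` and `address_space_limits` are hypothetical helper names, not part of arrow or the simulation code.

```python
import resource


def describe_limit(value):
    """Render an rlimit value in GiB, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def address_space_limits():
    """Soft and hard per-process virtual memory (RLIMIT_AS) limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return {"soft": describe_limit(soft), "hard": describe_limit(hard)}


if __name__ == "__main__":
    # Run this inside the same job environment as the failing build.
    print(address_space_limits())
```

If both values print as `unlimited`, a per-process rlimit is unlikely to be the cause, which points back toward the shared 48 GB pool (or a scheduler-level limit).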
It's terrific that the native code handled the allocation error gracefully and got the right error information out! That makes it easy to look into getting more memory for the process or optimizing the code to reduce memory usage.
Two details that would help further:
1. Raise a MemoryError rather than a TypeError.
2. At the base of the stack, catch any MemoryError and print additional memory stats, such as the total amount of process memory in use.
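A minimal sketch of both suggestions, using only the standard library `resource` module (Unix only). `check_allocation` and `run_with_memory_report` are hypothetical names for illustration, not arrow's API. Note that `ru_maxrss` reports *peak* resident memory rather than current usage, and its units differ by platform (kilobytes on Linux, bytes on macOS).

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only).

    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


def check_allocation(succeeded, nbytes):
    """Hypothetical guard: surface a failed native allocation as MemoryError."""
    if not succeeded:
        raise MemoryError(f"failed to allocate {nbytes} bytes")


def run_with_memory_report(fn):
    """Base-of-stack wrapper: on MemoryError, print memory stats, then re-raise."""
    try:
        return fn()
    except MemoryError as err:
        print(
            f"MemoryError: {err} (peak RSS so far: {peak_rss_mib():.1f} MiB)",
            file=sys.stderr,
        )
        raise
```

Re-raising after printing keeps the original traceback and exit status, so the build still fails visibly but the log now includes how much memory the process had used.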
Status on this: it is a similar failure to https://github.com/CovertLab/arrow/issues/39. The repeated multiplication of large numbers overflows the range of a 64-bit floating-point number (a double). This happens when a reaction's stoichiometry involves many molecules at once and those reactants also have large counts. A few possible improvements have come up in conversations with @tahorst:
This is happening on generation 2 of seed 6691 if anyone wants to replicate. As code changes, this condition may no longer trigger. We haven't seen this with any other seed/generation combinations so far, but they are far from exhaustively tested.
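The overflow described in the status update can be illustrated with a short sketch. It assumes the per-reaction term is a mass-action falling-factorial product, c(c-1)...(c-n+1) for each reactant with count c and stoichiometric coefficient n; arrow's exact formula may differ (for example by also dividing by n!). The log-space version is one generic mitigation shown for comparison, not necessarily one of the improvements discussed above. Both function names are hypothetical.

```python
import math


def falling_factorial_product(counts, stoich):
    """Naive product over reactants of c * (c - 1) * ... * (c - n + 1).

    Accumulates in a Python float (an IEEE 754 double), so it becomes inf
    once the true value exceeds roughly 1.8e308.
    """
    result = 1.0
    for c, n in zip(counts, stoich):
        for k in range(n):
            result *= float(c - k)
    return result


def log_falling_factorial_product(counts, stoich):
    """Same quantity in log space: log(c! / (c - n)!) = lgamma(c + 1) - lgamma(c - n + 1)."""
    total = 0.0
    for c, n in zip(counts, stoich):
        if n > c:
            return -math.inf  # too few molecules: the product is zero
        total += math.lgamma(c + 1) - math.lgamma(c - n + 1)
    return total
```

With a count of 1,000,000 and a coefficient of 60, the naive product is about 10^360 and becomes `inf`, while the log-space value stays near 829 and remains usable for comparisons or normalization.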
@prismofeverything mind investigating why arrow failed?
Git hash: 9e9f3bb066
Command:
Trace: