NAVADMC / ADSM

A simulation of disease spread in livestock populations. Includes detection and containment simulation.

CEngine Out Of Memory #960

Closed BryanHurst closed 4 years ago

BryanHurst commented 5 years ago

We are having some more issues with the CEngine throwing an out of memory error while hardly using any system memory.

A scenario that reproduces this issue will be shared in the Google Drive folder soon.

BryanHurst commented 5 years ago

In the Google Drive folder for this issue, there is a video from Missy showing low system memory usage followed by the out of memory crash.

The scenario causing this crash is also in that folder.

BryanHurst commented 4 years ago

@missyschoenbaum I ran the simulation in the 960 folder and it didn't crash for me.

Is there a parameter that, when increased, generally causes this issue to appear?

BryanHurst commented 4 years ago

Never mind, this scenario seems to fail 1 out of 3 times for me, so I can replicate the issue.

missyschoenbaum commented 4 years ago

@BryanHurst sorry I missed this. For future reference: go into Disease Spread (any or all) and raise both the contact rate and the infectiousness.

BryanHurst commented 4 years ago

Since this is an intermittent issue, it seems related to how many units get infected.

For reference, here is the error being thrown: adsm_simulation[18904]: GSlice: assertion failed: sinfo->n_allocated > 0

BryanHurst commented 4 years ago

@missyschoenbaum I think we may have this one sorted out now. We'll need to run a thorough test suite on it: several dependencies in the CEngine were upgraded, and that may have side effects I didn't see in the Sample Scenario or the Texas one.

missyschoenbaum commented 4 years ago

Will get on it.

missyschoenbaum commented 4 years ago

I am still getting the same error, with no visible change in memory usage. The video was recorded right after the failure, but Task Manager preserves the last 60 seconds. I will text the video to @ConradSelig so he can post it.

ConradSelig commented 4 years ago

Here is the link: https://drive.google.com/file/d/1asP1Pn1sopFyHSIugphYwzkMC5y68TOz/view?usp=sharing

BryanHurst commented 4 years ago

@missyschoenbaum can you confirm that your installation properly upgraded to 3.5.10.14 and that there isn't an integrity error?

BryanHurst commented 4 years ago

Okay, I was able to get the memory error to trigger after 50 iterations, so this does still seem to be a problem.

missyschoenbaum commented 4 years ago

@BryanHurst Mine failed around the 4th iteration. I also realize that in the real world someone might genuinely run out of memory at times; I just expected to see some increase in memory use before it stopped. Should we just let it go?

BryanHurst commented 4 years ago

The out-of-memory issue isn't the system running out of memory; it is an array in the C code that uses more space than was originally allotted to it.

We thought we had it sorted out, but it may need another pass. I am concerned that this one will become a real issue once used more widely.

missyschoenbaum commented 4 years ago

Thanks for explaining. I don't know that part of things. I am running 3.5.10.14, but I have an integrity error. Should I upgrade again?

BryanHurst commented 4 years ago

It'd be good to upgrade again, but I was able to confirm that this still seems to be an issue (maybe less frequently?).

missyschoenbaum commented 4 years ago

Would it help to have my stackdump?

BryanHurst commented 4 years ago

No, the stackdump isn't very useful at this stage.

BryanHurst commented 4 years ago

We'd like to test this on some normal or slightly worse scenarios and see whether we error out, or whether our unrealistic Texas scenario is the only one causing this.

missyschoenbaum commented 4 years ago

I ran this in a couple of variations. My next plan is to write up a known bug to document it and show the memory error screen.

missyschoenbaum commented 4 years ago

Bug posted.