evaneill / PandemiCpp

C++ implementation of pandemic board game for testing MCTS agents

Exceedingly rare floating point exceptions #38

Closed evaneill closed 4 years ago

evaneill commented 4 years ago

About once every 200-300 games played by agents (both times it has happened with a 3-determinization agent), the experiment fails with a floating point exception. This must occur in exceedingly rare circumstances (that's one floating point exception in more than 40 million game simulations), which obviously makes it hard to track down.

For now, experiments on my computer will be done in VS Code debug mode (inefficient but w/e).

evaneill commented 4 years ago

Also had a Segmentation fault appear for the first time during a SingleSampleNaive50kUCTAgentExperiment.

evaneill commented 4 years ago

I got what I believe was a segmentation fault during a debug run. The error seemed to come from a call to the player's get_position() during the execution of Move (which had a "to" argument that was nonsense). A copy of the player position was being returned (as designed), but the current player position was also nonsense, including that it had 0 neighbors. Something about trying to create the copy to return messed up the system.

I think I'm going to finally get around to changing the player position to just a number, kind of like with hand just being numbers, and using the Map logic to get names, populations, etc. This will hopefully skirt around having to actually investigate this, and as a bonus probably make for a small optimization since I won't have to be copying many cities.

If anything the problem is that somewhere a city is being deleted! Bad bad bad bad bad bad
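
A minimal sketch of the "position is just a number" idea, under the assumption that cities live in one table and players hold an index into it. Every name here (City, CityMap, Player, move) is hypothetical and not taken from the PandemiCpp source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the PandemiCpp API.
struct City {
    std::string name;
    int population;
    std::vector<int> neighbors;  // indices into the same city table
};

// Table of cities, owned in one place and never copied per player.
class CityMap {
public:
    explicit CityMap(std::vector<City> cities) : cities_(std::move(cities)) {}

    // Bounds-checked lookup: a nonsense index fails loudly here (in builds
    // with assertions enabled) instead of yielding a corrupt City copy later.
    const City& at(int index) const {
        assert(index >= 0 && static_cast<std::size_t>(index) < cities_.size());
        return cities_[static_cast<std::size_t>(index)];
    }

    // True if `to` is listed as a neighbor of `from`.
    bool adjacent(int from, int to) const {
        for (int n : at(from).neighbors) {
            if (n == to) return true;
        }
        return false;
    }

private:
    std::vector<City> cities_;
};

// The player stores only an int, so copying a player never copies a City.
struct Player {
    int position = 0;

    // Moves only if `to` is a neighbor of the current city; returns success.
    bool move(const CityMap& map, int to) {
        if (!map.adjacent(position, to)) return false;
        position = to;
        return true;
    }
};
```

With this layout a bad "to" argument is rejected by move() rather than stored, and names or populations are looked up through the table only when needed.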

sean9keenan commented 4 years ago

Not looking at the code, could it be that the object doesn’t initialize the values on construction? Probably not if it’s with debug flags, but if -O anything is there, could be related. Just food for thought, also going to bed 😓
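
To illustrate the kind of thing that suggestion points at, here is a small sketch using default member initializers so that a default-constructed object never holds indeterminate values. CityState and its fields are hypothetical and not the project's real class.

```cpp
#include <vector>

// Hypothetical type, not the project's real City class.
struct CityState {
    int index = -1;                     // -1 marks "not yet placed on the map"
    int disease_cubes = 0;
    bool has_research_station = false;
    std::vector<int> neighbors;         // class types default-construct to empty
};

// Without the initializers above, `CityState c;` would leave the scalar
// members indeterminate, and reading them would be undefined behavior.
inline bool is_placed(const CityState& c) { return c.index >= 0; }
```

Reads of uninitialized scalars can appear to work in one build and fail in another, which is why optimization flags are worth a look.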

evaneill commented 4 years ago

Unfortunately this change either didn't fix the problem or fixed a different problem: I got a segmentation fault again. The good news is that it's apparently a modest optimization.

evaneill commented 4 years ago

I realize now it could actually be due to random memory spikes I've observed, though I don't see evidence that the machines I'm using are really memory constrained. I've seen very short spikes to about 1 GB before usage returns to 30-40 MB.

evaneill commented 4 years ago

Deep insight: maybe this has something to do with the Map::City* type used for the board's research_stations attribute.

evaneill commented 4 years ago

After changing research stations I haven't run into the floating point exception, though I also haven't been testing rollout-based agents. I suspect that random_action_bygroup() had something to do with the exception, if it hasn't been fixed.
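
For context on why a random-selection routine is a plausible suspect: on x86 Linux an integer division or modulo by zero raises SIGFPE, which the shell reports as "Floating point exception" even though no floating point math is involved. A guarded selection helper might look like the sketch below. random_choice is a hypothetical name, and whether random_action_bygroup() actually has this shape is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper, not the project's random_action_bygroup().
// Picks a uniformly random element, or returns std::nullopt when there is
// nothing to pick. The unguarded form `v[rand() % v.size()]` computes an
// integer modulo by zero for an empty vector, which is undefined behavior
// and typically raises SIGFPE on x86 Linux.
template <typename T, typename Rng>
std::optional<T> random_choice(const std::vector<T>& options, Rng& rng) {
    if (options.empty()) return std::nullopt;
    std::uniform_int_distribution<std::size_t> pick(0, options.size() - 1);
    return options[pick(rng)];
}
```

Returning std::optional (C++17) forces the caller to handle the "no legal option" case explicitly.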

evaneill commented 4 years ago

Confirmed still happening with rollout-based agents. Got a segmentation fault AND floating point error one after the other!

evaneill commented 4 years ago

So I might have accidentally fixed this, but time will tell.

I ran into a bug after changing how an empty player deck affects play: after messing up the implementation I got a segmentation fault and a floating point exception. I replaced the behavior with a new stochastic action for player deck draws that just moves the game into a losing state, without trying to draw anything from the deck or call a nullptr action. In my non-rollout-agent testing this has removed the bad behavior.

🤞
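
A rough sketch of that empty-deck behavior, assuming the deck is a vector of card ids and the game carries a status flag. Game, GameStatus and draw_player_card are hypothetical names, not the project's classes.

```cpp
#include <optional>
#include <vector>

// Hypothetical names; a sketch of the idea, not the project's classes.
enum class GameStatus { InProgress, Won, Lost };

struct Game {
    std::vector<int> player_deck;  // card ids, top of the deck at the back
    GameStatus status = GameStatus::InProgress;
};

// Draws the top card if there is one. On an empty deck the game moves
// straight to the losing state and no card (and no action) is produced, so
// callers never dereference a null action or index into an empty deck.
inline std::optional<int> draw_player_card(Game& game) {
    if (game.status != GameStatus::InProgress) return std::nullopt;
    if (game.player_deck.empty()) {
        game.status = GameStatus::Lost;
        return std::nullopt;
    }
    int card = game.player_deck.back();
    game.player_deck.pop_back();
    return card;
}
```

The point of the design is that the empty deck is handled as an ordinary state transition instead of an error path.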

evaneill commented 4 years ago

Forgot to mention this would line up with how rare the error was. It was rare because rollout agents would practically NEVER get to a part of the game where their tree would even see the player deck being empty. I suspect this is when the errors were occurring.

evaneill commented 4 years ago

Very brave of you to assume this would fix it. I can't believe how ignorant past me was. It didn't. Got a floating point exception on another rollout agent. FWIW none on the heuristic-evaluated agents.

evaneill commented 4 years ago

So again I've collected the hubris to propose that I might have fixed it. In deck draws there's a fantastically small chance (1 in 25 billion per simulation with a remaining epidemic card, given my system's value of RAND_MAX) that the epidemic card won't be drawn in the part of the deck where it's supposed to be.

The cause is that a strict inequality (<) was being used rather than an inequality that includes equality (<=). In a vanishingly small number of cases the epidemic (a) isn't drawn on any card but the last one, and then (b) the last draw produces exactly RAND_MAX, so it doesn't get drawn there either. Inevitably after this point the deck will run into floating point error behavior.
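
The off-by-one can be sketched as follows. This is a hypothetical reconstruction of the comparison (the function names and the exact threshold formula are assumptions, and the real draw code may differ), taking the raw random value as a parameter so the edge case can be checked directly.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical reconstruction; the project's real draw code may differ.
// `r` is a raw value in [0, RAND_MAX], as returned by std::rand(), and
// `remaining` is the number of cards left in the epidemic's deck section.

// Buggy form: with one card left the threshold is RAND_MAX, so r == RAND_MAX
// fails the strict test and the epidemic is skipped.
inline bool epidemic_drawn_strict(long r, long remaining) {
    assert(remaining > 0);  // also guards the division below
    return r < RAND_MAX / remaining;
}

// Fixed form: with one card left every possible r passes.
inline bool epidemic_drawn_inclusive(long r, long remaining) {
    assert(remaining > 0);
    return r <= RAND_MAX / remaining;
}
```

On a system where RAND_MAX is 2147483647 (the glibc value), the last-card miss alone happens about once in 2.1 billion draws. Multiplying by the chance that the epidemic survives to the last card of its section, for example 1 in 12 for an assumed 12-card section, gives a figure of roughly 1 in 25.8 billion, in the same range as the estimate quoted above.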

evaneill commented 4 years ago

I'm going to tentatively close this. After applying the random number fix, I was able to run 100% of my rollout agent experiments without a single floating point exception, which would have been unheard of before.

🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞