MegaMek / megamek

MegaMek is a networked Java clone of BattleTech, a turn-based sci-fi boardgame for 2+ players. Fight using giant robots, tanks, and/or infantry on a hex-based map.
http://www.megamek.org
GNU General Public License v2.0
296 stars 286 forks source link

49.20 MM Standalone Build - NPE Freeze. Kicking and Replacing Bots does not restart action. #5417

Closed Thom293 closed 5 months ago

Thom293 commented 5 months ago

Using this specific build https://github.com/MegaMek/megamek/actions/runs/8825771894

was doing bot v bot tests to test the new ammo code. Twice got an NPE that effectively ended the game. Froze action. Tried to kick and replace bots. No joy. Had to close. Was able to load autosave and proceed for a few more turns, but it froze again. The map is 2x3 Rolling Hills 2. No water on the map at all.

image

A save and log attached. Was the same error message both times in log.

megamek - Copy.log Round-7-autosave_2024-04-24_22-47-29.sav.gz Customs41424.zip

Thom293 commented 5 months ago

Unsure if related, but these are Brawler mechs set on Berserker v2. They should be heading straight at the enemy. No double blind or anything on.

Instead they are hanging at back of map a milling around, like they cant see the enemy or something, but there is no weather or other conditions to prevent vision. This save is the lobby. Should work with customs above. Can set them all to Berserk v2 and hit done and it should repeat.

image AMMOTEST1AUTO.sav.gz

HoneySkull commented 5 months ago

image

Hit this in a story arch game against the bot using the code from the .20-snapshot on 2024.04.25.

Note this was triggered from MekQH launched megamek session.

Thom293 commented 5 months ago

Ive run about 20 games on this build now and it has happened in all but one that ended on round 9 due to some lucky crits. Had it happen on the rest of them though.

You can workaround by loading an autosave and it will continue, but will eventually hit the NPE again.

HoneySkull commented 5 months ago

NOTE: The NPE I logged above is with a build that include the merge from #5418.

HoneySkull commented 5 months ago

image

Put in some debug code to look at the MovePath when this happens. Looks like it's going off the board??
Hopefully I can get some more samples.

This can be fixed by doing some null checks in this code -but I'm more interested in how this null coordinate condition can come to be. It could be that the list of moves might be being modified by a concurrent thread.

Sleet01 commented 5 months ago

@HoneySkull If you have time, can you check out aace7aa0ae2fa27ac2b7b34ff3cd1730dd2a7358 and see if the same issue occurs?

HoneySkull commented 5 months ago

@Sleet01 - yes my current dev state includes this pull request:

image

Got this NPE even with this change merged in:

HoneySkull commented 5 months ago

image

It's an interesting note that COORD_SET_LOCK is null here. This is on Object that gets initialized when the MovePath object is instantiated - it's also an object used to synchronize the update of coordsSet which is a SET of Coord object that contain all the hexes of the path.

The reason this is interesting, is because I put a try/catch around the original spot that was thrown and in the catch section, I put some code that called MovePath::getCoordSet() and that threw a null pointer exception when trying to sync on that object.

image

HoneySkull commented 5 months ago

The other interesting thing to note here is that the path that threw this exception has the same set of move sequences. Going forward turning around and going forward at the very edge of the map.

Thom293 commented 5 months ago

what PR is this code from?

HoneySkull commented 5 months ago

image

The root cause of the NPE seems to be Princess generating a move path that has steps positions that take it off the board. This Thor is trying to go Forward, Turn Left, Turn Left, Turn Left, Forward, Turn Left, Turn Left.

HoneySkull commented 5 months ago

what PR is this code from?

2acf25c21d4715cc28fc9921a6644584dd2f19be

Sleet01 commented 5 months ago

It's possible my fix for the packet-handling case fall through broke this, but I highly doubt it - my guess would be whichever merge caused that issue also contributed to this one.

HoneySkull commented 5 months ago

From my testing - the breaking commit is here: 02ace957

image

The method I found best to test this was to create a game bot vs. bot with each bot being default settings with the parameters listed below:

Bot 1 -> deploy North Bot 2 -> deploy South

Load bot 1 with this MUL file: Young Wolves.mul Load bot 2 with this MUL file: Twelfth Falcon Regulars.mul

Load the game parameters from the game options menu: npe_game_options.xml

For the map: select 35x35 Lilith

NPE_MUL_OPTIONS.zip

On a good commit, the bots will duke it out and the game will finish in around 22 rounds. On a bad commit, a NPE will occur.