StephenGss / PAL

Polycraft World AI Lab
3 stars 6 forks source link

[POGO] (rare?) crash apparently during SENSE_ACTOR_ACTIONS #45

Open musliner opened 2 years ago

musliner commented 2 years ago

musliner@openmind0:~/openmind/code/test/results/pal-tournament-v2-nonov-2021-12-05-222307-openmind0$

Just after we PLACE_TREE_TAP successfully (included below), we sent SENSE_ACTOR_ACTIONS and, possibly as a result or before getting/handling that, PAL crashed with all this:

2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] [04:40:47] [BOT API THREAD/INFO]: [AGENT]PLACE_TREE_TAP^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] [04:40:47] [Client thread/INFO]: [edu.utd.minecraft.mod.polycraft.aitools.BotAPI:onClientTick:643]: PLACE_TREE_TAP^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] [04:40:47] [Client thread/INFO]: Running Command on Client side: edu.utd.minecraft.mod.polycraft.aitools.commands.APICommandPlaceBlock^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] [04:40:47] [BOT API THREAD/INFO]: [CLIENT]{"goal":{"goalType":"ITEM","goalAchieved":false,"Distribution":"Uninformed"},"command_result":{"command":"PLACE_TREE_TAP","argument":"","result":"SUCCESS","message":"Block polycraft:tree_tap placed","stepCost":300.0},"step":59,"gameOver":false}^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] [04:40:47] [BOT API THREAD/INFO]: [SCORE]step:60,totalCost:28182.297,totalReward:64000.0,adjustedReward:35817.703^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] [04:40:47] [Server thread/ERROR]: Exception caught during firing event net.minecraftforge.fml.common.gameevent.TickEvent$ServerTickEvent@71f7b82f:^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] java.lang.NullPointerException^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.aitools.observations.ObservationPlayerInventory.getObservation(ObservationPlayerInventory.java:56) ~[ObservationPlayerInventory.class:?]^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.ExperimentTutorial.getObservations(ExperimentTutorial.java:461) ~[ExperimentTutorial.class:?]^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.TutorialFeatureEntity.updateLastSenseAll(TutorialFeatureEntity.java:113) ~[TutorialFeatureEntity.class:?]^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.TutorialFeatureEntity.moveTowardsTarget(TutorialFeatureEntity.java:323)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.TutorialFeatureEntityPogoist.onServerTickUpdate(TutorialFeatureEntityPogoist.java:323)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.ExperimentTutorial.onServerTickUpdate(ExperimentTutorial.java:311)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.TutorialManager.onServerTickUpdate(TutorialManager.java:104)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at edu.utd.minecraft.mod.polycraft.proxy.CommonProxy.onServerTick(CommonProxy.java:449)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraftforge.fml.common.eventhandler.ASMEventHandler_23_CommonProxy_onServerTick_ServerTickEvent.invoke(.dynamic)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraftforge.fml.common.eventhandler.ASMEventHandler.invoke(ASMEventHandler.java:49)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraftforge.fml.common.eventhandler.EventBus.post(EventBus.java:140)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraftforge.fml.common.FMLCommonHandler.onPostServerTick(FMLCommonHandler.java:253)^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraft.server.MinecraftServer.tick(MinecraftServer.java:649)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraft.server.integrated.IntegratedServer.tick(IntegratedServer.java:152)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:481)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] at java.lang.Thread.run(Thread.java:748)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] ^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] ^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] A detailed walkthrough of the error, its code path and all known details is as follows:^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] ---------------------------------------------------------------------------------------^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] ^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] -- System Details --^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] Details:^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] Minecraft Version: 1.8.9^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] Operating System: Linux (amd64) version 4.15.0-163-generic^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] Java Version: 1.8.0_162, Oracle Corporation^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] Java VM Version: OpenJDK 64-Bit Server VM (mixed mode), Oracle Corporation^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] Memory: 1478655984 bytes (1410 MB) / 2271215616 bytes (2166 MB) up to 28631367680 bytes (27305 MB)^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] JVM Flags: 0 total; ^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] IntCache: cache: 0, tcache: 0, allocated: 0, tallocated: 0^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] FML: MCP 9.19 Powered by Forge 11.15.1.2318 4 mods loaded, 4 mods active^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] States: 'U' = Unloaded 'L' = Loaded 'C' = Constructed 'H' = Pre-initialized 'I' = Initialized 'J' = Post-initialized 'A' = Available 'D' = Disabled 'E' = Errored^M ^M 2021-12-06_04:40:48: PAL: [2021-12-05 22:40:47] UCHIJAAAA mcp{9.19} [Minecraft Coder Pack] (minecraft.jar) ^M ^M Traceback (most recent call last):^M File "/openmind/code/test/LaunchTournament.py", line 847, in ^M pal.execute()^M File "/openmind/code/test/LaunchTournament.py", line 423, in execute^M if self._check_ended(str(next_line)):^M File "/openmind/code/test/LaunchTournament.py", line 206, in _check_ended^M data_dict = json.loads(json_text)^M File "/usr/lib/python3.6/json/init.py", line 354, in loads^M return _default_decoder.decode(s)^M File "/usr/lib/python3.6/json/decoder.py", line 339, in decode^M obj, end = self.raw_decode(s, idx=_w(s, 0).end())^M File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode^M obj, end = self.scan_once(s, idx)^M json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)^M

musliner commented 2 years ago

Note that this was with me testing 1s delay between handling the PLACE success and requesting SENSE_ACTOR_ACTIONS. Normally we do 2s delay, so that might be why we don't see this normally.

musliner commented 2 years ago

Timestamps from executive log suggest the crash (at 22:40:47) happened before we sent the SENSE_ACTOR_ACTIONS:

[2021-12-05 22:40:47] RESULT: sent domain command PLACE_TREE_TAP^M [2021-12-05 22:40:47] openmind: getting the message: (command-response success "Block polycraft:tree_tap placed" 300.000000 "False" "False") from bridge.^M [2021-12-05 22:40:47] Telling planner ((:type :command-response)(:plan-name t0-g20-p1)(:step-index 22)(:command "PLACE_TREE_TAP")(:result success)(:message "Block polycraft:tree_tap placed")(:step-cost 300.000000)(:gameover "False")(:goalachieved "False"))^M [2021-12-05 22:40:47] RESULT: domain command result success Block polycraft:tree_tap placed 300.000000 False False^M [2021-12-05 22:40:47] incrementing stepcost for plan t0-g20-p1 index 22 with cur-cost 558.188480 by cost 300.000000^M [2021-12-05 22:40:48] RESULT: sent domain command SENSE_ACTOR_ACTIONS^M

as expected, the bridge log agrees, the command from exec was sent at :48

[2021-12-05 22:40:47] prsinterface[944463] send_message: Sending message to openmind: (command-response SUCCESS "Block polycraft:tree_tap placed" 300.0 "False" "False")^M [2021-12-05 22:40:47] prsinterface[944463] main: Waiting for msg from OPRS^M [2021-12-05 22:40:48] Received message from b'openmind': b'SENSE_ACTOR_ACTIONS'; sending to Polycraft^M [2021-12-05 22:40:48] Stripped parens, uppercased first two words, and sending to Polycraft: SENSE_ACTOR_ACTIONS^M [2021-12-05 22:40:48] prsinterface[945166] main: Trying to get a msg from pcw^M

the outer pal-tournament log (in original ticket body) with wrong-timezone timestamps had been 1s late for a while... (eg here :10 instead of inner :09) 2021-12-06_04:25:10: PAL: [2021-12-05 22:25:09] [04:25:09] [BOT API THREAD/INFO]: [edu.utd.minecraft.mod.polycraft.aitools.BotAPI$1:run:728]: SENSE_ALL NONAV^M

StephenGss commented 2 years ago

Do you have more information on this? Like what items the player was holding? It looks like an item in the player's inventory is causing this error, but I can't reproduce with any of the items.

The game is crashing when the actors generate the pre-world state player inventory. I don't think it ever got your sense_actor_actions command.

mpatsift commented 2 years ago

The last inventory list printed in the pal-tournament.log (before placing the tree tap):

"inventory": {"0":{"item":"minecraft:iron_pickaxe","count":1,"damage":3,"maxdamage":250}, "1":{"item":"polycraft:block_of_platinum","count":1,"damage":0,"maxdamage":0}, "2":{"item":"minecraft:diamond","count":9,"damage":0,"maxdamage":0}, "3":{"item":"minecraft:planks","count":1,"damage":0,"maxdamage":0,"variant":"oak"}, "4":{"item":"minecraft:stick","count":3,"damage":0,"maxdamage":0}, "5":{"item":"polycraft:tree_tap","count":1,"damage":0,"maxdamage":0,"enabled":"true","facing":"down"}, "selectedItem":{"slot":0,"item":"minecraft:iron_pickaxe","count":1,"damage":3,"maxdamage":250}}

Here's the full pal-tournament log pal-tournament-musliner-2021-12-05-222307-openmind0.log.gz :

musliner commented 2 years ago

just saw this PAL crash again in musliner@openmind0:~/openmind/code/test/results/pal-tournament-v2-shared-1-2-1-M-2022-01-07-053806-openmind0$ in case we need more logs

mpatsift commented 2 years ago

Our regression tests hit this (I think) again last night.

I'll post here, if you want it.

prt-suite-2022-02-01-215523-openmind0/pal-tournament-v2-nonov-3-games-2022-02-01-223243-openmind0/

musliner commented 2 years ago

I'm trying to run 100 no-nov games and this keeps happening...looks like it is not rare. Any ideas?

StephenGss commented 2 years ago

Are you seeing the same error from before? I did an update on the svn, but my code/test/results folder is empty. Are the logs somewhere else?

musliner commented 2 years ago

we have way too many results to put in svn, so logs only avail on request. Yes the error appears same as in this issue...

pal-tournament.log sez:

2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [BOT API THREAD/INFO]: [AGENT]PLACE_TREE_TAP^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [Client thread/INFO]: [edu.utd.minecraft.mod.polycraft.aitools.BotAPI:onClientTick:643]: PLACE_TREE_TAP^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [Client thread/INFO]: Running Command on Client side: edu.utd.minecraft.mod.polycraft.aitools.commands.APICommandPlaceBlock^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [BOT API THREAD/INFO]: [CLIENT]{"goal":{"goalType":"ITEM","goalAchieved":false,"Distribution":"Uninformed"},"command_result":{"command":"PLACE_TREE_TAP","argument":"","result":"SUCCESS","message":"Block polycraft:tree_tap placed","stepCost":300.0},"step":70,"gameOver":false}^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [BOT API THREAD/INFO]: [SCORE]step:71,totalCost:29452.48,totalReward:64000.0,adjustedReward:34547.52^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [Client thread/INFO]: [edu.utd.minecraft.mod.polycraft.experiment.tutorial.ExperimentTutorial:onClientTickUpdate:431]: [Client to Server] Sending Feature update^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] [20:19:27] [Server thread/ERROR]: Exception caught during firing event net.minecraftforge.fml.common.gameevent.TickEvent$ServerTickEvent@2f51952d:^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] java.lang.NullPointerException^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] at edu.utd.minecraft.mod.polycraft.aitools.observations.ObservationPlayerInventory.getObservation(ObservationPlayerInventory.java:77) ~[ObservationPlayerInventory.class:?]^M ^M 2022-03-09_20:19:28: PAL: [2022-03-09 14:19:27] at edu.utd.minecraft.mod.polycraft.experiment.tutorial.ExperimentTutorial.getObservations(ExperimentTutorial.java:461) ~[ExperimentTutorial.class:?]^M ... (crash)

musliner commented 2 years ago

if you want the full thing just let me know, i'll get it attached here, thx

StephenGss commented 2 years ago

I'm still unable to replicate it, but I put that code in a try-catch and pushed an update. It shouldn't affect anything if that piece of code fails in this case. Please update to the latest on release 2.0 and run your test again.

mpatsift commented 2 years ago

Looks like same issue yesterday (I'm up-to-date wrt release 2.0): pal-tournament.log

musliner commented 2 years ago

I think I mentioned in email, but not here, that the try/catch from March10 is not preventing crashing of the tournament....

mpatsift commented 2 years ago

This still happens once every few-to-several days in our regression tests. No pattern afaict.

Should we assume that this bug is too rare to be worth fixing and the eval code UTD uses will simply re-run tournaments that crash in this way?