Fix `StackOverflowError` when a wire is connected to itself

aphedges commented 1 year ago

During each tick, electrical components are checked for whether they connect to ground with the help of the method Terminal.connectsToGround(). The method contains a maxSteps parameter that is decremented on each recursive call to prevent infinite recursion. However, the value of maxSteps was never checked, so the method would recurse infinitely until the stack overflowed when one terminal of a wire was connected to the other terminal. Adding checks to stop recursing when maxSteps reaches 0 prevents the error.

I used the following script for testing:

from scienceworld import ScienceWorldEnv

env = ScienceWorldEnv("")
env.load("task-2a-test-conductivity-of-unknown-substances", 380, "")
*_, info = env.step("connect orange wire terminal 2 to orange wire terminal 1")
print(repr(info["look"]))

Before this commit, I get the following output:

'java.lang.StackOverflowError'

After this commit, I get the following output:

'This room is called the workshop. In it, you see: \n\tthe agent\n\ta substance called air\n\ta green box (containing nothing)\n\ta red box (containing nothing)\n\ta table. On the table is: a battery, a blue wire, a green light bulb, which is off, a orange wire, a red wire, a switch, which is off, a violet light bulb, which is off, a yellow light bulb, which is off.\n\ta ultra low temperature freezer. The ultra low temperature freezer door is closed. \n\tunknown substance R\nYou also see:\n\tA door to the hallway (that is closed)\n'

To debug the issue, I created a new patch to replace the patch in https://github.com/allenai/ScienceWorld/pull/30#issuecomment-1307891629 that worked before #30 was merged. This specific patch uses 6ba473019851c92df9be46b46d632c2db793fd66 as a base.

diff --git a/scienceworld/scienceworld.py b/scienceworld/scienceworld.py
index 227c6b5..6911444 100644
--- a/scienceworld/scienceworld.py
+++ b/scienceworld/scienceworld.py
@@ -30,7 +30,13 @@ class ScienceWorldEnv:

         # Launch Java side with dynamic port and get back the port on which the
         # server was bound to.
-        port = launch_gateway(classpath=serverPath, die_on_exit=True, cwd=BASEPATH)
+        import sys, time
+        port = launch_gateway(classpath=serverPath, die_on_exit=True, cwd=BASEPATH,
+                  javaopts=['-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005,quiet=y'],
+                  redirect_stdout=sys.stdout, redirect_stderr=sys.stderr)
+        print("Attach debugger")
+        time.sleep(10)  # Give time for user to attach debugger
+

         # Connect python side to Java side with Java dynamic port and start python
         # callback server with a dynamic port

MarcCote commented 1 year ago

I think this fix will probably solve https://github.com/cognitiveailab/drrn-scienceworld/issues/4 as well.

@aphedges very interested in making the debugging process builtin. Would you be willing to add those options and timeout if a given ENV variable is set? For instance, something like

if bool(os.environ.get("SCIENCEWORLD_DEBUG")):
    port = launch_gateway(classpath=serverPath, die_on_exit=True, cwd=BASEPATH,
                          javaopts=['-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005,quiet=y'],
                          redirect_stdout=sys.stdout, redirect_stderr=sys.stderr)
    print("Attach debugger")
    time.sleep(10)  # Give time for user to attach debugger
else:
    port = launch_gateway(classpath=serverPath, die_on_exit=True, cwd=BASEPATH)

PeterAJansen commented 1 year ago

Thanks for fixing this @aphedges . I vaguely remember losing a few commits when I was developing the electrical engine, it's possible these checks were what ended up getting lost. As Marc said, I think you've probably also fixed https://github.com/cognitiveailab/drrn-scienceworld/issues/4 too. :)

Re: debugging, I agree, the process is very challenging and laborious. I typically made either small simulations to test something specific, or added println() calls in critical spots, then just ran the simulator manually in a terminal window and watched what would happen when the python client got to the critical point. (I think that last debug method might be slightly more challenging with the new auto-port-gateway, so redirecting the stdout as above should allow that debugging method to return again).

aphedges commented 1 year ago

@MarcCote:

@aphedges very interested in making the debugging process builtin. Would you be willing to add those options and timeout if a given ENV variable is set?

Sure! I actually wrote some very similar code in my agent, so I'll combine our two approaches and create a PR.

@PeterAJansen:

Re: debugging, I agree, the process is very challenging and laborious. I typically made either small simulations to test something specific, or added println() calls in critical spots, then just ran the simulator manually in a terminal window and watched what would happen when the python client got to the critical point.

I usually use those two techniques as well. I always create a small simulation program for a minimal reproduction and try to use println() statements, but if the bug is too difficult to solve with just those, then I attach a debugger.

allenai / ScienceWorld

Fix `StackOverflowError` when a wire is connected to itself #41