e2b-dev / E2B

Secure open source cloud runtime for AI apps & AI agents
https://e2b.dev/docs
Apache License 2.0
7.02k stars 458 forks source link

[E2B-435] Python SDK WebSocket/async bugs? #240

Closed mlejva closed 11 months ago

mlejva commented 11 months ago

I've been running the following script which contains for loop that starts a new sandbox, do a few operations inside the sandbox, and closes the sandbox in every iteration.

import os
from e2b import Sandbox
import time
import logging
import sys

# Set to `logging.INFO` to see less logs
logging.basicConfig(level=logging.DEBUG)
# logging.basicConfig(level=logging.INFO)

# Create logs dir
logs_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "crash_logs_" + time.strftime("%Y%m%d-%H%M%S"))
if not os.path.exists(logs_dir):
  os.makedirs(logs_dir)

for i in range(0, 100):
  # For every new iteration, create a new log file named `log_{i}` inside the `logs_dir` and redirect `print()` to write into that file
  log_file = os.path.join(logs_dir, f"run_{i}")
  print(f"\n---------------------------------------------------------------------\n\n🔸 Running iteration {i}")
  f = open(log_file, "w")
  sys.stdout = f

  print(f"++++ ITERATION {i} ++++")
  sbx = Sandbox(
      on_stdout=print,
      on_stderr=print,
  )
  print("================================================================")
  print(f"SANDBOX ID = {sbx.id}")
  print("================================================================\n\n")

  sbx.process.start_and_wait("echo [debug test] run after potential crash")
  content = sbx.filesystem.read("/etc/hosts")
  print("[debug test] filesystem read", content)

  print("Waiting 5 seconds...")
  time.sleep(5)

  sbx.close()

Note that I've been originally testing it with a sandbox template that has start_cmd set and not the base template I'm using here. Not sure if it affects reproducibility. I cannot share the 100% same Python script as it's private (please ping me, I'll send it in DMs).

I've never managed to go through all 100 iterations because I keep stumbling on bugs that might be related to WebSocket and/or Python asyncio.

Here I share 3 (perhaps) different errors:

1. WebSocket received error while receiving messages: no close frame received or sent

Sometimes the script just hangs and prints

WebSocket received error while receiving messages: no close frame received or sent 

There's no stack trace or anything else until I interrupt the execution.

WebSocket received error while receiving messages: no close frame received or sent
^CTraceback (most recent call last):
  File "/Users/mlejva/Developer/multion-test/multion_test/remote_with_cmd.py", line 59, in <module>
    # For every new iteration, create a new log file named `log_{i}` inside the `logs_dir` and redirect `print()` to write into that file
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/process.py", line 415, in start_and_wait
    ).wait()
      ^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/process.py", line 172, in wait
    return self._finished.result()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/utils/future.py", line 22, in result
    return self._future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/threading.py'>
Traceback (most recent call last):
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/threading.py", line 1553, in _shutdown
    atexit_call()
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt: 

2. Error in sandbox_rpc.py

image

3. Error in process.py in the start_and_wait() method

DEBUG:websockets.client:< TEXT '{"jsonrpc":"2.0","method":"process_subscription...700932076337185831}}}\n' [214 bytes]
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"\u003e multion-remote@0.0.1 start","timestamp":1700932076337185831}}}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"\u003e multion-remote@0.0.1 start","timestamp":1700932076337185831}}}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"\u003e multion-remote@0.0.1 start","timestamp":1700932076337185831}}}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: []
DEBUG:e2b.sandbox.sandbox_connection:Notification method='process_subscription' params={'subscription': '0x29779dcdad6290f742b75a951016d4b', 'result': {'type': 'Stdout', 'line': '> multion-remote@0.0.1 start', 'timestamp': 1700932076337185831}}
> multion-remote@0.0.1 start
DEBUG:websockets.client:< TEXT '{"jsonrpc":"2.0","method":"process_subscription...700932076342490572}}}\n' [206 bytes]
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"\u003e node src/server.js","timestamp":1700932076342490572}}}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"\u003e node src/server.js","timestamp":1700932076342490572}}}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"\u003e node src/server.js","timestamp":1700932076342490572}}}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: []
DEBUG:e2b.sandbox.sandbox_connection:Notification method='process_subscription' params={'subscription': '0x29779dcdad6290f742b75a951016d4b', 'result': {'type': 'Stdout', 'line': '> node src/server.js', 'timestamp': 1700932076342490572}}
> node src/server.js
DEBUG:websockets.client:< TEXT '{"jsonrpc":"2.0","method":"process_subscription...700932076344906377}}}\n' [270 bytes]
DEBUG:e2b.sandbox.websocket_client:WebSocket received message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"Error loading browser polyfill: This script should only be loaded in a browser extension.","timestamp":1700932076344906377}}}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"Error loading browser polyfill: This script should only be loaded in a browser extension.","timestamp":1700932076344906377}}}
DEBUG:e2b.sandbox.sandbox_rpc:Processing message: {"jsonrpc":"2.0","method":"process_subscription","params":{"subscription":"0x29779dcdad6290f742b75a951016d4b","result":{"type":"Stdout","line":"Error loading browser polyfill: This script should only be loaded in a browser extension.","timestamp":1700932076344906377}}}
DEBUG:e2b.sandbox.sandbox_rpc:Current waiting handlers: []
DEBUG:e2b.sandbox.sandbox_connection:Notification method='process_subscription' params={'subscription': '0x29779dcdad6290f742b75a951016d4b', 'result': {'type': 'Stdout', 'line': 'Error loading browser polyfill: This script should only be loaded in a browser extension.', 'timestamp': 1700932076344906377}}
Error loading browser polyfill: This script should only be loaded in a browser extension.
DEBUG:websockets.client:! failing connection with code 1006
DEBUG:websockets.client:= connection is CLOSED
ERROR:e2b.sandbox.websocket_client:WebSocket received error while receiving messages: no close frame received or sent
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
200
{"result":{"url":"https://www.google.com/maps/reserve/v/dine/c/jMYMLG1gaPs?source=pa&opi=79508299&hl=en-US&gei=9CliZfWPAYLz0PEP66qDsAI&sourceurl=http://www.google.com/service/MapsSearchService/Search?authuser%3D0%26gl%3Dus%26hl%3Den%26q%3DRoma%2BMenlo%2BPark%26tbm%3Dmap","screenshot":"","message":"COMMANDS:\nCLICK 25\n\nEXPLANATION: I am clicking on the \"Reserve a table\" link for the restaurant \"Roma\" at Menlo Park on Google Maps.\n\nSTATUS: DONE","status":"DONE","spots":{"status":"Not in google reserve page"},"bookingUrl":""}}
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
INFO:e2b.sandbox.process:Starting process: echo run after potential crash
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket queueing message: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "2o1JSOe54yv6"], "id": 5}
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket waiting for reply: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "2o1JSOe54yv6"], "id": 5}
DEBUG:e2b.sandbox.websocket_client:WebSocket message to send: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "2o1JSOe54yv6"], "id": 5}
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
DEBUG:e2b.sandbox.sandbox_connection:Refreshed sandbox iwt6buod9j5q04hxuawch
INFO:e2b.sandbox.sandbox_connection:Stopped refreshing sandbox (id: iwt6buod9j5q04hxuawch)
INFO:e2b.sandbox.sandbox_connection:Closing sandbox dgsjmz6k2iw0l8w8hkwo (id: iwt6buod9j5q04hxuawch)
INFO:e2b.sandbox.websocket_client:WebSocket stopped
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket received sign to stop.
ERROR:e2b.sandbox.sandbox_rpc:WebSocket received error while waiting for: {"jsonrpc": "2.0", "method": "process_subscribe", "params": ["onExit", "2o1JSOe54yv6"], "id": 5} 
DEBUG:e2b.sandbox.sandbox_rpc:WebSocket removed waiting handler for 5
Traceback (most recent call last):
  File "/Users/mlejva/Developer/multion-test/multion_test/remote.py", line 72, in <module>
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending name='Task-17' coro=<<async_generator_athrow without __name__>()>>
    sbx.process.start_and_wait("echo run after potential crash")
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/process.py", line 406, in start_and_wait
    return self.start(
           ^^^^^^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/process.py", line 298, in start
    self._sandbox._subscribe(
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/sandbox_connection.py", line 337, in _subscribe
    sub_id = self._call(service, "subscribe", [method, *params], timeout=timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/sandbox_connection.py", line 287, in _call
    return self._rpc.send_message(f"{service}_{method}", params, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/sandbox_rpc.py", line 147, in send_message
    raise e
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/sandbox/sandbox_rpc.py", line 138, in send_message
    r = future_reply.result(timeout=timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/Developer/multion-test/.venv/lib/python3.11/site-packages/e2b/utils/future.py", line 22, in result
    return self._future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mlejva/.pyenv/versions/3.11.4/lib/python3.11/concurrent/futures/_base.py", line 454, in result
    raise CancelledError()
concurrent.futures._base.CancelledError

From SyncLinear.com | E2B-435

ValentaTomas commented 11 months ago

The bugs were a combination of a problem with our envd daemon running inside sandboxes and a handled WS error that was still being logged. Fixed, closing.