Closed kevin-support-bot[bot] closed 2 weeks ago
13:19:04 - openhands:ERROR: state.py:118 - Failed to restore state from session: sessions/30d3e43a-d0dc-4d61-9faf-f9b838ade8b8/agent_state.pkl
13:19:04 - openhands:INFO: agent_session.py:259 - State could not be restored: sessions/30d3e43a-d0dc-4d61-9faf-f9b838ade8b8/agent_state.pkl
13:19:04 - openhands:INFO: agent_session.py:260 - Agent controller initialized.
13:19:04 - openhands:INFO: agent_controller.py:152 - [Agent Controller 30d3e43a-d0dc-4d61-9faf-f9b838ade8b8] Starting step loop...
13:19:04 - OBSERVATION
AgentStateChangedObservation(content='', agent_state=<AgentState.INIT: 'init'>, observation='agent_state_changed')
13:19:04 - OBSERVATION
NullObservation(content='', observation='null')
13:19:04 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
|INFO: 172.17.0.1:53646 - "POST /list_files HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
13:19:04 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
|INFO: 172.17.0.1:53646 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
INFO: 192.168.1.100:6768 - "GET /api/list-files HTTP/1.1" 200 OK
13:19:53 - USER_ACTION
**MessageAction** (source=EventSource.USER)
CONTENT: We just installed the environment, but did not do conda init, you help me initialize, and then source ~/.bashrc update. Then enter the pytorch_env environment in conda
13:19:53 - OBSERVATION
AgentStateChangedObservation(content='', agent_state=<AgentState.RUNNING: 'running'>, observation='agent_state_changed')
13:19:53 - OBSERVATION
NullObservation(content='', observation='null')
13:19:53 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
|INFO: 172.17.0.1:54292 - "POST /list_files HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
13:19:53 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
|INFO: 172.17.0.1:54292 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
INFO: 192.168.1.100:6815 - "GET /api/list-files HTTP/1.1" 200 OK
==============
CodeActAgent LEVEL 0 LOCAL STEP 0 GLOBAL STEP 0
13:19:57 - openhands:WARNING: llm.py:374 - Cost calculation not supported for this model.
13:19:57 - openhands:INFO: llm.py:308 - Input tokens: 4127 | Output tokens: 43
13:19:57 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: Let's initialize Conda and update the shell configuration, then activate the `pytorch_env` environment.
First, I'll initialize Conda:
COMMAND:
conda init bash
13:19:58 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
|INFO: 172.17.0.1:54292 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
13:19:58 - OBSERVATION
**CmdOutputObservation (source=EventSource.AGENT, exit code=0)**
no change /opt/conda/condabin/conda
no change /opt/conda/bin/conda
no change /opt/conda/bin/conda-env
no change /opt/conda/bin/activate
no change /opt/conda/bin/deactivate
no change /opt/conda/etc/profile.d/conda.sh
no change /opt/conda/etc/fish/conf.d/conda.fish
no change /opt/conda/shell/condabin/Conda.psm1
modified /opt/conda/shell/condabin/conda-hook.ps1
no change /opt/conda/lib/python3.12/site-packages/xontrib/conda.xsh
no change /opt/conda/etc/profile.d/conda.csh
modified /home/openhands/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.11/bin/python]
openhands@fd03868f5a7a:/workspace $
==============
CodeActAgent LEVEL 0 LOCAL STEP 1 GLOBAL STEP 1
13:20:03 - openhands:INFO: llm.py:308 - Input tokens: 4431 | Output tokens: 49
13:20:03 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: The Conda initialization has been completed, and the `.bashrc` file has been modified. To apply these changes, I'll source the `.bashrc` file:
COMMAND:
source ~/.bashrc
13:22:03 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:22:03 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
|[W 2024-10-06 13:21:46.982 KernelGatewayApp] WebSocket ping timeout after 119999 ms.
|[I 2024-10-06 13:21:51.987 KernelGatewayApp] Starting buffering for b4a138d1-3974-45cf-b847-115ff3a75cea:d13c3646-5028a3b2ec321752462fc14c
--------------------------------------------------------------------------------
13:22:03 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')
==============
CodeActAgent LEVEL 0 LOCAL STEP 2 GLOBAL STEP 2
13:22:06 - openhands:INFO: llm.py:308 - Input tokens: 4510 | Output tokens: 36
13:22:06 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: It seems that sourcing the `.bashrc` file took too long. Let's try running it again:
COMMAND:
source ~/.bashrc
13:23:32 - openhands:INFO: manager.py:68 - Session 319fb6fc-afcb-4f1c-8748-dec483318c4e and related resource have been removed due to inactivity.
13:23:32 - openhands:INFO: manager.py:68 - Session 0020201a-c7d6-43cb-811d-1de4db19802d and related resource have been removed due to inactivity.
13:24:06 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:24:06 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')
==============
CodeActAgent LEVEL 0 LOCAL STEP 3 GLOBAL STEP 3
13:24:10 - openhands:INFO: llm.py:308 - Input tokens: 4576 | Output tokens: 47
13:24:10 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: It appears that sourcing the `.bashrc` file is timing out. Let's try a different approach by directly activating the `pytorch_env` environment:
COMMAND:
conda activate pytorch_env
13:26:10 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:26:10 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')
==============
CodeActAgent LEVEL 0 LOCAL STEP 4 GLOBAL STEP 4
13:26:15 - openhands:INFO: llm.py:308 - Input tokens: 4653 | Output tokens: 63
13:26:15 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: It seems that directly activating the `pytorch_env` environment is also timing out. Let's try a different approach by sourcing the `.bashrc` file and then immediately activating the environment in one command:
COMMAND:
source ~/.bashrc && conda activate pytorch_env
13:28:15 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:28:15 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')
CONTENT: We just installed the environment, but did not do conda init, you help me initialize, and then source ~/.bashrc update. Then enter the pytorch_env environment in conda
Conda is not needed as the code is running in an isolated environment. What's your use case?
In fact, the agent always struggles to build a conda env, so I tried to build a PyTorch env into the Docker image, so that the agent can start running the code right away instead of installing packages.
Do you have any suggestions? Here is my Dockerfile:
# Stage 1: Prepare GPU environment based on NVIDIA CUDA image
FROM nvidia/cuda:12.6.1-devel-ubi8 AS cuda
# Stage 2: Use specified image
FROM ghcr.io/all-hands-ai/runtime:0.9-nikolaik
# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive
# Install necessary dependencies, including sudo, g++, build-essential, and other common tools
RUN apt-get update && \
    apt-get install -y sudo g++ build-essential wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 git && \
    apt-get clean
# Copy CUDA toolchain and libraries from CUDA image
COPY --from=cuda /usr/local/cuda /usr/local/cuda
# Set CUDA environment variables
ENV PATH="/usr/local/cuda/bin:${PATH}"
# Explicitly initialize LD_LIBRARY_PATH to avoid undefined variable warnings
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64"
# Download and install specified version of Anaconda
RUN wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh -O anaconda.sh && \
    bash anaconda.sh -b -p /opt/conda && \
    rm anaconda.sh
# Set conda environment variables
ENV PATH="/opt/conda/bin:${PATH}"
# Initialize conda
RUN conda init bash
# Create a new conda environment and install PyTorch
RUN conda create -n pytorch_env python=3.9 -y && \
    . /opt/conda/etc/profile.d/conda.sh && \
    conda activate pytorch_env && \
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc
# Clean cache to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && conda clean -a -y
# Set working directory
WORKDIR /workspace
# Set default command
CMD ["/bin/bash"]
# Initialize conda
RUN conda init
# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc
13:28:15 - OBSERVATION ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')
You could increase the timeout in `config.toml`, under the `[sandbox]` group.
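A minimal sketch of that change, assuming the key inside `[sandbox]` is named `timeout` (mirroring the `SANDBOX_TIMEOUT` variable) and is given in seconds. The helper name `add_sandbox_timeout` is hypothetical, and it deliberately refuses to touch a file that already has a `[sandbox]` table, since a duplicated table would make the TOML invalid.

```shell
# Hypothetical helper: append a [sandbox] timeout to a config file.
# Assumption: the key is called "timeout" and is measured in seconds.
add_sandbox_timeout() {
  config_file="$1"
  timeout_seconds="$2"
  # Do not append a second [sandbox] table; edit the existing one by hand.
  if [ -f "$config_file" ] && grep -q '^\[sandbox\]' "$config_file"; then
    echo "[sandbox] already present in $config_file; edit it by hand" >&2
    return 1
  fi
  printf '\n[sandbox]\ntimeout = %s\n' "$timeout_seconds" >> "$config_file"
}
```

For example, `add_sandbox_timeout config.toml 300` would raise the limit from the 120 seconds seen in the log to 300.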
@SmartManoj thanks, I will try it.
Also, could you manually run `source ~/.bashrc` in the Docker container to check how long it takes?
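One way to run that check, as a rough sketch: the helper below (the name `time_rc` is made up for illustration) sources a file in a subshell and reports the exit status and elapsed seconds. It is meant to be pasted into a shell inside the container, for example one opened with `docker exec -it <container> bash`.

```shell
# Hypothetical helper: measure how long sourcing an rc file takes.
time_rc() {
  rc_file="${1:-$HOME/.bashrc}"
  start=$(date +%s)
  # Source in a subshell so the current shell's environment is left untouched.
  ( . "$rc_file" ) >/dev/null 2>&1
  status=$?
  end=$(date +%s)
  echo "status=$status elapsed=$((end - start))s"
}
```

Running `time_rc ~/.bashrc` by hand and comparing the result with the 120-second limit in the log should show whether the file itself is slow or whether the delay only appears when the agent runs the command.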
It seems that when OpenHands is started via Docker, the timeout limit cannot be changed, right? I'm not familiar with it.
Also, I tried going into the Docker image manually. What surprised me is that conda is actually already initialized there. I cannot figure out why. It seems that this only happens when the OpenHands agent is operating the conda env.
It seems that when the agent runs `sudo -s` inside the Docker container, it also runs into this issue. (I changed the Docker image and installed the `sudo` command.)
I suggest we let the agent run as `root` instead of the user `openhands`. I think this issue is related to the `root` user in Docker.
It seems that when OpenHands is started via Docker, the timeout limit cannot be changed
You can pass `-e SANDBOX_TIMEOUT=value` to the docker command.
I suggest we let the agent run as `root` instead of the user `openhands`
`-e RUN_AS_OPENHANDS=False` will do that.
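Put together, the two overrides might look like the sketch below. It only prints a command rather than running it. The image tag, port, and socket mount are assumptions, not taken from this thread, and a real launch would also need the other flags from your usual `docker run` line. The function name `print_openhands_cmd` is hypothetical.

```shell
# Print (not run) a docker command carrying the two overrides from this thread.
# The image tag, port, and socket mount are assumptions; adapt them to your setup.
print_openhands_cmd() {
  timeout_seconds="${1:-300}"                        # illustrative value, in seconds
  image="${2:-ghcr.io/all-hands-ai/openhands:0.9}"   # assumed image tag
  printf '%s ' docker run -it --rm \
    -e "SANDBOX_TIMEOUT=$timeout_seconds" \
    -e RUN_AS_OPENHANDS=False \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    "$image"
  printf '\n'
}
```

Calling `print_openhands_cmd 600` prints the command with a 600-second timeout so it can be reviewed before being run.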
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.
https://github.com/All-Hands-AI/OpenHands/issues/2178 Issue
@x66ccff Could you provide the logs?