SmartManoj / Kevin

⚡ Kevin: Code Quick, Create Fast
MIT License

`source ~/.bashrc` times out when running as root #65

Closed kevin-support-bot[bot] closed 2 weeks ago

kevin-support-bot[bot] commented 1 month ago

https://github.com/All-Hands-AI/OpenHands/issues/2178 Issue


@x66ccff Could you provide the logs?

x66ccff commented 1 month ago
13:19:04 - openhands:ERROR: state.py:118 - Failed to restore state from session: sessions/30d3e43a-d0dc-4d61-9faf-f9b838ade8b8/agent_state.pkl
13:19:04 - openhands:INFO: agent_session.py:259 - State could not be restored: sessions/30d3e43a-d0dc-4d61-9faf-f9b838ade8b8/agent_state.pkl
13:19:04 - openhands:INFO: agent_session.py:260 - Agent controller initialized.
13:19:04 - openhands:INFO: agent_controller.py:152 - [Agent Controller 30d3e43a-d0dc-4d61-9faf-f9b838ade8b8] Starting step loop...
13:19:04 - OBSERVATION
AgentStateChangedObservation(content='', agent_state=<AgentState.INIT: 'init'>, observation='agent_state_changed')
13:19:04 - OBSERVATION
NullObservation(content='', observation='null')
13:19:04 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |INFO:     172.17.0.1:53646 - "POST /list_files HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
13:19:04 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |INFO:     172.17.0.1:53646 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
INFO:     192.168.1.100:6768 - "GET /api/list-files HTTP/1.1" 200 OK
13:19:53 - USER_ACTION
**MessageAction** (source=EventSource.USER)
CONTENT: We just installed the environment, but did not do conda init, you help me initialize, and then source ~/.bashrc update. Then enter the pytorch_env environment in conda
13:19:53 - OBSERVATION
AgentStateChangedObservation(content='', agent_state=<AgentState.RUNNING: 'running'>, observation='agent_state_changed')
13:19:53 - OBSERVATION
NullObservation(content='', observation='null')
13:19:53 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |INFO:     172.17.0.1:54292 - "POST /list_files HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
13:19:53 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |INFO:     172.17.0.1:54292 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
INFO:     192.168.1.100:6815 - "GET /api/list-files HTTP/1.1" 200 OK

==============
CodeActAgent LEVEL 0 LOCAL STEP 0 GLOBAL STEP 0

13:19:57 - openhands:WARNING: llm.py:374 - Cost calculation not supported for this model.
13:19:57 - openhands:INFO: llm.py:308 - Input tokens: 4127 | Output tokens: 43

13:19:57 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: Let's initialize Conda and update the shell configuration, then activate the `pytorch_env` environment.

First, I'll initialize Conda:
COMMAND:
conda init bash
13:19:58 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |INFO:     172.17.0.1:54292 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
13:19:58 - OBSERVATION
**CmdOutputObservation (source=EventSource.AGENT, exit code=0)**
no change     /opt/conda/condabin/conda
no change     /opt/conda/bin/conda
no change     /opt/conda/bin/conda-env
no change     /opt/conda/bin/activate
no change     /opt/conda/bin/deactivate
no change     /opt/conda/etc/profile.d/conda.sh
no change     /opt/conda/etc/fish/conf.d/conda.fish
no change     /opt/conda/shell/condabin/Conda.psm1
modified      /opt/conda/shell/condabin/conda-hook.ps1
no change     /opt/conda/lib/python3.12/site-packages/xontrib/conda.xsh
no change     /opt/conda/etc/profile.d/conda.csh
modified      /home/openhands/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.11/bin/python]
openhands@fd03868f5a7a:/workspace $

==============
CodeActAgent LEVEL 0 LOCAL STEP 1 GLOBAL STEP 1

13:20:03 - openhands:INFO: llm.py:308 - Input tokens: 4431 | Output tokens: 49

13:20:03 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: The Conda initialization has been completed, and the `.bashrc` file has been modified. To apply these changes, I'll source the `.bashrc` file:
COMMAND:
source ~/.bashrc
13:22:03 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:22:03 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |[W 2024-10-06 13:21:46.982 KernelGatewayApp] WebSocket ping timeout after 119999 ms.
    |[I 2024-10-06 13:21:51.987 KernelGatewayApp] Starting buffering for b4a138d1-3974-45cf-b847-115ff3a75cea:d13c3646-5028a3b2ec321752462fc14c
--------------------------------------------------------------------------------
13:22:03 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')

==============
CodeActAgent LEVEL 0 LOCAL STEP 2 GLOBAL STEP 2

13:22:06 - openhands:INFO: llm.py:308 - Input tokens: 4510 | Output tokens: 36

13:22:06 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: It seems that sourcing the `.bashrc` file took too long. Let's try running it again:
COMMAND:
source ~/.bashrc
13:23:32 - openhands:INFO: manager.py:68 - Session 319fb6fc-afcb-4f1c-8748-dec483318c4e and related resource have been removed due to inactivity.
13:23:32 - openhands:INFO: manager.py:68 - Session 0020201a-c7d6-43cb-811d-1de4db19802d and related resource have been removed due to inactivity.
13:24:06 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:24:06 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')

==============
CodeActAgent LEVEL 0 LOCAL STEP 3 GLOBAL STEP 3

13:24:10 - openhands:INFO: llm.py:308 - Input tokens: 4576 | Output tokens: 47

13:24:10 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: It appears that sourcing the `.bashrc` file is timing out. Let's try a different approach by directly activating the `pytorch_env` environment:
COMMAND:
conda activate pytorch_env
13:26:10 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:26:10 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')

==============
CodeActAgent LEVEL 0 LOCAL STEP 4 GLOBAL STEP 4

13:26:15 - openhands:INFO: llm.py:308 - Input tokens: 4653 | Output tokens: 63

13:26:15 - ACTION
**CmdRunAction (source=EventSource.AGENT)**
THOUGHT: It seems that directly activating the `pytorch_env` environment is also timing out. Let's try a different approach by sourcing the `.bashrc` file and then immediately activating the environment in one command:
COMMAND:
source ~/.bashrc && conda activate pytorch_env
13:28:15 - openhands:ERROR: runtime.py:435 - No response received within the timeout period.
13:28:15 - OBSERVATION
ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')
SmartManoj commented 1 month ago

CONTENT: We just installed the environment, but did not do conda init, you help me initialize, and then source ~/.bashrc update. Then enter the pytorch_env environment in conda

Conda is not needed as the code is running in an isolated environment. What's your use case?

x66ccff commented 1 month ago

In fact, the agent always struggles to build a conda env, so I tried to build a PyTorch env into the Docker image so that the agent can start running the code right away instead of installing packages.

Do you have any suggestions? Here is my Dockerfile:

# Stage 1: Prepare GPU environment based on NVIDIA CUDA image
FROM nvidia/cuda:12.6.1-devel-ubi8 AS cuda

# Stage 2: Use specified image
FROM ghcr.io/all-hands-ai/runtime:0.9-nikolaik

# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies, including sudo, g++, build-essential, and other common tools
RUN apt-get update && \
    apt-get install -y sudo g++ build-essential wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 git && \
    apt-get clean

# Copy CUDA toolchain and libraries from CUDA image
COPY --from=cuda /usr/local/cuda /usr/local/cuda

# Set CUDA environment variables
ENV PATH="/usr/local/cuda/bin:${PATH}"

# Explicitly initialize LD_LIBRARY_PATH to avoid undefined variable warnings
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64"

# Download and install specified version of Anaconda
RUN wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh -O anaconda.sh && \
    bash anaconda.sh -b -p /opt/conda && \
    rm anaconda.sh

# Set conda environment variables
ENV PATH="/opt/conda/bin:${PATH}"

# Initialize conda
RUN conda init bash

# Create a new conda environment and install PyTorch
RUN conda create -n pytorch_env python=3.9 -y && \
    . /opt/conda/etc/profile.d/conda.sh && \
    conda activate pytorch_env && \
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc

# Clean cache to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && conda clean -a -y

# Set working directory
WORKDIR /workspace

# Set default command
CMD ["/bin/bash"]

# Initialize conda
RUN conda init

# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc
SmartManoj commented 1 month ago

13:28:15 - OBSERVATION ErrorObservation(content='Action execution timed out after 120 seconds.', observation='error')

You could increase the timeout in config.toml, under the [sandbox] group.
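As a rough sketch of that suggestion, the snippet below prints a fragment to merge into config.toml by hand. The key name `timeout` is an assumption, inferred from the `SANDBOX_TIMEOUT` variable and the 120-second limit seen in the logs. The value 300 is only an example, and `sandbox_timeout_fragment` is a hypothetical helper, not part of OpenHands.

```shell
# Hypothetical helper: print a [sandbox] fragment for config.toml.
# Assumption: the option is called `timeout` and is given in seconds.
sandbox_timeout_fragment() {
    printf '[sandbox]\ntimeout = %s\n' "$1"
}

# Print the fragment with an example value of 300 seconds.
sandbox_timeout_fragment 300
```

If config.toml already has a [sandbox] group, add only the `timeout` line to it rather than a second group header.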

x66ccff commented 1 month ago

@SmartManoj Thanks, I will try it.

SmartManoj commented 1 month ago

Also, could you manually run source ~/.bashrc in the docker container to check how long it will take?
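One possible way to run that check is sketched below. `time_source` is a hypothetical helper: it sources a file in a fresh bash process, gives up after a limit, and reports the exit status and elapsed seconds. It assumes GNU `timeout` and `bash` are available in the container.

```shell
# Hypothetical helper: time how long sourcing a startup file takes.
# Usage: time_source FILE [LIMIT_SECONDS]   (the limit defaults to 120)
time_source() {
    file="$1"
    limit="${2:-120}"
    start=$(date +%s)
    status=0
    # Source the file in a separate bash; GNU timeout exits with 124
    # if the limit is reached before the command finishes.
    timeout "$limit" bash -c '. "$1"' _ "$file" || status=$?
    end=$(date +%s)
    echo "sourcing $file: exit=$status elapsed=$((end - start))s"
    return "$status"
}

# Example call, inside the container:
#   time_source ~/.bashrc 120
```

Note that this sources the file in a non-interactive shell. Some .bashrc files return early in that case, so running `time source ~/.bashrc` in an interactive bash inside the container is a useful second check.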

x66ccff commented 1 month ago

It seems that when OpenHands is started via Docker, the timeout limit cannot be changed, right? I'm not familiar with it.

Also, I tried going into the Docker container manually. What surprised me is that it is actually already initialized. I cannot figure out why. It seems that this only happens when the OpenHands agent is operating the conda env.

x66ccff commented 1 month ago

It seems the agent also runs into this issue when it runs sudo -s inside the container. (I changed the Docker image and installed the sudo command.)

x66ccff commented 1 month ago

I suggest that maybe we let the agent run as root instead of as the user openhands? I think this issue is related to the root user in the Docker container.

SmartManoj commented 1 month ago

It seems that when OpenHands is started via Docker, the timeout limit cannot be changed

You can pass -e SANDBOX_TIMEOUT=value to the docker command.

I suggest that maybe we let the agent run as root instead of as the user openhands

-e RUN_AS_OPENHANDS=False will do that.
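A minimal sketch of how the two options might be combined is shown below. `openhands_env_flags` is a hypothetical helper that only prints the flags. The 300 is an example value, assumed to be in seconds to match the 120-second limit in the logs. The rest of the `docker run` command (image, volumes, ports) is left as whatever you already use.

```shell
# Hypothetical helper: print the two -e options mentioned above.
# Assumption: SANDBOX_TIMEOUT is given in seconds.
openhands_env_flags() {
    timeout_seconds="${1:-300}"
    printf '%s\n' "-e SANDBOX_TIMEOUT=${timeout_seconds} -e RUN_AS_OPENHANDS=False"
}

# Add the printed flags to your existing OpenHands command, for example:
#   docker run $(openhands_env_flags 300) <your usual options and image>
openhands_env_flags 300
```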

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been stalled for over 30 days with no activity.