Issues using OpenHands w/ PyTorch + CUDA

neubig commented 2 hours ago

Hi @neubig, you can refer to https://github.com/SmartManoj/Kevin/issues/65

The bug is this: if you install Anaconda in the image (using Docker), the agent first needs to run `conda init`. The `conda init` command requires the user to close and reopen the terminal, but the agent cannot do that. So I tried having the agent run `source ~/.bashrc` instead, and then it gets stuck.

```dockerfile
# Stage 1: Prepare GPU environment based on NVIDIA CUDA image
FROM nvidia/cuda:12.6.1-devel-ubi8 AS cuda

# Stage 2: Use specified image
FROM ghcr.io/all-hands-ai/runtime:0.9-nikolaik

# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies, including sudo, g++, build-essential, and other common tools
RUN apt-get update && \
    apt-get install -y sudo g++ build-essential wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 git && \
    apt-get clean

# Copy CUDA toolchain and libraries from CUDA image
COPY --from=cuda /usr/local/cuda /usr/local/cuda

# Set CUDA environment variables
ENV PATH="/usr/local/cuda/bin:${PATH}"

# Explicitly initialize LD_LIBRARY_PATH to avoid undefined variable warnings
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64"

# Download and install specified version of Anaconda
RUN wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh -O anaconda.sh && \
    bash anaconda.sh -b -p /opt/conda && \
    rm anaconda.sh

# Set conda environment variables
ENV PATH="/opt/conda/bin:${PATH}"

# Initialize conda
RUN conda init bash

# Create a new conda environment and install PyTorch
RUN conda create -n pytorch_env python=3.9 -y && \
    . /opt/conda/etc/profile.d/conda.sh && \
    conda activate pytorch_env && \
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc

# Clean cache to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && conda clean -a -y

# Set working directory
WORKDIR /workspace

# Set default command
CMD ["/bin/bash"]
```

Originally posted by @x66ccff in https://github.com/All-Hands-AI/OpenHands/issues/2178#issuecomment-2395448086
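One way around the "run `conda init`, then reopen the terminal" requirement described above is to evaluate conda's shell hook directly in the shell that is already running. This is only a sketch: it assumes conda is on `PATH` or installed under `/opt/conda` as in the Dockerfile above, `activate_conda_env` and `CONDA_ROOT` are hypothetical names, and it avoids the restart step without being confirmed to fix the hang itself. `conda shell.bash hook` is the same command that the block `conda init` writes into `~/.bashrc` evaluates.

```shell
#!/usr/bin/env bash
# Sketch: make `conda activate` usable in the *current* shell without running
# `conda init` or reopening the terminal. Source this file rather than
# executing it, otherwise the activation is lost when the script exits.
# Assumption: conda is on PATH or installed under /opt/conda (as in the
# Dockerfile above). activate_conda_env and CONDA_ROOT are hypothetical names.
CONDA_ROOT="${CONDA_ROOT:-/opt/conda}"

activate_conda_env() {
  local env_name="$1"
  local conda_bin hook

  # Prefer a conda already on PATH, else fall back to the install prefix.
  conda_bin="$(command -v conda || true)"
  if [ -z "$conda_bin" ] && [ -x "$CONDA_ROOT/bin/conda" ]; then
    conda_bin="$CONDA_ROOT/bin/conda"
  fi
  if [ -z "$conda_bin" ]; then
    echo "conda not found on PATH or under $CONDA_ROOT" >&2
    return 1
  fi

  # `conda shell.bash hook` prints the shell functions that the ~/.bashrc
  # block added by `conda init` would normally load; eval them here instead.
  hook="$("$conda_bin" shell.bash hook)" || return 1
  eval "$hook"

  # Returns non-zero if the environment does not exist.
  conda activate "$env_name"
}

# Usage (hypothetical): activate_conda_env pytorch_env && python -c "import torch"
```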

neubig commented 1 hour ago

Thanks @x66ccff, I opened a separate issue so we can track this.

Just to clarify, are you following our custom sandbox guide? https://docs.all-hands.dev/modules/usage/how-to/custom-sandbox-guide

And also, is your use case that you would like to develop programs with CUDA+PyTorch? I'm just trying to better understand the situation so we can recommend the easiest way to fix the issue.

x66ccff commented 1 hour ago

> Just to clarify, are you following our custom sandbox guide? https://docs.all-hands.dev/modules/usage/how-to/custom-sandbox-guide

Hmm, I didn't use config.toml at any point in the process, so I'm not sure whether my setup is correct. I just built a new Docker image from the Dockerfile above and then used the command below to start OpenHands.

It seems that editing config.toml is only necessary when building from source, not when running with Docker?

```shell
docker run -it --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=kk-openhands-pytorch \
  -e SANDBOX_USER_ID=$(id -u) \
  -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
  -v $WORKSPACE_BASE:/opt/workspace_base \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app-$(date +%Y%m%d%H%M%S) \
  ghcr.io/all-hands-ai/openhands:0.9
```
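The command above uses `$WORKSPACE_BASE` for both `WORKSPACE_MOUNT_PATH` and the `-v` bind mount, but the thread never sets it. A minimal pre-flight sketch follows; `check_workspace_base` is a hypothetical helper, and defaulting to `./workspace` is an assumption for illustration, not an OpenHands requirement.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before running the `docker run` command above.
# WORKSPACE_BASE must be a non-empty absolute path to a host directory.
check_workspace_base() {
  local dir="${1:-}"
  if [ -z "$dir" ]; then
    echo "WORKSPACE_BASE is empty; export it before running docker" >&2
    return 1
  fi
  case "$dir" in
    /*) ;;  # absolute path: usable as a bind-mount source
    *)
      # Relative values are risky with `docker run -v`: a bare name such as
      # `workspace` is treated as a named volume, not a host directory.
      echo "WORKSPACE_BASE must be an absolute path: $dir" >&2
      return 1
      ;;
  esac
  # Create the directory so the bind mount has something to point at.
  mkdir -p "$dir"
}

# Example (assumption): default to ./workspace under the current directory.
export WORKSPACE_BASE="${WORKSPACE_BASE:-$(pwd)/workspace}"
check_workspace_base "$WORKSPACE_BASE" && echo "using $WORKSPACE_BASE"
```

The image name passed via `SANDBOX_RUNTIME_CONTAINER_IMAGE` must also match the tag the image was built with, e.g. `docker build -t kk-openhands-pytorch .` run in the directory containing the Dockerfile.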

> And also, is your use case that you would like to develop programs with CUDA+PyTorch? I'm just trying to better understand the situation so we can recommend the easiest way to fix the issue.

Yeah, but I'm not very familiar with Docker. Any advice would be appreciated!

neubig commented 1 hour ago

OK, I think there might be an easier way for you to do this. Specifically, you could probably start directly from one of the official PyTorch Docker images. Pick the one that best matches the PyTorch version you need and do something like the following:

```shell
docker run -it --pull=always \
  -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda11.8-cudnn9-runtime \
  -e SANDBOX_USER_ID=$(id -u) \
  -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
  -v $WORKSPACE_BASE:/opt/workspace_base \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app-$(date +%Y%m%d%H%M%S) \
  ghcr.io/all-hands-ai/openhands:0.9
```

Note that I changed `SANDBOX_RUNTIME_CONTAINER_IMAGE` to `SANDBOX_BASE_CONTAINER_IMAGE`, which lets OpenHands install the software it needs on top of the base image.

That may just work for your purposes, but if you still have issues I can help!
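Once the sandbox starts from the `pytorch/pytorch` base image, a quick way to see what PyTorch reports inside it is sketched below (for example, by asking the agent to run it). `check_torch_cuda` is a hypothetical helper that relies only on `torch.cuda.is_available()`; a `cuda-unavailable` result can simply mean no GPU was passed through to the sandbox container, which depends on how that container is started.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check to run inside the sandbox. Prints exactly one of:
#   python-missing | torch-missing | cuda-available | cuda-unavailable
check_torch_cuda() {
  local py
  py="$(command -v python3 || command -v python || true)"
  if [ -z "$py" ]; then
    echo "python-missing"
    return 0
  fi
  # Feed a short Python program on stdin; the quoted 'EOF' disables expansion.
  "$py" - <<'EOF'
try:
    import torch
except ImportError:
    print("torch-missing")
else:
    # True only if a CUDA device and driver are visible to this process.
    print("cuda-available" if torch.cuda.is_available() else "cuda-unavailable")
EOF
}

check_torch_cuda
```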


Issues using OpenHands w/ PyTorch + CUDA #4230