FEA: Create a docker solution

garyfeng commented 1 month ago

Is your feature request related to a problem? Please describe.

As a LaVague user, I want to containerize the solution so that I can do headless testing at scale.

Specifically:

- Create a container with Python 3.11, Chrome (and maybe other browsers Selenium supports), the corresponding Selenium drivers, and LaVague
- Run tasks in headless mode, and be able to see screenshots and text output as it progresses
- Use VNC to see the browser screen in real time
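
A minimal sketch of the headless-plus-screenshots requirement, using plain Selenium rather than LaVague's own driver wrapper. The names `HEADLESS_ARGS`, `screenshot_path`, and `run_headless` are made up for illustration, and the Chrome switches are commonly used container settings, not something LaVague prescribes.

```python
from pathlib import Path

# Chrome switches often used when running inside a container (assumed, tune as needed).
HEADLESS_ARGS = [
    "--headless=new",           # Chrome's current headless mode
    "--no-sandbox",             # often required when Chrome runs as root in Docker
    "--disable-dev-shm-usage",  # avoid crashes caused by a small /dev/shm
    "--window-size=1280,1024",  # fixed viewport so screenshots are comparable
]


def screenshot_path(out_dir: str, step: int) -> Path:
    """Return a zero-padded screenshot path, e.g. out/step_003.png."""
    return Path(out_dir) / f"step_{step:03d}.png"


def run_headless(url: str, out_dir: str = "screenshots") -> str:
    """Open `url` in headless Chrome, save one screenshot, return the page title."""
    from selenium import webdriver  # lazy import: requires `pip install selenium`

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        target = screenshot_path(out_dir, 0)
        target.parent.mkdir(parents=True, exist_ok=True)
        driver.save_screenshot(str(target))
        return driver.title
    finally:
        driver.quit()  # always release the browser, even on errors
```

Calling `run_headless("https://huggingface.co/")` inside the container would leave `screenshots/step_000.png` in the working directory, which a bind mount can expose to the host.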

Describe the solution you'd like

Dockerfile:
- start with python 3.11 slim,
- install Chrome (latest),
- install chromedriver (make sure it's the compatible version)
- install LaVague (and make sure it uses the chromedriver installed, not doing its own thing; if it does, it should be the right version)
- set up VNC access to the browser
- docker run --rm -it --env-file .env -v $(pwd):/app image_name
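
For the "compatible version" point, a rough check that could run at image build time or container start: compare the major versions reported by Chrome and chromedriver. This is only a sketch. The binary names `google-chrome` and `chromedriver` are assumptions about what ends up on the PATH, and the helper names are hypothetical.

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from output like 'Google Chrome 122.0.6261.94'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def check_installed(chrome_bin: str = "google-chrome",
                    driver_bin: str = "chromedriver") -> bool:
    """Run both binaries with --version and compare (binary names are assumptions)."""
    chrome = subprocess.run([chrome_bin, "--version"],
                            capture_output=True, text=True, check=True).stdout
    driver = subprocess.run([driver_bin, "--version"],
                            capture_output=True, text=True, check=True).stdout
    return versions_match(chrome, driver)
```

Failing the build when `check_installed()` returns `False` would catch a mismatched pair before any task runs.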

Describe alternatives you've considered

An alternative to VNC would be to use X11, but that's complicated.

Additional context

LaVague used to have a Docker option, but they seem to have prioritized pip over Docker. Also see the Skyvern project, which has a similar but different goal and uses the X11 approach.

garyfeng commented 1 month ago

Follow https://docs.lavague.ai/en/latest/docs/contributing/general/#dev-environment to set up local env

garyfeng commented 1 month ago

You can use a web-based VNC viewer that doesn't require any installation. This is achievable by using a noVNC server, which allows you to access the VNC session via your browser. The official Selenium Docker images support noVNC, which makes this setup quite straightforward.

Here's how you can use noVNC with the Selenium Docker container:

Steps to Use noVNC with Selenium Docker

Run the Selenium container with noVNC:

The Selenium Docker images come with noVNC pre-installed and enabled. You just need to expose the noVNC port (typically port 7900) when you start the container.

Run the following command to start a Selenium container with noVNC enabled:
```
docker run -d -p 4444:4444 -p 5900:5900 -p 7900:7900 --shm-size="2g" selenium/standalone-chrome
```
This command exposes:
- Port 4444 for WebDriver (to run your Selenium tests),
- Port 5900 for traditional VNC access,
- Port 7900 for noVNC (web-based access).

Access the browser via noVNC:

Once the container is running, you can access the Selenium browser through noVNC in your web browser by visiting:
```
http://localhost:7900
```
When prompted for a password, use the default password, "secret", unless you've changed it.

Running your Selenium tests:

After setting up the container, you can run your Selenium test as usual, and watch the browser in action via the noVNC session running in your web browser.
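
A sketch of what "run your Selenium test as usual" could look like against this container: wait until the server on port 4444 reports ready via its `/status` endpoint, then open a remote session. The helper names (`is_ready`, `wait_for_grid`, `open_remote`) are hypothetical, and the code assumes the ports from the command above are published on localhost.

```python
import json
import time
import urllib.error
import urllib.request

GRID_URL = "http://localhost:4444"  # WebDriver port published by the docker run command


def is_ready(status_payload: dict) -> bool:
    """Interpret the JSON body of GET /status; True once sessions are accepted."""
    return bool(status_payload.get("value", {}).get("ready", False))


def wait_for_grid(base_url: str = GRID_URL, timeout: float = 30.0) -> bool:
    """Poll /status until the server is ready or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=2) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # container still starting or returned non-JSON; retry
        time.sleep(1)
    return False


def open_remote(url: str, base_url: str = GRID_URL) -> str:
    """Open `url` in the containerized Chrome and return the page title."""
    from selenium import webdriver  # lazy import: requires `pip install selenium`

    driver = webdriver.Remote(command_executor=base_url,
                              options=webdriver.ChromeOptions())
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()
```

While `open_remote(...)` is running, the same session should be visible in the noVNC tab on port 7900.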

Conclusion

Using noVNC, you can view the browser in real-time directly from your web browser without any installation of a dedicated VNC viewer. This is the simplest way to monitor Selenium tests visually inside a Docker container.

garyfeng commented 1 month ago

Trying to recover LaVague's previous work on Docker, which seems to have been deleted since March 2024:

Instructions: https://github.com/lavague-ai/LaVague/blob/a47d5f1643c21b5179040dc36073c3d21870592c/docs/docs/get-started/get-started-docker.md
dockerfile: https://github.com/lavague-ai/LaVague/blob/a47d5f1643c21b5179040dc36073c3d21870592c/docker/Dockerfile
other configs, etc. in the a47d5f1 hash

garyfeng commented 1 month ago

Content of the Dockerfile linked above. It covers most of what I wanted:

```
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04

ARG USERNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

# Create the user
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && apt-get update \
    && apt-get install -y sudo \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update && apt-get install -y ca-certificates fonts-liberation unzip \
libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 wget \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils \
software-properties-common git zsh curl

# Install chrome and chromedriver for ubuntu22
RUN wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip \
&& wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip \
&& unzip chromedriver-linux64.zip -d /home/$USERNAME \
&& unzip chrome-linux64.zip -d /home/$USERNAME \
&& rm chrome-linux64.zip chromedriver-linux64.zip

# We need python3.10 to build the project
RUN add-apt-repository ppa:deadsnakes/ppa && apt-get update && apt-get install -y python3.10

# We need git to clone the repository, ensure it's installed
RUN apt-get update && apt-get install -y git

USER $USERNAME

# Clone the lavague repository
RUN git clone https://github.com/lavague-ai/LaVague /home/$USERNAME/LaVague

# Make localhost accessible externally
RUN sed -i "s/demo.launch(server_port=server_port, share=True, debug=True)/demo.launch(server_name='0.0.0.0', server_port=server_port, share=True, debug=True)/g" /home/$USERNAME/LaVague/src/lavague/command_center.py

RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 \
&& python3.10 -m pip install virtualenv \
&& python3.10 -m pip install /home/$USERNAME/LaVague --force-reinstall \
&& python3.10 -m pip install llama-index-llms-azure-openai

WORKDIR /home/$USERNAME

# Copy the necessary files into the Docker image
COPY config.py /home/$USERNAME/config.py
COPY instructions.txt /home/$USERNAME/instructions.txt
COPY exec.sh /home/$USERNAME/exec.sh

RUN sudo chmod 755 /home/$USERNAME/exec.sh

# fix path problem
ENV PATH="/home/vscode/.local/bin:${PATH}"

# Modify the ENTRYPOINT to execute the lavague-launch command and keep the container running
COPY entrypoint.sh /home/$USERNAME/entrypoint.sh
RUN sudo chmod +x /home/$USERNAME/entrypoint.sh
ENTRYPOINT ["/home/vscode/entrypoint.sh"]
```

garyfeng commented 1 month ago

config.py:


```
import os
from llama_index.llms.openai import OpenAI

class LLM(OpenAI):
    def __init__(self):
        max_new_tokens = 512
        api_key = os.getenv("OPENAI_API_KEY")
        if api_key is None:
            raise ValueError("OPENAI_API_KEY environment variable is not set")
        else:
            super().__init__(api_key=api_key, max_tokens=max_new_tokens, temperature=0.0)
```

entrypoint.sh

```
#!/usr/bin/bash

# Execute your lavague-launch command
lavague-launch --file_path instructions.txt --config_path config.py

# Keep the container running
tail -f /dev/null
```

instructions.txt

```
https://huggingface.co/
Click on the Datasets item on the menu, between Models and Spaces
Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'
Scroll by 500 pixels
```
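
Judging from the sample above, the file format looks like "start URL on the first line, one natural-language instruction per line after that". That reading is an assumption, not something confirmed against the old `lavague-launch` code. A small sketch of a parser under that assumption (the function name `parse_instructions` is hypothetical):

```python
from typing import List, Tuple


def parse_instructions(text: str) -> Tuple[str, List[str]]:
    """Split instructions text into (start_url, steps), skipping blank lines.

    Assumes the first non-blank line is the URL and every later line is one step.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("instructions file is empty")
    url, steps = lines[0], lines[1:]
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"first line should be a URL, got {url!r}")
    return url, steps


if __name__ == "__main__":
    sample = (
        "https://huggingface.co/\n"
        "Click on the Datasets item on the menu, between Models and Spaces\n"
        "Scroll by 500 pixels\n"
    )
    start_url, actions = parse_instructions(sample)
    print(start_url)
    for number, action in enumerate(actions, start=1):
        print(f"{number}. {action}")
```

Validating the file this way before starting the container would surface a malformed instructions.txt early instead of partway through a run.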

garyfeng commented 1 month ago

there is also a folder called docker-scripts https://github.com/lavague-ai/LaVague/tree/a47d5f1643c21b5179040dc36073c3d21870592c/examples/docker-scripts

garyfeng / LaVague

FEA: Create a docker solution #1
