bilayer-containers / bilayers

https://bilayers.org

Consider if pixi source container support adds too much hassle #65

Closed bethac07 closed 1 month ago

bethac07 commented 1 month ago

What type of feature are you requesting?

Something else

Feature Details

Everything you want to run (edited to add: in a second Dockerfile that uses the first image in its FROM) has to be prefixed with the shell-hook, which is a bummer :(

bethac07 commented 1 month ago

(I strongly suspect there is a better thing to do, but I don't understand pixi well enough to know what it is)

gnodar01 commented 1 month ago

I'm not sure I understand the issue. The shell-hook is set as the entrypoint so you wouldn't need to manually prepend anything when using the run command.

For example:

❯ cat Dockerfile
FROM ghcr.io/prefix-dev/pixi:0.31.0-jammy AS build

WORKDIR /src

RUN pixi init

RUN pixi add scikit-image

RUN pixi shell-hook > /shell-hook.sh

RUN echo 'exec "$@"' >> /shell-hook.sh

FROM ubuntu:jammy AS production

COPY --from=build /src/ /src/
COPY --from=build /src/.pixi/envs/default/ /src/.pixi/envs/default/
COPY --from=build /shell-hook.sh /shell-hook.sh

WORKDIR /src

ENTRYPOINT ["/bin/bash", "/shell-hook.sh"]

❯ docker run --rm -it test:0.0.1 env
HOSTNAME=7bb2cb519e8c
PIXI_PROMPT=(src)
PWD=/src
CONDA_PREFIX=/src/.pixi/envs/default
PIXI_PROJECT_MANIFEST=/src/pixi.toml
PIXI_PROJECT_NAME=src
HOME=/root
PIXI_ENVIRONMENT_NAME=default
PIXI_IN_SHELL=1
PIXI_EXE=/usr/local/bin/pixi
TERM=xterm
SHLVL=0
PIXI_PROJECT_VERSION=0.1.0
PIXI_PROJECT_ROOT=/src
CONDA_DEFAULT_ENV=src
PIXI_ENVIRONMENT_PLATFORMS=linux-64
PATH=/src/.pixi/envs/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

❯ docker run --rm -it test:0.0.1 which python
/src/.pixi/envs/default/bin/python

❯ docker run --rm -it test:0.0.1 python -m site
sys.path = [
    '/src',
    '/src/.pixi/envs/default/lib/python312.zip',
    '/src/.pixi/envs/default/lib/python3.12',
    '/src/.pixi/envs/default/lib/python3.12/lib-dynload',
    '/src/.pixi/envs/default/lib/python3.12/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.12/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

❯ docker run --rm -it test:0.0.1 python -c "import skimage; print(skimage.__version__)"
0.24.0

bethac07 commented 1 month ago

Sorry, was making notes to myself and didn't end up getting back to them yesterday; was just coming here to update.

The issue isn't regular container execution. The issue is that the entrypoint isn't run when you FROM the container, so when you want to, say, use a second Dockerfile to FROM the first container and then (hypothetically) pip install gradio, pip isn't found.
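To make the difference concrete, here is a minimal sketch in plain shell (no Docker involved). The temporary file is a hypothetical stand-in for /shell-hook.sh, and the single export is an assumption chosen for illustration. It shows that a command only sees the hook's environment when it is launched through the hook, which is what ENTRYPOINT does at run time and what a build-time RUN does not do.

```shell
# Hypothetical stand-in for the generated /shell-hook.sh: one export, then exec "$@".
hook="$(mktemp)"
printf '%s\n' 'export CONDA_PREFIX="/src/.pixi/envs/default"' 'exec "$@"' > "$hook"

# Like a build-time RUN: the command is started directly, so the export never happens.
without_hook="$(unset CONDA_PREFIX; bash -c 'echo "${CONDA_PREFIX:-unset}"')"

# Like docker run with the ENTRYPOINT: the command is handed to the hook, which exports first.
with_hook="$(unset CONDA_PREFIX; bash "$hook" printenv CONDA_PREFIX)"

echo "started directly:   CONDA_PREFIX=$without_hook"
echo "started via hook:   CONDA_PREFIX=$with_hook"
rm -f "$hook"
```

The same reasoning applies to PATH, which is why pip is not found during the build of the second image.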

I spent some time dorking around with this this morning, and I suspect the solution is to add something like SHELL ["/bin/bash", "/shell-hook.sh"] to the tool container. I can't seem to find the right combination of shell flags to make it work, though: I mostly get exec: not found errors when trying to wrap the tool container in a simple Dockerfile like the one below.

FROM instanseg_pixi:latest

RUN echo hello

RUN pip install plotly

gnodar01 commented 1 month ago

It's not a great idea to pip install into pixi environments, so pip is removed in pixi. The approach to take would be to install the pixi binary and use it:

FROM instanseg_pixi:latest AS build

RUN apt-get update && apt-get install -y curl

RUN curl -fsSL https://pixi.sh/install.sh | bash

# each RUN starts a fresh shell, so put pixi on PATH with ENV instead of sourcing /root/.bashrc
ENV PATH="/root/.pixi/bin:$PATH"

RUN pixi add plotly # or pixi add --pypi plotly if you specifically want pypi packages instead of conda-forge

# regenerate the shell-hook.sh as before if doing a multistage build

bethac07 commented 1 month ago

> It's not a great idea to pip install into pixi environments, so pip is removed in pixi

Nope, this works -

docker run --rm -it instanseg_pixi:latest pip install plotly

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Collecting plotly
  Downloading plotly-5.24.1-py3-none-any.whl.metadata (7.3 kB)
Collecting tenacity>=6.2.0 (from plotly)
  Downloading tenacity-9.0.0-py3-none-any.whl.metadata (1.2 kB)
Requirement already satisfied: packaging in ./.pixi/envs/default/lib/python3.9/site-packages (from plotly) (24.1)
Downloading plotly-5.24.1-py3-none-any.whl (19.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 19.1 MB/s eta 0:00:00
Downloading tenacity-9.0.0-py3-none-any.whl (28 kB)
Installing collected packages: tenacity, plotly
Successfully installed plotly-5.24.1 tenacity-9.0.0

That's a side issue, though, because it's not that I want to put more stuff into the tool container as such. It's that the second stage of bilayers, the interface creation, needs access to a working copy of Python to install its dependencies (gradio, jupyter, etc.) during its build. We'd previously said we wanted that Python to be provided by the tool container, and I think in general we want it provided by the tool's own Python environment specifically, if it's a Python tool (otherwise tools like Instanseg, which must be run with python file.py --cli-flag-1 etc., won't work). So we need the pixi environment active when the container is FROMed, which again I suspect we can do with SHELL; I just haven't found the right magic set of flags yet. Make more sense?

gnodar01 commented 1 month ago

> Nope, this works - docker run --rm -it instanseg_pixi:latest pip install plotly

Hmm. That's very weird and not reproducible in the general case:

❯ cat Dockerfile

FROM ghcr.io/prefix-dev/pixi:0.31.0-jammy AS build

WORKDIR /src

RUN pixi init

RUN pixi add scikit-image

RUN pixi shell-hook > /shell-hook.sh

RUN echo 'exec "$@"' >> /shell-hook.sh

FROM ubuntu:jammy AS production

COPY --from=build /src/ /src/
COPY --from=build /src/.pixi/envs/default/ /src/.pixi/envs/default/
COPY --from=build /shell-hook.sh /shell-hook.sh

WORKDIR /src

ENTRYPOINT ["/bin/bash", "/shell-hook.sh"]

❯ docker build -t test:0.0.1 --file Dockerfile .

❯ cat Dockerfile.extend

FROM test:0.0.1

RUN apt-get update && apt-get -y install curl

RUN curl -fsSL https://pixi.sh/install.sh | bash

SHELL ["/bin/bash", "-l", "-c"]
ENV PATH="/root/.pixi/bin:$PATH"

RUN pixi add plotly

RUN pixi shell-hook > /shell-hook.sh
RUN echo 'exec "$@"' >> /shell-hook.sh

ENTRYPOINT ["/bin/bash", "/shell-hook.sh"]

❯ docker build -t test-ext:0.0.1 --file Dockerfile.extend .

❯ docker run --rm -it test-ext:0.0.1 pip --version

/shell-hook.sh: line 14: exec: pip: not found # as it normally should be

> needs access to a working copy of python to install the dependencies

That should not be a problem:

❯ docker run --rm -it test-ext:0.0.1 which python
/src/.pixi/envs/default/bin/python

> So we need the pixi environment active when the container is FROMed

Again, see above: I FROM it in the second Dockerfile and am able to install deps into the environment.

bethac07 commented 1 month ago

Yes, I see it's possible by adding pixi back in, but I don't think that's what we want to do in interface creation, do we? In a perfect world, I don't think we want to have to track whether our tool containers need Gradio-pixi vs Gradio-conda vs Gradio-pip; I think that adds a bunch of complexity. It might be unavoidable, but I'd like to figure out a way to avoid it if we can.

gnodar01 commented 1 month ago

I don't think any tinkering with the SHELL instruction is going to do the trick. I have banged my head against that several times and it never does what I want. For instance, you might think this would do the trick:

SHELL ["/bin/bash", "-c", "source /shell-hook.sh &&"]

But it does not, and similar things would not. I would absolutely love to be shown a working way to do that, but I don't know of a way to source a script that sets env variables before each RUN command, without prepending them to the RUN commands.
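One likely reason that particular SHELL line fails: for a shell-form RUN, Docker appends the instruction text as one extra argument after the SHELL array, and bash -c treats only the first string after -c as the script. The next argument becomes $0 rather than being spliced onto the end of the script. The sketch below reproduces that with plain bash, using `true &&` as a stand-in for `source /shell-hook.sh &&` so it runs without the hook file.

```shell
# Roughly what `RUN echo hello` turns into with a three-element SHELL:
#   /bin/bash -c '<third SHELL element>' 'echo hello'
# bash runs the third element as the script; 'echo hello' is only visible as $0.
seen_as_arg0="$(bash -c 'printf "%s" "$0"' 'echo hello')"
echo "the RUN text arrives as \$0: $seen_as_arg0"

# A script that ends in a dangling && is therefore a syntax error,
# and the RUN text is never executed.
if bash -c 'true &&' 'echo hello' 2>/dev/null; then
    dangling_status=0
else
    dangling_status=$?
fi
echo "exit status with the dangling &&: $dangling_status"
```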

That's all incidental though. If we're not using pixi, then there's not much shell-hook provides that's useful:

export PATH="/src/.pixi/envs/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
export CONDA_PREFIX="/src/.pixi/envs/default"
export PIXI_PROJECT_NAME="src"
export PIXI_PROJECT_VERSION="0.1.0"
export PIXI_EXE="/usr/local/bin/pixi"
export PIXI_IN_SHELL="1"
export PIXI_PROJECT_MANIFEST="/src/pixi.toml"
export PIXI_PROJECT_ROOT="/src"
export CONDA_DEFAULT_ENV="src"
export PIXI_ENVIRONMENT_NAME="default"
export PIXI_ENVIRONMENT_PLATFORMS="linux-64"
export PIXI_PROMPT="(src) "

exec "$@"

The only parts that are useful independent of pixi the tool are the modification to PATH, and maybe CONDA_PREFIX if we were using conda/mamba in the image:

export PATH="/src/.pixi/envs/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
export CONDA_PREFIX="/src/.pixi/envs/default"
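As a small illustration of that filtering, the sketch below pulls just those two exports out of a hook file with grep. The temporary file is a hypothetical, shortened copy of the shell-hook output shown above, not something pixi produces in this form.

```shell
# Hypothetical, shortened copy of a generated shell-hook.sh.
hook="$(mktemp)"
printf '%s\n' \
    'export PATH="/src/.pixi/envs/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"' \
    'export CONDA_PREFIX="/src/.pixi/envs/default"' \
    'export PIXI_PROJECT_NAME="src"' \
    'export PIXI_IN_SHELL="1"' \
    'exec "$@"' > "$hook"

# Keep only the exports that still mean something without the pixi binary.
portable_exports="$(grep -E '^export (PATH|CONDA_PREFIX)=' "$hook")"
portable_count="$(printf '%s\n' "$portable_exports" | wc -l | tr -d ' ')"

printf '%s\n' "$portable_exports"
rm -f "$hook"
```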

So this approach works:

❯ cat Dockerfile
FROM ghcr.io/prefix-dev/pixi:0.31.0-jammy AS build

WORKDIR /src

RUN pixi init

RUN pixi add scikit-image

# any other installations

RUN pixi add pip

RUN pixi shell-hook > /shell-hook.sh

RUN echo 'exec "$@"' >> /shell-hook.sh

FROM ubuntu:jammy AS production

COPY --from=build /src/ /src/
COPY --from=build /src/.pixi/envs/default/ /src/.pixi/envs/default/
COPY --from=build /shell-hook.sh /shell-hook.sh

WORKDIR /src

ENTRYPOINT ["/bin/bash", "/shell-hook.sh"]

❯ cat Dockerfile.extend
FROM test:0.0.2

ENV PATH="/src/.pixi/envs/default/bin:$PATH"

RUN pip install plotly

I had pixi install pip in the base image, which is a bit icky, but meh.

It fixes the prepending issue, but of course, setting ENV to prepend that specific path is still conditional on whether we're extending a pixi image. Actually, it's specific to the base image's workdir as well, because of the /src/.

But that raises the question: how would we avoid the issue in tools that use conda/mamba or virtualenv, where we have to somehow activate the environment beforehand? If we're mostly going to be wrapping other pre-built Docker images, then I'm not sure we can avoid some amount of conditioning on pip vs conda/mamba vs (perhaps more and more) pixi.

bethac07 commented 1 month ago

Allegedly, an automatic prepending seems to be possible with conda. I'm trying it now, but this environment takes an hour to resolve in non-mamba conda (blech), so TBD. One thing that definitely DOES work, though, is using conda env update in the pytorch container's base conda environment; see below.

(We should probably pair on this at this point rather than going back and forth in comments! Lmk if today, tomorrow, etc. is good. We should indeed try to come up with some best practices for the specific and also the general case. If we can figure out a generic strategy for activating environments (more likely, strategies: one for venv, one for conda, one for pixi), hopefully we can create lightweight recipes along the lines of 'if someone wants to use tool X, which comes in a container that requires activation, we have documentation on lightweight Dockerfiles they can use to turn that into a tool container that doesn't need manual activation, which is then the thing that goes into the bilayers CI/CD'.)

(base) bcimini@wm4f8-761 instanseg_inference % docker run --rm -it instanseg_conda:latest pip install plotly                  
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Collecting plotly
  Downloading plotly-5.24.1-py3-none-any.whl.metadata (7.3 kB)
Collecting tenacity>=6.2.0 (from plotly)
  Downloading tenacity-9.0.0-py3-none-any.whl.metadata (1.2 kB)
Requirement already satisfied: packaging in /opt/conda/lib/python3.10/site-packages (from plotly) (23.1)
Downloading plotly-5.24.1-py3-none-any.whl (19.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 21.7 MB/s eta 0:00:00
Downloading tenacity-9.0.0-py3-none-any.whl (28 kB)
Installing collected packages: tenacity, plotly
Successfully installed plotly-5.24.1 tenacity-9.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
(base) bcimini@wm4f8-761 instanseg_inference % cat Dockerfile_from 
FROM instanseg_conda:latest

#SHELL ["/bin/bash", "/shell-hook.sh"]

RUN echo hello

RUN pip install plotly
(base) bcimini@wm4f8-761 instanseg_inference % docker run --rm -it test_from:latest                                               
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@6625919db332:/instanseg# which python
/opt/conda/bin/python
root@6625919db332:/instanseg# pip freeze | grep Ins
-e git+https://github.com/instanseg/instanseg.git@b9c9d6b6860f548a4b6bb9151270e390d9bfe52e#egg=InstanSeg
root@6625919db332:/instanseg# pip freeze | grep plot
matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work
plotly==5.24.1

gnodar01 commented 1 month ago

> Hmm. That's very weird

BTW this answers where pip came from.

gnodar01 commented 1 month ago

> Allegedly, an automatic prepending seems to be possible with conda

The same thing is doable with pixi, but only if pixi is actually available.

For the rest, we'll have to sync up. I might be wrong, but from skimming through the Dockerfile, I think the reason it's working that way is that the instanseg environment is created but never actually used. Instead, everything is installed into base, and hence no activation is needed.

bethac07 commented 1 month ago

In discussion, we figured out how to solve this by setting both SHELL and ENTRYPOINT, especially because SHELL propagates when that image is FROMed. Hooray!

gnodar01 commented 1 month ago

Result of discussion: SHELL is indeed the answer (on top of keeping the pixi binary around), as it propagates to images built FROM this one.

❯ cat Dockerfile
FROM ghcr.io/prefix-dev/pixi:0.31.0-jammy

WORKDIR /src

RUN pixi init

RUN pixi add scikit-image

# any other installations

RUN pixi add pip

SHELL ["pixi", "run", "/bin/bash", "-c"]

ENTRYPOINT ["pixi", "run"]

❯ cat Dockerfile.extend
FROM test-ext:0.0.1

WORKDIR /src

RUN pip install plotly
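A rough way to see why this works: for a shell-form RUN, Docker appends the instruction text as one final argument to the SHELL array. The sketch below only assembles that argument list in plain shell, so it calls neither Docker nor pixi, and the variable names are hypothetical.

```shell
# The inherited SHELL is ["pixi", "run", "/bin/bash", "-c"]; the RUN text is appended as one argument.
run_instruction='pip install plotly'
set -- pixi run /bin/bash -c "$run_instruction"

argc=$#
for last_arg in "$@"; do :; done   # after the loop, last_arg holds the final argument

echo "argument count: $argc"
echo "final argument: $last_arg"
```

So `RUN pip install plotly` in the extending Dockerfile is executed as `pixi run /bin/bash -c "pip install plotly"`, which is why pip resolves inside the pixi environment without any manual prefix.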