[WiP] Draft for Gentoo configuration #611

Open TheChymera opened 3 months ago

TheChymera commented 3 months ago

work in progress... from here.

TheChymera commented 3 months ago

@yarikoptic I can't find any sort of Containerfile in this repo... any idea how/what I should adapt?

yarikoptic commented 3 months ago

this is a tool which produces Containerfiles (based on such templates etc); it doesn't contain them. What do you need it for? In any case -- just try to follow what you see done (tests etc) for e.g. neurodebian -- just git grep neurodebian

TheChymera commented 3 months ago

Ok, I'll do some more digging in the actual code, but:

git grep neurodebian

neurodebian is only mentioned in that file and templates/ndfreeze.yaml. I also see no mentions of “container” in templates/. I assume this is all in the logic somewhere.

Basically what needs to be done for Gentoo images, is for this container file I wrote to be loaded, and on the last line whatever packages the user wants should get concatenated.
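A minimal sketch of the idea described above, assuming the hand-written Containerfile is available as text: the requested packages are joined into one final `RUN emerge` line. The helper name `append_emerge_line` and the sample base text are hypothetical and not part of neurodocker; the `emerge --autounmask-continue` form is borrowed from the Containerfile discussed in this thread.

```python
def append_emerge_line(base_containerfile: str, packages: list[str]) -> str:
    """Return the base Containerfile text with one final emerge instruction.

    Hypothetical helper for illustration; neurodocker does not expose it.
    """
    if not packages:
        # Nothing to install: leave the base file untouched.
        return base_containerfile
    emerge = "RUN emerge --autounmask-continue " + " ".join(packages)
    return base_containerfile.rstrip("\n") + "\n" + emerge + "\n"


if __name__ == "__main__":
    # Made-up two-line base file standing in for the real Containerfile.
    base = "FROM gentoo\nRUN emerge -v --noreplace dev-vcs/git\n"
    print(append_emerge_line(base, ["afni", "fsl"]))
```

In the actual tool this concatenation would presumably live in a template rather than in a free function; the sketch only shows the intended end result.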

kaczmarj commented 3 months ago

@TheChymera - dockerfiles are rendered here:

and singularity files are rendered here:

the templates in templates/ are used during the rendering process to construct the dockerfile or singularity file.

TheChymera commented 3 months ago

@kaczmarj any idea where the YAML is loaded? I can't find either “.yaml”, or “neurodebian”, or “templates/” anywhere in that module.

I assume this somehow happens dynamically, and I tried to look for tests... but I couldn't find the relevant test exemplifying this. Tried to break the neurodocker.yaml file on purpose and see where the test breaks, but I only get a nondescript:

ERROR neurodocker/reproenv/tests/ - yaml.scanner.ScannerError: while scanning a simple key

With a traceback that's mainly <frozen importlib._bootstrap>.

Basically what I'd like to do is just give it a series of commands to concatenate... but I see no analogous lines to what I have in the Containerfile in neurodebian.yaml

yarikoptic commented 3 months ago

didn't check, but quite often to answer such a "where is it loaded" question I "break" the medium -- make the file unreadable, or break the json or yaml formatting -- which causes a failure, and the traceback shows where it is happening.
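The "break the medium" trick can be sketched with the standard library alone. This example uses JSON rather than YAML so that it needs no third-party package, and `register_template` here is a hypothetical stand-in for whatever code actually loads the file.

```python
import json
import tempfile
import traceback
from pathlib import Path


def register_template(path: Path) -> dict:
    """Hypothetical stand-in for whatever code loads the template file."""
    with path.open() as f:
        return json.load(f)


def frames_reaching_parser(path: Path) -> list[str]:
    """Load a deliberately broken file and list the functions on the failing call path."""
    try:
        register_template(path)
    except json.JSONDecodeError as exc:
        # Each traceback entry names a function that was on the way to the parser.
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        broken = Path(tmp) / "template.json"
        broken.write_text('{"name": "gentoo", ')  # truncated on purpose
        for name in frames_reaching_parser(broken):
            print(name)
```

The printed names run from the caller down to the parser, so the function that opens the file shows up by name even when it is hard to find by grepping.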

yarikoptic commented 3 months ago

ha -- now read fully and saw that you thought alike ;) ok, will look inside now

yarikoptic commented 3 months ago

for me breaking yaml resulted in the following traceback

```shell
neurodocker/ in reproenv.register_template(template_path)
neurodocker/reproenv/ in register
    template = yaml.load(f, Loader=SafeLoader)
venvs/dev3/lib/python3.11/site-packages/yaml/ in load
    return loader.get_single_data()
venvs/dev3/lib/python3.11/site-packages/yaml/ in get_single_data
    node = self.get_single_node()
yaml/_yaml.pyx:673: in yaml._yaml.CParser.get_single_node
    ???
yaml/_yaml.pyx:687: in yaml._yaml.CParser._compose_document
    ???
yaml/_yaml.pyx:731: in yaml._yaml.CParser._compose_node
    ???
yaml/_yaml.pyx:845: in yaml._yaml.CParser._compose_mapping_node
    ???
yaml/_yaml.pyx:729: in yaml._yaml.CParser._compose_node
    ???
yaml/_yaml.pyx:808: in yaml._yaml.CParser._compose_sequence_node
    ???
yaml/_yaml.pyx:860: in yaml._yaml.CParser._parse_next_event
    ???
E   yaml.parser.ParserError: while parsing a block collection
E     in "/home/yoh/proj/repronim/neurodocker/neurodocker/templates/neurodebian.yaml", line 6, column 3
E   did not find expected '-' indicator
E     in "/home/yoh/proj/repronim/neurodocker/neurodocker/templates/neurodebian.yaml", line 18, column 3
```

so points to -- that registers templates. And then all jinja templating happens in


TheChymera commented 3 months ago

Ok, so I think I finally found a test I think I could work off of.

Basically I'm looking for some test that shows how a neurodebian Containerfile is generated, and then determine how/what to change to get my gentoo Containerfile instead.

Sadly I get the following:

(mydev) [deco]~/src/neurodocker ❱ python -m pytest -vvs neurodocker/cli/tests/
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.8, pytest-8.1.1, pluggy-1.5.0 -- /home/chymera/src/neurodocker/.venvs/mydev/bin/python
cachedir: .pytest_cache
rootdir: /home/chymera/src/neurodocker
configfile: pyproject.toml
plugins: cov-5.0.0, reportlog-0.4.0, xdist-3.5.0
collected 1 item

neurodocker/cli/tests/ FAILED

================================================================= FAILURES =================================================================
____________________________________________________________ test_gentoo_image _____________________________________________________________

tmp_path = PosixPath('/tmp/pytest-of-chymera/pytest-10/test_gentoo_image0')

    def test_gentoo_image(tmp_path: Path):
        # also add singularity like in the test above

        cmd = "neurodocker generate docker"

        runner = CliRunner()
        result = runner.invoke(
                "--pkg-manager apt",
                "--base-image neurodebian:bullseye",
                "--ants version=2.4.3",
                "--user nonroot"
>       assert result.exit_code == 0, result.output
E       AssertionError: Usage: generate [OPTIONS] COMMAND [ARGS]...
E         Try 'generate --help' for help.
E         Error: No such command 'neurodocker generate docker'.
E       assert 2 == 0
E        +  where 2 = <Result SystemExit(2)>.exit_code

cmd        = 'neurodocker generate docker'
result     = <Result SystemExit(2)>
runner     = <click.testing.CliRunner object at 0x7f92bbd442d0>
tmp_path   = PosixPath('/tmp/pytest-of-chymera/pytest-10/test_gentoo_image0')

neurodocker/cli/tests/ AssertionError

Any idea why this is happening?

The command works if I run it from bash — I got it from here →

yarikoptic commented 3 months ago

make it not cmd = "neurodocker generate docker" but cmd = "docker" -- look at other examples how it is done, e.g. git grep -A4 CliRunner and see which "commands" are provided
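The failure follows from how argument lists are handled: each list element is one token, so the whole string "neurodocker generate docker" is looked up as a single subcommand name. Below is a stdlib-only analogy that uses argparse in place of click (which the test itself uses); the parser and its single option are hypothetical stand-ins, not neurodocker's actual CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `generate` group with one subcommand."""
    parser = argparse.ArgumentParser(prog="generate")
    sub = parser.add_subparsers(dest="command", required=True)
    docker = sub.add_parser("docker")
    docker.add_argument("--pkg-manager")
    return parser


def exit_code(argv: list[str]) -> int:
    """Parse argv and return 0 on success or the parser's exit status on failure."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return int(exc.code)
    return 0


if __name__ == "__main__":
    # The program and group names are not part of the argument list.
    print(exit_code(["docker", "--pkg-manager", "apt"]))
    # One element is one token, so this is an unknown subcommand name.
    print(exit_code(["neurodocker generate docker", "--pkg-manager", "apt"]))
```

The second call fails with a usage error, much like the `No such command 'neurodocker generate docker'` message in the pytest output.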

yarikoptic commented 2 months ago


neurodocker generate docker --pkg-manager portage --base-image gentoo --gentoo portage_image_version=20240324 --install afni

TheChymera commented 2 months ago

new command, works better:

neurodocker generate docker --pkg-manager portage --base-image " as portage" --base-image "" --gentoo gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 --install afni

Ongoing issues:

  1. --gentoo needs a parameter, won't work with no parameters even if they have defaults
  2. putting the entire docker resource link in the call is cumbersome; at least the resource identifier should be hard-codeable, and the versions should have defaults which we know work, lest users have to spend time navigating the latest releases and gentoo bugzilla
  3. There is an as_ parameter for the FROM line generator but I couldn't figure out how to access this.

@yarikoptic thanks for your help yesterday, I think I can navigate this better now, but feedback on the above would still be helpful :)
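For the first issue in the list above, one possible behaviour is that `key=value` arguments override known-working defaults, and giving no arguments simply yields the defaults. A minimal sketch under that assumption; `resolve_params` and `GENTOO_DEFAULTS` are hypothetical names, and using the hashes quoted in this thread as defaults is only an illustration.

```python
# Hashes quoted in this thread; treating them as defaults is an assumption.
GENTOO_DEFAULTS = {
    "gentoo_hash": "2d25617a1d085316761b06c17a93ec972f172fc6",
    "science_hash": "73916dd3680ffd92e5bd3d32b262e5d78c86a448",
}


def resolve_params(pairs: list[str], defaults: dict[str, str]) -> dict[str, str]:
    """Merge `key=value` strings over defaults; an empty list yields the defaults."""
    params = dict(defaults)
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        if key not in defaults:
            raise ValueError(f"unknown parameter {key!r}")
        params[key] = value
    return params


if __name__ == "__main__":
    print(resolve_params([], GENTOO_DEFAULTS))
    print(resolve_params(["gentoo_hash=0e9370b45a589867220384ca6c63bc6bcaec3f74"], GENTOO_DEFAULTS))
```

Rejecting unknown keys keeps typos from silently falling back to a default.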

yarikoptic commented 2 months ago

  1. --gentoo needs a parameter, won't work with no parameters even if they have defaults

IMHO worth filing a dedicated issue or PR to address that

  2. putting the entire docker resource link ...

sorry - already forgot what that resource link would be, but maybe it is something to template in the urls: ?

  3. There is an as_ parameter for the FROM line generator but I couldn't figure out how to access this.

I see it only accessible/used in the tests... so most likely it is only for internal use atm

```shell
❯ git grep '\.from_('
neurodocker/reproenv/tests/ r.from_(base_image)
neurodocker/reproenv/tests/ r.from_("debian:buster-slim", as_="builder")
neurodocker/reproenv/tests/ r.from_("debian:buster-slim")
neurodocker/reproenv/tests/ # r: _Renderer = renderer("apt").from_("my_base_image").run("echo foobar")
neurodocker/reproenv/tests/ r = renderer_cls("apt").from_("my_base_image").run("echo foobar")
neurodocker/reproenv/tests/ r2: _Renderer = renderer_cls("apt").from_("my_base_image").run("echo foobar")
neurodocker/reproenv/tests/ r.from_("baseimage")
neurodocker/reproenv/tests/ d.from_("alpine")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("alpine", as_="builder")
neurodocker/reproenv/tests/ d.from_("debian:buster-slim")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("alpine")
neurodocker/reproenv/tests/ s.from_("debian:buster-slim")
```

yarikoptic commented 2 months ago

FWIW -- here is an example of a target use case -

a dirty script with invocation of neurodocker

```shell
(dev3) (zarr-manifolds) yoh@typhon:~/proj/repronim/dsst-defacing-pipeline$ cat ./
#!/bin/bash

set -eu

generate() {
    # more details might come on
    [ "$1" == singularity ] && add_entry=' "$@"' || add_entry=''
    #neurodocker generate "$1" \
    #ndversion=0.9.5
    #ndversion=master
    #docker run --rm repronim/neurodocker:$ndversion \
    # ATM needs devel version of neurodocker for a fix to AFNI recipe
    #--base-image neurodebian:bookworm \
    #--ndfreeze date=20240320 \
    dest=/opt/dsst-defacing-pipeline
    neurodocker \
        generate "$1" \
        --pkg-manager portage \
        --base-image " as portage" \
        --base-image "" \
        --gentoo gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 \
        --install afni fsl \
        --copy environment.yml /opt/environment.yml \
        --copy src "$dest" \
        --miniconda \
            version=latest \
            env_name=dsstdeface \
            env_exists=false \
            yaml_file=/opt/environment.yml \
        --user=dsst \
        --entrypoint "$dest/"
        #--run "curl -sL | bash - " \
        #--install nodejs npm \
        #--run "npm install -g bids-validator@1.14.4" \
        #--fsl version= \
}

generate docker > Dockerfile
# generate singularity > Singularity
```

```dockerfile
# Generated by Neurodocker and Reproenv.

FROM as portage
FROM
RUN COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo \
    && RUN emerge -v --noreplace dev-vcs/git \
    && RUN emerge -v1u portage \
    # Pinned commits for the dependency tree state
    && ARG gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 \
    && ARG science_hash=73916dd3680ffd92e5bd3d32b262e5d78c86a448 \
    && ARG FEATURES="-ipc-sandbox -network-sandbox -pid-sandbox" \
    # This will be bound, and contents available outside of container
    && RUN mkdir /outputs \
    && COPY gentoo-portage/ /etc/portage/ \
    # Moving gentoo repo from default rsync to git
    && RUN rm /var/db/repos/gentoo -rf \
    # Cloning manually to prevent vdb update, pinning state via git
    # Allegedly it's better to chain everything in one command, something with container layers 🤔
    && RUN \
              REPO_URL=$(grep "^sync-uri" /etc/portage/repos.conf/gentoo | sed -e "s/sync-uri *= *//g") && \
              mkdir -p /var/db/repos/gentoo && pushd /var/db/repos/gentoo && git init . && \
              git remote add origin ${REPO_URL} && \
              git fetch --filter="blob:none" origin $gentoo_hash && \
              git reset --hard $gentoo_hash && rm .git -rf && popd && \
              REPO_URL=$(grep "^sync-uri" /etc/portage/repos.conf/science | sed -e "s/sync-uri *= *//g") && \
              mkdir -p /var/db/repos/science && pushd /var/db/repos/science && git init . && \
              git remote add origin ${REPO_URL} && \
              git fetch --filter="blob:none" origin $science_hash && \
              git reset --hard $science_hash && rm .git -rf && popd \
    # Old Christian: Remove sync-uri to not accidentally re-sync if we work with the package management interactively
    # Christian from the future: Maybe we want the option to re-sync if we're debugging it interactively...
    #RUN sed -i /etc/portage/repos.conf/{gentoo,science} -e "s/sync-type *= *git/sync-type =/g"
    #RUN sed -i /etc/portage/repos.conf/{gentoo,science} -e "/sync-uri/d"
    #RUN sed -i /etc/portage/repos.conf/{gentoo,science} -e "/sync-git-verify-commit-signature/d"
    # Make sure all CPU flags supported by the hardware are whitelisted
    # This only affects packages with handwritten assembly language optimizations, e.g. ffmpeg.
    # Removing it is safe, software will just not take full advantage of processor capabilities.
    #RUN emerge cpuid2cpuflags
    #RUN echo "*/* $(cpuid2cpuflags)" > /etc/portage/package.use/00cpu-flags
    ### Emerge cool stuff here
    ### Autounmask-continue enables all features on dependencies which the top level packages require
    ### By default this needs user confirmation which would interrupt the build.
RUN emerge --autounmask-continue \
           afni \
           fsl \
    && rm -rf /var/tmp/portage/*
COPY ["environment.yml", \
      "/opt/environment.yml"]
COPY ["src", \
      "/opt/dsst-defacing-pipeline"]
ENV CONDA_DIR="/opt/miniconda-latest" \
    PATH="/opt/miniconda-latest/bin:$PATH"
RUN \
    # Install dependencies.
    && export PATH="/opt/miniconda-latest/bin:$PATH" \
    && echo "Downloading Miniconda installer ..." \
    && conda_installer="/tmp/" \
    && curl -fsSL -o "$conda_installer" \
    && bash "$conda_installer" -b -p /opt/miniconda-latest \
    && rm -f "$conda_installer" \
    && conda update -yq -nbase conda \
    # Prefer packages in conda-forge
    && conda config --system --prepend channels conda-forge \
    # Packages in lower-priority channels not considered if a package with the same
    # name exists in a higher priority channel. Can dramatically speed up installations.
    # Conda recommends this as a default
    #
    && conda config --set channel_priority strict \
    && conda config --system --set auto_update_conda false \
    && conda config --system --set show_channel_urls true \
    # Enable `conda activate`
    && conda init bash \
    && conda env create --name dsstdeface --file /opt/environment.yml \
    # Clean up
    && sync && conda clean --all --yes && sync \
    && rm -rf ~/.cache/pip/*
RUN test "$(getent passwd dsst)" \
    || useradd --no-user-group --create-home --shell /bin/bash dsst
USER dsst
ENTRYPOINT ["/opt/dsst-defacing-pipeline/"]

# Save specification to JSON.
USER root
RUN printf '{ \
  "pkg_manager": "portage", \
  "existing_users": [ \
    "root" \
  ], \
  "instructions": [ \
    { \
      "name": "from_", \
      "kwds": { \
        "base_image": " as portage" \
      } \
    }, \
    { \
      "name": "from_", \
      "kwds": { \
        "base_image": "" \
      } \
    }, \
    { \
      "name": "run", \
      "kwds": { \
        "command": "COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo\\nRUN emerge -v --noreplace dev-vcs/git\\nRUN emerge -v1u portage\\n# Pinned commits for the dependency tree state\\nARG gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6\\nARG science_hash=73916dd3680ffd92e5bd3d32b262e5d78c86a448\\nARG FEATURES=\\"-ipc-sandbox -network-sandbox -pid-sandbox\\"\\n# This will be bound, and contents available outside of container\\nRUN mkdir /outputs\\nCOPY gentoo-portage/ /etc/portage/\\n# Moving gentoo repo from default rsync to git\\nRUN rm /var/db/repos/gentoo -rf\\n# Cloning manually to prevent vdb update, pinning state via git\\n# Allegedly it'"'"'s better to chain everything in one command, something with container layers \\ud83e\\udd14\\nRUN \\\\\\n REPO_URL=$\(grep \\"^sync-uri\\" /etc/portage/repos.conf/gentoo | sed -e \\"s/sync-uri *= *//g\\"\) && \\\\\\n mkdir -p /var/db/repos/gentoo && pushd /var/db/repos/gentoo && git init . && \\\\\\n git remote add origin ${REPO_URL} && \\\\\\n git fetch --filter=\\"blob:none\\" origin $gentoo_hash && \\\\\\n git reset --hard $gentoo_hash && rm .git -rf && popd && \\\\\\n REPO_URL=$\(grep \\"^sync-uri\\" /etc/portage/repos.conf/science | sed -e \\"s/sync-uri *= *//g\\"\) && \\\\\\n mkdir -p /var/db/repos/science && pushd /var/db/repos/science && git init . && \\\\\\n git remote add origin ${REPO_URL} && \\\\\\n git fetch --filter=\\"blob:none\\" origin $science_hash && \\\\\\n git reset --hard $science_hash && rm .git -rf && popd\\n# Old Christian: Remove sync-uri to not accidentally re-sync if we work with the package management interactively\\n# Christian from the future: Maybe we want the option to re-sync if we'"'"'re debugging it interactively...\\n#RUN sed -i /etc/portage/repos.conf/{gentoo,science} -e \\"s/sync-type *= *git/sync-type =/g\\"\\n#RUN sed -i /etc/portage/repos.conf/{gentoo,science} -e \\"/sync-uri/d\\"\\n#RUN sed -i /etc/portage/repos.conf/{gentoo,science} -e \\"/sync-git-verify-commit-signature/d\\"\\n# Make sure all CPU flags supported by the hardware are whitelisted\\n# This only affects packages with handwritten assembly language optimizations, e.g. ffmpeg.\\n# Removing it is safe, software will just not take full advantage of processor capabilities.\\n#RUN emerge cpuid2cpuflags\\n#RUN echo \\"*/* $\(cpuid2cpuflags\)\\" > /etc/portage/package.use/00cpu-flags\\n### Emerge cool stuff here\\n### Autounmask-continue enables all features on dependencies which the top level packages require\\n### By default this needs user confirmation which would interrupt the build." \
      } \
    }, \
    { \
      "name": "install", \
      "kwds": { \
        "pkgs": [ \
          "afni", \
          "fsl" \
        ], \
        "opts": null \
      } \
    }, \
    { \
      "name": "run", \
      "kwds": { \
        "command": "emerge --autounmask-continue \\\\\\n afni \\\\\\n fsl \\\\\\n && rm -rf /var/tmp/portage/*" \
      } \
    }, \
    { \
      "name": "copy", \
      "kwds": { \
        "source": [ \
          "environment.yml", \
          "/opt/environment.yml" \
        ], \
        "destination": "/opt/environment.yml" \
      } \
    }, \
    { \
      "name": "copy", \
      "kwds": { \
        "source": [ \
          "src", \
          "/opt/dsst-defacing-pipeline" \
        ], \
        "destination": "/opt/dsst-defacing-pipeline" \
      } \
    }, \
    { \
      "name": "env", \
      "kwds": { \
        "CONDA_DIR": "/opt/miniconda-latest", \
        "PATH": "/opt/miniconda-latest/bin:$PATH" \
      } \
    }, \
    { \
      "name": "run", \
      "kwds": { \
        "command": "\\n# Install dependencies.\\nexport PATH=\\"/opt/miniconda-latest/bin:$PATH\\"\\necho \\"Downloading Miniconda installer ...\\"\\nconda_installer=\\"/tmp/\\"\\ncurl -fsSL -o \\"$conda_installer\\"\\nbash \\"$conda_installer\\" -b -p /opt/miniconda-latest\\nrm -f \\"$conda_installer\\"\\nconda update -yq -nbase conda\\n# Prefer packages in conda-forge\\nconda config --system --prepend channels conda-forge\\n# Packages in lower-priority channels not considered if a package with the same\\n# name exists in a higher priority channel. Can dramatically speed up installations.\\n# Conda recommends this as a default\\n#\\nconda config --set channel_priority strict\\nconda config --system --set auto_update_conda false\\nconda config --system --set show_channel_urls true\\n# Enable `conda activate`\\nconda init bash\\nconda env create --name dsstdeface --file /opt/environment.yml\\n# Clean up\\nsync && conda clean --all --yes && sync\\nrm -rf ~/.cache/pip/*" \
      } \
    }, \
    { \
      "name": "user", \
      "kwds": { \
        "user": "dsst" \
      } \
    }, \
    { \
      "name": "entrypoint", \
      "kwds": { \
        "args": [ \
          "/opt/dsst-defacing-pipeline/" \
        ] \
      } \
    } \
  ] \
}' > /.reproenv.json
USER dsst
# End saving to specification to JSON.
```

Manual changes I had to make to the Dockerfile to make it legit -- so you can see that the COPY commands should be pulled out of the RUN chain, and that gentoo-portage should not be COPYied; there should rather be commands to populate those files, since they will not be present in that repository

```patch
--- /tmp/Dockerfile	2024-05-07 11:39:41.592262915 -0400
+++ Dockerfile	2024-05-07 11:51:53.162785867 -0400
@@ -2,21 +2,19 @@
 
 FROM as portage
 FROM
-RUN COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo \
-    && RUN emerge -v --noreplace dev-vcs/git \
-    && RUN emerge -v1u portage \
-    # Pinned commits for the dependency tree state
-    && ARG gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 \
-    && ARG science_hash=73916dd3680ffd92e5bd3d32b262e5d78c86a448 \
-    && ARG FEATURES="-ipc-sandbox -network-sandbox -pid-sandbox" \
-    # This will be bound, and contents available outside of container
-    && RUN mkdir /outputs \
-    && COPY gentoo-portage/ /etc/portage/ \
-    # Moving gentoo repo from default rsync to git
-    && RUN rm /var/db/repos/gentoo -rf \
-    # Cloning manually to prevent vdb update, pinning state via git
-    # Allegedly it's better to chain everything in one command, something with container layers 🤔
-    && RUN \
+
+COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo
+COPY gentoo-portage/ /etc/portage/
+
+ARG gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 \
+ARG science_hash=73916dd3680ffd92e5bd3d32b262e5d78c86a448 \
+ARG FEATURES="-ipc-sandbox -network-sandbox -pid-sandbox" \
+
+RUN emerge -v --noreplace dev-vcs/git \
+    && emerge -v1u portage \
+    && mkdir /outputs \
+    && rm /var/db/repos/gentoo -rf \
+    && \
               REPO_URL=$(grep "^sync-uri" /etc/portage/repos.conf/gentoo | sed -e "s/sync-uri *= *//g") && \
               mkdir -p /var/db/repos/gentoo && pushd /var/db/repos/gentoo && git init . && \
               git remote add origin ${REPO_URL} && \
@@ -52,7 +50,7 @@
     PATH="/opt/miniconda-latest/bin:$PATH"
 RUN \
     # Install dependencies.
-    && export PATH="/opt/miniconda-latest/bin:$PATH" \
+    export PATH="/opt/miniconda-latest/bin:$PATH" \
     && echo "Downloading Miniconda installer ..." \
     && conda_installer="/tmp/" \
     && curl -fsSL -o "$conda_installer" \
```

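The underlying problem is that Dockerfile instructions such as COPY and ARG cannot be chained with `&&` inside a RUN; only shell commands can. A minimal sketch of one way to regroup a flat list of lines, assuming each line is either an instruction or a `RUN <command>`; the function `regroup` and the instruction subset are hypothetical and not how neurodocker works internally. Unlike the manual patch, which hoists COPY and ARG ahead of the RUN, this version keeps the original order and starts a new RUN after each instruction.

```python
INSTRUCTIONS = ("COPY", "ARG", "ENV", "FROM", "USER")  # subset, enough for the sketch


def regroup(lines: list[str]) -> list[str]:
    """Chain consecutive RUN commands with `&&`; keep other instructions separate."""
    out: list[str] = []
    pending: list[str] = []  # shell commands waiting to be merged into one RUN

    def flush() -> None:
        if pending:
            out.append("RUN " + " \\\n    && ".join(pending))
            pending.clear()

    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comments and blank lines are dropped in this sketch
        if stripped.startswith("RUN "):
            pending.append(stripped[len("RUN "):].strip())
        elif stripped.split()[0] in INSTRUCTIONS:
            flush()
            out.append(stripped)
        else:
            raise ValueError(f"unrecognised line: {stripped!r}")
    flush()
    return out


if __name__ == "__main__":
    sample = [
        "COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo",
        "RUN emerge -v --noreplace dev-vcs/git",
        "RUN emerge -v1u portage",
        "# Pinned commits for the dependency tree state",
        "ARG gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6",
        "RUN mkdir /outputs",
    ]
    print("\n".join(regroup(sample)))
```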
TheChymera commented 2 months ago

neurodocker generate docker --pkg-manager portage --copy --from=portage /var/db/repos/gentoo /var/db/repos/gentoo --base-image gentoo --gentoo gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 --install afni

This can be used to add the COPY line, but it does so at the beginning of the container file

This adds the COPY line after the RUN line from templates/gentoo.yaml:

neurodocker generate docker --pkg-manager portage --base-image gentoo --gentoo gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 --copy --from=portage /var/db/repos/gentoo /var/db/repos/gentoo --install afni

Not sure how to control the position
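The two commands above suggest that instructions are emitted in the order in which their options appear on the command line. A stdlib-only sketch of that behaviour, using argparse rather than click (which neurodocker uses); the `Ordered` action and `ordered_steps` are hypothetical names, and only the option names are taken from this thread.

```python
import argparse


class Ordered(argparse.Action):
    """Record (option, values) pairs in the order they appear on the command line."""

    def __call__(self, parser, namespace, values, option_string=None):
        namespace.steps.append((self.dest, values))


def ordered_steps(argv: list[str]) -> list[tuple[str, list[str]]]:
    """Parse argv and return the options in command-line order."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--base-image", nargs=1, action=Ordered)
    parser.add_argument("--copy", nargs="+", action=Ordered)
    parser.add_argument("--install", nargs="+", action=Ordered)
    # Start from a namespace that already holds the shared, ordered list.
    namespace = argparse.Namespace(steps=[])
    return parser.parse_args(argv, namespace=namespace).steps


if __name__ == "__main__":
    early = ["--copy", "environment.yml", "/opt/environment.yml",
             "--base-image", "gentoo", "--install", "afni"]
    late = ["--base-image", "gentoo",
            "--copy", "environment.yml", "/opt/environment.yml", "--install", "afni"]
    print([name for name, _ in ordered_steps(early)])
    print([name for name, _ in ordered_steps(late)])
```

If the real CLI behaves this way, the position of the COPY line is controlled only by where `--copy` sits relative to the other options, which matches the two outcomes reported above.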

yarikoptic commented 2 months ago

this is the command we ended up testing with

neurodocker generate docker --pkg-manager portage --base-image gentoo --gentoo gentoo_hash=2d25617a1d085316761b06c17a93ec972f172fc6 >| Dockerfile && docker build -t test .

TheChymera commented 1 month ago

works with

neurodocker generate docker --pkg-manager portage --base-image gentoo --gentoo gentoo_hash=0e9370b45a589867220384ca6c63bc6bcaec3f74 --install afni >| Containerfile && podman build -t test .

yarikoptic commented 3 weeks ago

any progress here to bring PR into the general usability and "ready for review" state @TheChymera ?