dependabot / dependabot-core

🤖 Dependabot's core logic for creating update PRs.
https://docs.github.com/en/code-security/dependabot
MIT License

Add support for OS package manager like apt and apk in Dockerfiles #2129

Open ferrarimarco opened 5 years ago

ferrarimarco commented 5 years ago

While your Docker support is great for keeping the FROM directive updated, it could be enhanced by supporting OS package managers (such as APT for Debian and its derivatives, or APK for Alpine).

With this addition, we could rely completely on Dependabot to keep our Docker images updated, instead of having to update those packages manually.

Example 1 (APT on Ubuntu):

apt-get install apache2=2.2.20-1ubuntu1 \
                     apache2.2-common=2.2.20-1ubuntu1 \
                     apache2.2-bin=2.2.20-1ubuntu1 \
                     apache2-mpm-worker=2.2.20-1ubuntu1

Example 2 (APK on Alpine):

apk add packagename=1.2.3-suffix

We could even get fancy and support version constraints, like >=.

hmarr commented 5 years ago

I'm really keen to support this too - we've got several cases internally that would benefit from it, so we've wanted it for a while!

It wouldn't be super hard to implement, but it's not a tiny project either. We'd need to parse shell expressions so we could handle things like RUN apt-get update && apt-get install -y foo=1.0.0 (far more complicated examples exist too...!), and we'd need to integrate with the various package registries, ideally detecting the distro and release by looking at the base image (recursively).
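As a rough illustration of the parsing problem described above, here is a minimal Python sketch that pulls `name=version` pins out of a single RUN instruction. The function name `pinned_apt_packages` is hypothetical, and the approach is deliberately naive: it only splits on `&&` and does not handle `;`, `||`, subshells, variables, or commands prefixed with environment assignments.

```python
import shlex


def pinned_apt_packages(run_instruction: str) -> dict[str, str]:
    """Return {package: version} for `name=version` arguments of `apt-get install`.

    Hypothetical helper for illustration; not part of dependabot-core.
    """
    body = run_instruction.strip()
    # Drop the leading RUN keyword if present.
    if body.upper().startswith("RUN "):
        body = body[4:]
    # Join backslash-continued lines into one logical line.
    body = body.replace("\\\n", " ")

    pins: dict[str, str] = {}
    # Naive split on `&&`; real shell grammar is far richer than this.
    for command in body.split("&&"):
        words = shlex.split(command)
        if words[:1] != ["apt-get"] or "install" not in words:
            continue
        for word in words[words.index("install") + 1:]:
            if word.startswith("-"):
                continue  # an option such as -y or --no-install-recommends
            name, sep, version = word.partition("=")
            if sep and version:
                pins[name] = version
    return pins


print(pinned_apt_packages("RUN apt-get update && apt-get install -y foo=1.0.0"))
```

Even this small case shows why a proper shell parser would be needed for anything beyond the simplest Dockerfiles.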

Unfortunately we won't have capacity to implement this in the near future. If you're really keen for support, we would accept a PR though :-)

ferrarimarco commented 5 years ago

The repo is this one, right? https://github.com/dependabot/dependabot-core

greysteil commented 5 years ago

Yep!

jeff-cook commented 5 years ago

This would be very helpful!

Would it be easier to code the update if we used files like pip and gem do? For example, an apk.txt file. Then use something like xargs to run apk add. That way they don't have to figure out how to parse the packages out of the Dockerfile.

ferrarimarco commented 5 years ago

I suppose the parser implementation complexity would be the same, but you'll have the overhead of having to load that file somehow when you build the docker image (since you have to install those packages, don't you?)

mst-ableton commented 5 years ago

We ended up making a quick Python script to roll the pins: https://gist.github.com/mst-ableton/d0b80692571718fcb0a8f3984add9c03. As it uses Python it's not easily upstreamable, but the idea is to run apt-get update inside the container and parse the output of apt-get upgrade -s to see what it would have upgraded to. Because it's doing two docker builds, it may take a while to run. Hope this effort can jumpstart a Dependabot-native implementation in the future.
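The idea of reading a simulated upgrade can be sketched as follows. This is not the script from the gist; it is a minimal Python sketch that assumes the simulation output contains lines of the form `Inst <name> [<current>] (<candidate> <origin> [<arch>])`. The sample text and the newer version number in it are made up for illustration.

```python
import re

# Assumed line shape: Inst <name> [<current>] (<candidate> <origin> [<arch>])
INST_LINE = re.compile(
    r"^Inst (?P<name>\S+) \[(?P<current>[^\]]+)\] \((?P<candidate>\S+) "
)


def parse_simulated_upgrade(output: str) -> dict[str, tuple[str, str]]:
    """Map each upgradable package to (current_version, candidate_version)."""
    upgrades: dict[str, tuple[str, str]] = {}
    for line in output.splitlines():
        match = INST_LINE.match(line)
        if match:
            upgrades[match["name"]] = (match["current"], match["candidate"])
    return upgrades


# Illustrative sample only; not captured from a real run.
SAMPLE = """\
Reading package lists...
Inst openssl [1.1.1f-1ubuntu2.4] (1.1.1f-1ubuntu2.5 Ubuntu:20.04/focal-updates [amd64])
Conf openssl (1.1.1f-1ubuntu2.5 Ubuntu:20.04/focal-updates [amd64])
"""

print(parse_simulated_upgrade(SAMPLE))
```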

hazcod commented 5 years ago

Been bashing my head against the wall with this one for https://github.com/ironPeakServices/iron-redis. On the one hand you want to pin your package versions, but on the other you can't keep updating those versions manually every time there is a security fix.

CpuID commented 3 years ago

I've taken a look at what would be involved to make this a reality - I almost started a standalone project to do it, but having it part of Dependabot feels more appropriate, plus there's a better code structure already.

Questions/thoughts for any dependabot-core maintainers (@feelepxyz @jurre @greysteil ?):

  1. I can see value in reusing the Docker FileFetcher, but having a separate package_manager used for different base OS'es, a la:

    • docker_alpine
    • docker_ubuntu
    • docker_centos
    • etc.

     How would you prefer the file hierarchy to look here? Extra top-level directories for each package_manager? Or subdirectories within the top-level docker directory? Maybe docker/lib/dependabot/docker_(alpine|ubuntu|centos)?
  2. There might be some potential for shared/reusable logic in the various FileParsers and FileUpdaters, maybe even a single shared FileParser/FileUpdater, TBD.

  3. I think each UpdateChecker will likely be unique, since each has to talk to the package repositories of a different OS. Things like Ubuntu PPAs and the equivalents for other OSes will be interesting to deal with too, as these cannot be fully inferred from the contents of the dependency file (Dockerfile) alone.

  4. I can see a potential need to actually "run" the Docker image with a command to trawl/read the likes of:

    • /etc/os-release
    • /etc/apt*
    • /etc/yum*

     to determine which package repositories need to be poked by the UpdateChecker. Is there a facility available to do that?

CpuID commented 3 years ago

At one side you want to pin your package versions, but the other way you can't keep maintaining the package versions manually or whenever there is a security fix.

@hazcod I think the core principle here, from my standpoint, is for a Dockerfile to produce a deterministic image output. It's tricky, but hard versioning at the OS package level goes a long way towards that working (with the exception of a Linux distribution pulling the rug out from under you and 404'ing the repo URLs for a specific OS release).

CpuID commented 3 years ago

3. I can see a potential need to actually "run" the Docker image with a command to trawl/read the likes of:

  • /etc/os-release
  • /etc/apt*
  • /etc/yum*

  to determine which package repositories need to be poked by the UpdateChecker. Is there a facility available to do that?

https://github.com/dependabot/dependabot-core#setup

To run all of Dependabot Core, you'll need Ruby, Python, PHP, Elixir, Node, Go, Elm, and Rust installed.

No current provision to have Docker installed or accessible as part of the list of helpers specified?
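For the distro-detection part, once the contents of /etc/os-release have been obtained somehow (which is the open question here), mapping them to a package manager is fairly mechanical. A minimal Python sketch, assuming the usual KEY=value format with optional shell-style quoting; the sample text, the function name and the mapping table are illustrative assumptions:

```python
import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be quoted shell-style; shlex.split strips the quotes.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


# Assumed mapping from the ID field to the package manager to query.
PACKAGE_MANAGER_BY_ID = {"alpine": "apk", "debian": "apt", "ubuntu": "apt", "centos": "yum"}

# Illustrative sample; not copied from a real image.
SAMPLE = '''\
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.14.0
PRETTY_NAME="Alpine Linux v3.14"
'''

release = parse_os_release(SAMPLE)
print(release["ID"], release["VERSION_ID"], PACKAGE_MANAGER_BY_ID.get(release["ID"]))
```

This does not answer how to read the file without running the image; it only covers what to do with it afterwards.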

greysteil commented 3 years ago

I don't maintain Dependabot anymore, but you're in safe hands with @feelepxyz and @jurre. I know they've been swamped in the last few weeks, though, and may be taking some well deserved time off over Christmas.

jurre commented 3 years ago

Appreciate you looking into this @CpuID. I just want to preface this with a note that I'm not sure we will be able to review, merge and support such a contribution in a timely manner at this time.

We've paused accepting new ecosystems, and this patch might be of similar proportions.

Having said that, I'll try to answer some of your questions:

how would you prefer the file hierarchy to look here? extra top level directories for each package_manager respectively? or subdirectories within top level docker? maybe docker/lib/dependabot/docker_(alpine|ubuntu|centos)

I imagine that the implementations will be relatively similar, and it feels like it should be part of the docker package_manager.

What I imagine right now (without much context on this, so I may very well be wrong):

It's hard to say what it should look like exactly without doing some more investigation though, and I would definitely re-evaluate once we have a better idea of how many parts of the codebase we can reuse and how much we end up having to change.

CpuID commented 3 years ago

@jurre thanks for the response :)

I think your suggestion of using docker/lib/dependabot/docker/update_checkers/alpine_update_checker.rb etc. makes sense, and I'm happy with that file hierarchy (which classes get split out per OS is TBD, based on findings during implementation). E.g. it could be docker/lib/dependabot/docker/file_parsers/alpine_file_parser.rb.

I'll see if I get free cycles to put something together, and see how far I get.

We aim to provide the best user experience possible for each of these, but we have found we've lacked the capacity – and in some cases the in-house expertise – to support new ecosystems in the last year.

@jurre hiring? :)

jurre commented 3 years ago

@jurre hiring? :)

We are! https://boards.greenhouse.io/github/jobs/2383025 https://boards.greenhouse.io/github/jobs/2384868

cecilemuller commented 3 years ago

If dependencies were stored in a JSON file similar to package.json, jq and xargs can be used to generate the install command and update the versions:

apt.json

{
  "nginx": "1.18.0-0ubuntu1",
  "openssl": "1.1.1f-1ubuntu2.4",
  "ca-certificates": "20210119~20.04.1"
}

Run in Dockerfile:

jq -r 'to_entries | .[] | .key + "=" + .value' apt.json | xargs apt-get install -y

An action can read the version to update the JSON:

apt-cache policy nginx | grep -oP '(?<=Candidate:\s)(.+)'

cecilemuller commented 3 years ago

Here's a working example.

A script updates the latest version of packages in the JSON file: https://github.com/wildpeaks/docker-nginx/blob/main/docker/update_dependencies.sh

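#!/bin/bash

# Load the current pins (package name -> version).
JSON=$( cat dependencies.json )

for PACKAGE in $( echo "$JSON" | jq -r 'keys | .[]' ); do
    # The "Candidate:" line of apt-cache policy is the version apt would install.
    VERSION=$( apt-cache policy "$PACKAGE" | grep -oP '(?<=Candidate:\s)(.+)' )
    JSON=$( echo "$JSON" | jq '.[$package] = $version' --arg package "$PACKAGE" --arg version "$VERSION" )
done

# Pretty-print the updated pins back to the file.
echo "$JSON" | python -m json.tool > dependencies.json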
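For readers less familiar with jq, the same update loop can be sketched in Python. This is not the code from the linked repository; `candidate_version` and `refresh_pins` are hypothetical names, and the sample `apt-cache policy` text is illustrative. Unlike the shell version, this sketch keeps the existing pin when no candidate is reported.

```python
import re
from typing import Callable, Optional


def candidate_version(policy_output: str) -> Optional[str]:
    """Extract the Candidate version from `apt-cache policy <package>` output."""
    match = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, re.MULTILINE)
    if match is None or match[1] == "(none)":
        return None
    return match[1]


def refresh_pins(pins: dict[str, str], lookup: Callable[[str], str]) -> dict[str, str]:
    """Return a copy of pins with each version replaced by its candidate.

    `lookup` returns the policy text for a package name; a real one could
    wrap subprocess.run(["apt-cache", "policy", name], capture_output=True, text=True).
    """
    updated = dict(pins)
    for name in pins:
        version = candidate_version(lookup(name))
        if version is not None:
            updated[name] = version
    return updated


# Illustrative sample; not captured from a real run.
SAMPLE_POLICY = """\
nginx:
  Installed: (none)
  Candidate: 1.18.0-0ubuntu1
  Version table:
     1.18.0-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
"""

print(refresh_pins({"nginx": "1.17.0-0ubuntu1"}, lambda name: SAMPLE_POLICY))
```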

A cron Action runs the update script and creates a matching pull request: https://github.com/wildpeaks/docker-nginx/blob/main/.github/workflows/dependencies.yml

# ...
    - name: Update dependencies
      working-directory: docker
      run: |
        sudo apt-get update
        sh update_dependencies.sh

    - name: Create PR
      uses: peter-evans/create-pull-request@v3
      with:
        commit-message: "chore(deps): update dependencies.json"
        branch: features/update-dependencies
        title: Update APT packages
        body: Updated dependencies.json
        delete-branch: true

And the Dockerfile uses the JSON file to install pinned versions: https://github.com/wildpeaks/docker-nginx/blob/main/docker/Dockerfile#L7

# ...
COPY dependencies.json /tmp/dependencies.json
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
 && apt-get install -y --no-install-recommends jq \
 && jq -r 'to_entries | .[] | .key + "=" + .value' /tmp/dependencies.json | xargs apt-get install -y --no-install-recommends \
 && rm /tmp/dependencies.json
# ...

billy1kaplan commented 3 years ago

Hello! I was wondering if there was any ongoing effort or plan to get this implemented. This feature would be a huge help!

jsirianni commented 3 years ago

This would be useful for me. I had an issue where the Ubuntu repositories gave me a very old version of a package. I've started pinning my package versions, but now I have increased maintenance overhead.

modem7 commented 3 years ago

That works fantastically! Thank you!

Did you ever figure out how to get it to work on non-Ubuntu? E.g. Alpine Docker builds? I saw that you have the repo docker-browser-sync without the dependency updates action.

cecilemuller commented 3 years ago

That works fantastically! Thank you!

Glad it helps :)

Did you ever figure out how to get it to work on non-Ubuntu? E.g. Alpine Docker builds? I saw that you have the repo docker-browser-sync without the dependency updates action.

The browser-sync one didn't need it because it's an npm dependency (so Dependabot is the one updating the JSON file).

As for Alpine, sorry I never tried but the main challenge would be to find an Alpine equivalent of the apt-cache policy command whereas the rest should be similar (afaik jq is also available on Alpine).

lordvandal commented 3 years ago

Actually there is a similar command to apt-cache policy in Alpine. It's possible to list the upgradable packages, with new and current versions with apk -u list.

Update indexes first:

# apk update

fetch http://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.14/community/x86_64/APKINDEX.tar.gz
v3.14.0-160-g18a21f8aa5 [http://dl-cdn.alpinelinux.org/alpine/v3.14/main]
v3.14.0-165-g01e8bc9b28 [http://dl-cdn.alpinelinux.org/alpine/v3.14/community]
OK: 15009 distinct packages available

Then we can list only upgradable packages:

# apk -u list

rsync-3.2.3-r4 x86_64 {rsync} (GPL-3.0-or-later) [upgradable from: rsync-3.2.3-r2]
rsync-doc-3.2.3-r4 x86_64 {rsync} (GPL-3.0-or-later) [upgradable from: rsync-doc-3.2.3-r2]
rsync-openrc-3.2.3-r4 x86_64 {rsync} (GPL-3.0-or-later) [upgradable from: rsync-openrc-3.2.3-r2]
krb5-libs-1.18.4-r0 x86_64 {krb5} (MIT) [upgradable from: krb5-libs-1.18.3-r1]
libcurl-7.78.0-r0 x86_64 {curl} (MIT) [upgradable from: libcurl-7.77.0-r1]
apk-tools-doc-2.12.6-r0 x86_64 {apk-tools} (GPL-2.0-only) [upgradable from: apk-tools-doc-2.12.5-r1]
apk-tools-2.12.6-r0 x86_64 {apk-tools} (GPL-2.0-only) [upgradable from: apk-tools-2.12.5-r1]
linux-virt-5.10.43-r0 x86_64 {linux-lts} (GPL-2.0) [upgradable from: linux-virt-5.4.84-r0]
curl-7.78.0-r0 x86_64 {curl} (MIT) [upgradable from: curl-7.77.0-r1]
curl-doc-7.78.0-r0 x86_64 {curl} (MIT) [upgradable from: curl-doc-7.77.0-r1]

I've never used jq, so a bit of help would be greatly appreciated 😜
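Output in the shape shown above can also be parsed without jq. Here is a minimal Python sketch that turns such lines into package name, old version and new version. The regular expressions assume the line layout quoted above and Alpine's `name-version-rN` naming; `parse_upgradable` is a hypothetical name.

```python
import re

# One line of `apk -u list` output, as quoted in the comment above.
LINE = re.compile(
    r"^(?P<new>\S+) \S+ \{(?P<origin>[^}]+)\} \(.*\) \[upgradable from: (?P<old>\S+)\]$"
)
# Assumes the version starts with a digit and ends with a -rN release suffix.
NAME_VERSION = re.compile(r"^(?P<name>.+)-(?P<version>\d[^-]*-r\d+)$")


def parse_upgradable(output: str) -> dict[str, tuple[str, str]]:
    """Map package name to (installed_version, available_version)."""
    result: dict[str, tuple[str, str]] = {}
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        new = NAME_VERSION.match(match["new"])
        old = NAME_VERSION.match(match["old"])
        if new and old and new["name"] == old["name"]:
            result[new["name"]] = (old["version"], new["version"])
    return result


SAMPLE = """\
apk-tools-doc-2.12.6-r0 x86_64 {apk-tools} (GPL-2.0-only) [upgradable from: apk-tools-doc-2.12.5-r1]
curl-7.78.0-r0 x86_64 {curl} (MIT) [upgradable from: curl-7.77.0-r1]
"""

print(parse_upgradable(SAMPLE))
```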

cecilemuller commented 3 years ago

The convenient thing with apt-cache policy is that it provides the version number without having to install the outdated packages first (unlike a list of upgradable packages).

I think this would be a closer equivalent: apk info "PACKAGENAME" | head -1 | cut -d ' ' -f 1

BigmenPixel0 commented 2 years ago

Any news?

danepowell commented 2 years ago

This sounds great in theory, but for the vast majority of use cases it's probably a false hope. Why? Debian repos only maintain the latest version of a given package. Unless you are hosting your own package repo, you aren't going to be able to install arbitrary package versions. So the idea of committing a "dependencies.json" file to version control is essentially impossible, at least in the context of building Docker images.

The only exceptions I see are if you host your own package repo or rely on very careful Docker caching to retain an old "pinned" version of a package.

Am I missing something?

jsirianni commented 2 years ago

This sounds great in theory, but for the vast majority of use cases it's probably a false hope. Why? Debian repos only maintain the latest version of a given package. Unless you are hosting your own package repo, you aren't going to be able to install arbitrary package versions. So the idea of committing a "dependencies.json" file to version control is essentially impossible, at least in the context of building Docker images.

The only exceptions I see are if you host your own package repo or rely on very careful Docker caching to retain an old "pinned" version of a package.

Am I missing something?

The repos can contain a single version sometimes, but not always. You seem to be correct, at least for debian:latest and ubuntu:latest, but a quick check shows that this is not always the case.

debian:10 image

root@d71a1f0c3573:/# apt-cache madison systemd
   systemd | 241-7~deb10u8 | http://deb.debian.org/debian buster/main amd64 Packages
   systemd | 241-7~deb10u8 | http://security.debian.org/debian-security buster/updates/main amd64 Packages

ubuntu:20.04 image

root@edf9514d8882:/# apt-cache madison systemd
   systemd | 245.4-4ubuntu3.16 | http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
   systemd | 245.4-4ubuntu3.15 | http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
   systemd | 245.4-4ubuntu3 | http://archive.ubuntu.com/ubuntu focal/main amd64 Packages

This is just an example. I have observed the Ubuntu repos giving me a very old package version temporarily, causing one of my images to fail integration tests due to the incompatible package. This has happened only once. I solved this by pinning the version in my Dockerfile; however, maintaining the Dockerfile becomes difficult. The trade-off is worth it for me, but maybe not for everyone.

Additionally, some Dockerfile linters will push you to pin package versions. If you allow the repos to decide which package version you are using, you do lose some control of your image's end state.

I agree that it is generally not an issue, but it can be for some builds.

I am not happy with my solution, but it does work. It is based on some of the feedback in this thread. Basically, I use a "base image" that has the pinned packages that I depend on. This way, I end up building the base image infrequently while building my final image frequently. Dependabot would be a great addition to this workflow, preventing my base image from going stale.

Lastly, it's possible that excellent integration testing of the final image would allow us to always use the latest base image with the latest packages, without relying on Dependabot to handle things. It just depends on how folks wish to do things.

danepowell commented 2 years ago

Thanks, that essentially confirms my intuition. Even in the examples you provided, the different package versions are due to them being in different repos, but each repo only contains a single version.
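That observation can be checked mechanically against the `apt-cache madison` output quoted earlier in the thread. A minimal Python sketch that groups versions by repository and suite (`versions_by_repo` is a hypothetical name; the sample is the ubuntu:20.04 output from the previous comment):

```python
from collections import defaultdict


def versions_by_repo(madison_output: str) -> dict[str, set[str]]:
    """Group the versions listed by `apt-cache madison` by "<url> <suite/component>"."""
    repos: defaultdict[str, set[str]] = defaultdict(set)
    for line in madison_output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # not a "package | version | source" row
        _package, version, source = parts
        fields = source.split()
        if len(fields) < 2:
            continue
        url, suite = fields[0], fields[1]
        repos[f"{url} {suite}"].add(version)
    return dict(repos)


SAMPLE = """\
   systemd | 245.4-4ubuntu3.16 | http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
   systemd | 245.4-4ubuntu3.15 | http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
   systemd | 245.4-4ubuntu3 | http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
"""

for repo, versions in versions_by_repo(SAMPLE).items():
    print(repo, sorted(versions))
```

In this sample every repository and suite pair maps to exactly one version, which matches the point made above.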

In an ideal world, I think we'd all pin our Apt package versions, but that seems incompatible with the Debian / apt ecosystem which focuses more on preserving backwards compatibility and thus (theoretically) makes pinning unnecessary.

I'm sure there are other use cases for this feature, this is just mine 😄

ArwynFr commented 2 years ago

This sounds great in theory, but for the vast majority of use cases it's probably a false hope. Why? Debian repos only maintain the latest version of a given package. Unless you are hosting your own package repo, you aren't going to be able to install arbitrary package versions. So the idea of committing a "dependencies.json" file to version control is essentially impossible, at least in the context of building Docker images.

The only exceptions I see are if you host your own package repo or rely on very careful Docker caching to retain an old "pinned" version of a package.

Am I missing something?

I think you are approaching the problem from the wrong end.

Of course there are applications whose maintainers use version pinning to build software on legacy dependencies. I don't think such people find much value in Dependabot: they know which version of each dependency they use, and they know they can hardly upgrade without breaking everything. Dependabot's target users are software maintainers who want to efficiently keep their software up to date with the latest security fixes.

Take this Dockerfile as an example:

FROM debian:11.4-slim as minifier
RUN apt-get update && apt-get install --yes --no-install-recommends minify=2.7.2-1+b6

Imagine the minify developers make a security fix and distribute a newer 2.7.3 version that makes its way into the Debian repository. I don't get a Dependabot notification telling me that 2.7.2 is outdated. Either I handle the upgrade manually, which is insane when you consider the number of projects times the number of dependencies I have to monitor, or I use some latest-like constraint and rebuild the software periodically. The latter is inefficient: most builds will produce no change compared to the previous build, and on average you can expect a delay of half a period between the new version becoming available and being deployed.

Thanks to Dependabot, whenever Debian publishes the next version of their base image, I'll get a notification prompting me to upgrade my base image to debian:11.5-slim. This allows me to immediately build a new image of my software, based on that new image, without wasting computing resources rebuilding my image daily or weekly for nothing.

I wish I had the same feature for my apt packages.
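To make the wish concrete: the update step itself would amount to rewriting the pin in the Dockerfile, much as the FROM tag is rewritten today. A minimal Python sketch, where `bump_pin` is a hypothetical helper and 2.7.3-1 is a made-up version string; a plain text substitution like this would also touch any unrelated `minify=...` text elsewhere in the file.

```python
import re


def bump_pin(dockerfile: str, package: str, new_version: str) -> str:
    """Replace `package=<old>` with `package=<new_version>` in Dockerfile text."""
    # The lookbehind stops "fy=" from matching inside "minify=".
    pattern = re.compile(r"(?<![\w.+-])" + re.escape(package) + r"=[^\s\\]+")
    # A function replacement avoids backslash-escape handling in the new text.
    return pattern.sub(lambda match: f"{package}={new_version}", dockerfile)


DOCKERFILE = (
    "FROM debian:11.4-slim as minifier\n"
    "RUN apt-get update && apt-get install --yes --no-install-recommends minify=2.7.2-1+b6\n"
)

# 2.7.3-1 is a hypothetical version used only for this example.
print(bump_pin(DOCKERFILE, "minify", "2.7.3-1"))
```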

asbjornu commented 2 years ago

Thank you for so succinctly describing my precise use case, @ArwynFr. Yep, this is exactly how I want Dependabot to work.

modem7 commented 2 years ago

Thanks to Dependabot, whenever Debian publishes the next version of their base image, I'll get a notification prompting me to upgrade my base image to debian:11.5-slim. This allows me to immediately build a new image of my software, based on that new image, without wasting computing resources rebuilding my image daily or weekly for nothing.

I wish I had the same feature for my apt packages.

It also helps us keep track of when specific packages were updated, helping troubleshooting be far faster.