anchore / anchore-engine

A service that analyzes docker images and scans for vulnerabilities
Apache License 2.0

Feature Request: Show OS package origin information #148

Open juledwar opened 5 years ago

juledwar commented 5 years ago

When retrieving OS content for an image, we currently get the following fields back (an actual example from one of my images):

        {
            "license": "PD probably-PD GPL-2+ LGPL-2.1+ permissive-fsf Autoconf GPL-2 none permissive-nowarranty config-h noderivs PD-debian", 
            "origin": "Jonathan Nieder <jrnieder@gmail.com> (maintainer)", 
            "package": "xz-utils", 
            "size": "516000", 
            "type": "dpkg", 
            "version": "5.2.2-1.2+b1"
        }, 

We would also like to see information about the repository from which the package was installed. For example, on Debian-based systems you can run apt-cache policy xz-utils and get:

$ apt-cache policy xz-utils
xz-utils:
  Installed: 5.2.2-1.3
  Candidate: 5.2.2-1.3
  Version table:
 *** 5.2.2-1.3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

It should be fairly straightforward to parse the repository URL (plus suite/component) out of this output. Package managers on other OSes should have similar functionality.
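Here is a minimal Python sketch of that parsing. It assumes the output format shown above: the installed version line is marked with ***, and its source lines are indented more deeply than the version lines. The function name installed_origin is hypothetical, and the sketch only parses text it is given. It does not run apt-cache itself.

```python
from typing import Optional


def installed_origin(policy_output: str) -> Optional[str]:
    """Return "<repo URL> <suite/component>" for the installed version, or None.

    Assumption: the installed version line starts with "***" and its source
    lines are indented by more than 5 spaces, as in the sample output.
    """
    in_installed = False
    for line in policy_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip(" "))
        if stripped.startswith("***"):
            # " *** <version> <priority>" marks the installed version.
            in_installed = True
            continue
        if indent <= 5:
            # Header lines ("  Installed: ...") or a non-installed version line.
            in_installed = False
            continue
        if in_installed:
            # Source line tokens: [priority, location, suite/component, arch, "Packages"]
            tokens = stripped.split()
            # Skip local entries such as "/var/lib/dpkg/status".
            if len(tokens) >= 2 and not tokens[1].startswith("/"):
                return " ".join(tokens[1:3])
    return None


SAMPLE = """xz-utils:
  Installed: 5.2.2-1.3
  Candidate: 5.2.2-1.3
  Version table:
 *** 5.2.2-1.3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
"""

print(installed_origin(SAMPLE))
```

If the only source listed for the installed version is /var/lib/dpkg/status, the sketch returns None instead of guessing a repository.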

All we need for now is for this information to be returned in the content data, for example:

        {
            "license": "PD probably-PD GPL-2+ LGPL-2.1+ permissive-fsf Autoconf GPL-2 none permissive-nowarranty config-h noderivs PD-debian", 
            "origin": "Jonathan Nieder <jrnieder@gmail.com> (maintainer)", 
            "package": "xz-utils", 
            "size": "516000", 
            "type": "dpkg", 
            "version": "5.2.2-1.2+b1",
            "location": "http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main"
        }, 

I would have called it origin, but that key is already used!
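A minimal sketch of how the content records could be extended with the proposed "location" key. The helper name add_location and the package-to-location mapping it takes are hypothetical. Packages whose repository is unknown get None rather than being dropped.

```python
from typing import Dict, List, Optional


def add_location(packages: List[dict], locations: Dict[str, Optional[str]]) -> List[dict]:
    """Return copies of the package records with a "location" key added.

    Hypothetical helper: `locations` maps package name -> repository string,
    e.g. "http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main".
    """
    # Copy each record so the original content data is left untouched.
    return [{**pkg, "location": locations.get(pkg["package"])} for pkg in packages]


records = [{"package": "xz-utils", "type": "dpkg", "version": "5.2.2-1.2+b1"}]
print(add_location(records, {"xz-utils": "http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main"}))
```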

juledwar commented 5 years ago

@zhill @nurmi This is the ticket I said I'd file. Happy to discuss implementation details here. Many thanks.

zhill commented 5 years ago

The apt-cache policy call only shows the current mapping of configured repositories to installed packages, not the actual source from which a package was installed. Is that OK for your use case? For example, suppose a user installs a package from repoA, then edits /etc/apt/sources.list to point at the regular Ubuntu mirrors and re-runs apt update. Those new repositories will then show up as the install source in the apt-cache policy output. Most package managers don't seem to track the actual install source at all, only the current download locations based on the current configuration. If that is understood and acceptable, this is an achievable request; otherwise a deeper (per-layer) analysis is required.
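Because apt-cache policy reports locations from the repositories configured as of the last apt update, the set of locations it can show is whatever the sources configuration listed at that time. Here is a minimal sketch that extracts those repositories from one-line-style sources.list entries (deb [options] uri suite [components...]). It ignores the deb822 .sources format and files under sources.list.d, and the names AptSource and parse_sources_list are hypothetical.

```python
from typing import List, NamedTuple


class AptSource(NamedTuple):
    kind: str              # "deb" or "deb-src"
    uri: str
    suite: str
    components: List[str]


def parse_sources_list(text: str) -> List[AptSource]:
    """Parse one-line-style sources.list entries (a sketch, not apt's parser)."""
    sources = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0] not in ("deb", "deb-src"):
            continue
        rest = tokens[1:]
        # Skip an optional "[ key=value ... ]" options block, which may span tokens.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        if len(rest) < 2:
            continue
        sources.append(AptSource(tokens[0], rest[0], rest[1], rest[2:]))
    return sources


SOURCES = """# repoA was removed after the install
deb http://archive.ubuntu.com/ubuntu bionic main restricted
deb [arch=amd64] http://repoA.example.com/apt stable main
"""
for src in parse_sources_list(SOURCES):
    print(src.uri, src.suite, src.components)
```

Editing these entries and re-running apt update changes what apt-cache policy reports, even though the installed packages themselves are unchanged.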

juledwar commented 5 years ago

This is fine, @zhill. It does depend on the images also retaining the package metadata, so it will probably require a very conscious decision on the part of image creators.

Thanks!
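To check whether an image still has that metadata, one could look for package index files under var/lib/apt/lists in the unpacked image filesystem. Deleting that directory after installing packages is common practice in Dockerfiles, and in that case apt-cache policy can only report /var/lib/dpkg/status. This sketch assumes the image has already been extracted to a local directory and that index file names contain "_Packages", as they typically do on Debian and Ubuntu. The function name is hypothetical, and this is not how anchore-engine itself inspects images.

```python
from pathlib import Path
from typing import List


def retained_package_indexes(rootfs: str) -> List[str]:
    """List apt package index files still present in an extracted image rootfs.

    Assumption: index files live in var/lib/apt/lists and have "_Packages"
    in their names (possibly with a compression suffix).
    """
    lists_dir = Path(rootfs) / "var" / "lib" / "apt" / "lists"
    if not lists_dir.is_dir():
        return []
    # Ignore the "lock" file, the "partial" directory and Release files.
    return sorted(
        p.name for p in lists_dir.iterdir() if p.is_file() and "_Packages" in p.name
    )
```

An empty result suggests that repository locations cannot be recovered for that image.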