anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Map shared lib / executable dependencies #661

Open wagoodman opened 2 years ago

wagoodman commented 2 years ago

What would you like to be added: The ability to list the specific shared lib dependencies for a binary. For example:

$ readelf -d ./partx

Dynamic section at offset 0x1c908 contains 29 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libblkid.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libsmartcols.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x4000
 0x000000000000000d (FINI)               0x15304
 0x0000000000000019 (INIT_ARRAY)         0x1d530
...

From this we could specifically note that the binary has three shared lib dependencies: libblkid.so.1, libsmartcols.so.1, and libc.so.6.

These files could be cross-correlated with other packages that provide them, to discover relationships between packages and files (or between files and files if there is no representative packaging metadata available). We can additionally discover the other shared libs and run the same analysis on them, in which case we can build a tree of dependencies for executables.
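A minimal Go sketch of the cross-correlation idea follows. It assumes we already have the list of file paths in the image, each executable's imported sonames, and a file-to-package ownership map. `Relationship`, `correlate`, and the `pkg:` prefix are hypothetical names for illustration, not syft's data model. Sonames are matched against file basenames only.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Relationship is a hypothetical "depends on" edge: From needs To at runtime.
type Relationship struct {
	From, To string
}

// correlate links each executable to every file whose basename matches one of
// its imported sonames. When a matching file is owned by a package, the edge
// points at the package ("pkg:<name>") instead of the bare file.
func correlate(files []string, imports map[string][]string, owners map[string]string) []Relationship {
	// Index every known file by basename, e.g. "libc.so.6" -> ["/lib/libc.so.6"].
	byName := map[string][]string{}
	for _, f := range files {
		byName[path.Base(f)] = append(byName[path.Base(f)], f)
	}

	// Visit executables in a stable order so the output is deterministic.
	exes := make([]string, 0, len(imports))
	for exe := range imports {
		exes = append(exes, exe)
	}
	sort.Strings(exes)

	seen := map[Relationship]bool{}
	var rels []Relationship
	for _, exe := range exes {
		for _, soname := range imports[exe] {
			for _, lib := range byName[soname] {
				to := lib
				if pkg, ok := owners[lib]; ok {
					to = "pkg:" + pkg
				}
				r := Relationship{From: exe, To: to}
				if !seen[r] {
					seen[r] = true
					rels = append(rels, r)
				}
			}
		}
	}
	return rels
}

func main() {
	files := []string{"/usr/bin/partx", "/lib/libc.so.6", "/usr/lib/libblkid.so.1"}
	imports := map[string][]string{
		"/usr/bin/partx": {"libblkid.so.1", "libsmartcols.so.1", "libc.so.6"},
	}
	owners := map[string]string{"/lib/libc.so.6": "glibc"} // hypothetical ownership data
	for _, r := range correlate(files, imports, owners) {
		fmt.Printf("%s -> %s\n", r.From, r.To)
	}
}
```

In this made-up input libsmartcols.so.1 has no matching file, so no edge is emitted for it. How unresolved imports should be surfaced remains an open question.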

The missing piece is reconciling runtime attributes that may change the structure of the tree (such as LD_LIBRARY_PATH). However, as long as the files are present, the superset of dependencies can be created without issue, with no need to consider these runtime constraints.

There is a lot more thought needed here. Does this imply a separate binary cataloger? Are those findings considered packages? If they are not (and are thus considered only files), how do we keep the extra binary format info around? Or do we only focus on creating relationships between files? Does this overlap with the golang bin cataloger, or is it in a separate enough domain?

jonasagx commented 2 years ago

Possibly helpful: https://github.com/sad0p/go-readelf

wagoodman commented 2 years ago

The Go stdlib already has the capability to list shared libs for all the formats we'd be interested in supporting (including ELF).

jonasagx commented 2 years ago

From OSS meeting:

We should consider when to catalog these based on the source being scanned (maybe only for images and directories, and not for individual files?).

mythi commented 1 year ago

The ability to list the specific shared lib dependencies for a binary. For example:

I can see this issue is a bit old, but the feature would be very useful for my use case, so +1 for the idea.

wagoodman commented 7 months ago

I feel this can manifest a few different ways, but I want to put forth my take on how this could be expressed.

The odd thing about this kind of feature is that we are relating things that are essentially files and not necessarily packages. That is, a binary we find with shared lib deps may or may not be part of a higher-level RPM, and the same can be said of the shared libs it uses. If we blindly add relationships from every file node in the SBOM that represents an executable to other file nodes (other binary files), then we will logically duplicate any similar package-to-package representation whenever both binaries already happen to be packaged as RPMs (and the cross-package dependency is already captured as a relationship). I think this is a little conflicting since:

As mentioned earlier, we don't technically know what the loader will do at runtime, since we don't have all of the information the loader would have (such as LD_LIBRARY_PATH). I also don't think we should try to replicate the linker behavior even if we had enough information to do so. This somewhat devalues file-to-file relationships as a way to convey shared libs. Does it invalidate them? I don't think so; it just de-emphasizes the need to exhaustively enumerate binary-to-binary relationships.

I feel the right user-facing perspective is to try and convey any additional dependency information that is not already present in the existing package-to-package relationships. In that spirit, here are more specific thoughts:

The output of this is a graph where you could traverse runtime dependencies in a single connection, instead of first traversing in a package sense and then in a file sense, after inspecting attributes of the package/file nodes and deciding you need to jump to another node that has no edge. I think this would make dependencies more transparent and easier for end users to understand than other approaches.

The downside of this approach is that end users doing graph traversal will need to understand that a dependency can be either a package or a file, which might be surprising. We do have precedent for this in other relationship contexts in the graph already (e.g. this package owns these files).
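One way to express the "only add what the package relationships don't already say" idea is the following hypothetical Go sketch. Each resolved (binary, library) file pair is lifted to its owning package when one exists. An edge is kept only if it is not a self-edge within one package and is not already present as a package-to-package relationship. `Node`, `Edge`, and `dependencyEdges` are illustrative names, not syft's API. The package names in `main` are made up.

```go
package main

import "fmt"

// NodeKind distinguishes the two kinds of graph nodes a dependency can point at.
type NodeKind int

const (
	PackageNode NodeKind = iota
	FileNode
)

// Node is a hypothetical SBOM graph node: either a package or a file.
type Node struct {
	Kind NodeKind
	ID   string // package name or file path
}

// Edge is a "depends on" relationship between two nodes.
type Edge struct {
	From, To Node
}

// nodeFor returns the owning package node for a file if one exists, otherwise
// the file node itself.
func nodeFor(file string, owners map[string]string) Node {
	if pkg, ok := owners[file]; ok {
		return Node{Kind: PackageNode, ID: pkg}
	}
	return Node{Kind: FileNode, ID: file}
}

// dependencyEdges turns resolved (binary, library) file pairs into edges that
// only carry information not already present as package-to-package edges.
func dependencyEdges(pairs [][2]string, owners map[string]string, existing map[Edge]bool) []Edge {
	seen := map[Edge]bool{}
	var out []Edge
	for _, p := range pairs {
		e := Edge{From: nodeFor(p[0], owners), To: nodeFor(p[1], owners)}
		switch {
		case e.From == e.To: // both files belong to the same package
		case existing[e]: // already captured by packaging metadata
		case seen[e]: // duplicate produced by another file pair
		default:
			seen[e] = true
			out = append(out, e)
		}
	}
	return out
}

func main() {
	owners := map[string]string{
		"/usr/bin/partx":         "util-linux",
		"/usr/lib/libblkid.so.1": "util-linux-libs",
		"/lib/libc.so.6":         "glibc",
	}
	existing := map[Edge]bool{
		{From: Node{PackageNode, "util-linux"}, To: Node{PackageNode, "glibc"}}: true,
	}
	pairs := [][2]string{
		{"/usr/bin/partx", "/lib/libc.so.6"},         // dropped: already known
		{"/usr/bin/partx", "/usr/lib/libblkid.so.1"}, // new package-to-package edge
		{"/usr/bin/partx", "/opt/lib/libfoo.so"},     // package-to-file edge (unowned lib)
	}
	for _, e := range dependencyEdges(pairs, owners, existing) {
		fmt.Printf("%v -> %v\n", e.From, e.To)
	}
}
```

The resulting edges mix package and file targets, which is the traversal surprise noted above. The mixed targets are also what let a consumer follow a runtime dependency in one hop.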

wagoodman commented 4 months ago

@mythi we have implemented some of this in a couple of ways:

The first: https://github.com/anchore/syft/pull/2626, which added enumeration of binary imports and exports, plus an indication of whether there is an entrypoint:

$ syft alpine:latest -o json | jq '.files[] | select(.executable != null)'
{
  "id": "ff9969c3449b1e27",
  "location": {
    "path": "/sbin/apk",
    "layerID": "sha256:d4fc045c9e3a848011de66f34b81f052d4f2c15a17bb196d637e526349601820"
  },
  "metadata": {
    "mode": 755,
    "type": "RegularFile",
    "userID": 0,
    "groupID": 0,
    "mimeType": "application/x-sharedlib",
    "size": 69648
  },
  "digests": [ ... ],
  "executable": {
    "format": "elf",
    "hasExports": true,
    "hasEntrypoint": true,
    "importedLibraries": [
      "libcrypto.so.3",
      "libz.so.1",
      "libapk.so.2.14.0",
      "libc.musl-x86_64.so.1"
    ],
    "elfSecurityFeatures": { ...  }
  }
}

This doesn't create any relationships between binaries, nor does it raise them to the level of packages (they appear only under the "files" section), but it is something.
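Even without relationships, a consumer can read `importedLibraries` out of the JSON document. The following is a minimal Go sketch that assumes only the field names visible in the output above (`files`, `location.path`, `executable`, `importedLibraries`, and so on). The struct models nothing else, and `importsByPath` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// syftDoc models only the parts of syft's JSON output shown above; every other
// field is ignored by encoding/json.
type syftDoc struct {
	Files []struct {
		Location struct {
			Path string `json:"path"`
		} `json:"location"`
		Executable *struct {
			Format            string   `json:"format"`
			HasExports        bool     `json:"hasExports"`
			HasEntrypoint     bool     `json:"hasEntrypoint"`
			ImportedLibraries []string `json:"importedLibraries"`
		} `json:"executable"`
	} `json:"files"`
}

// importsByPath maps each executable's path to its imported library names,
// skipping file entries that have no "executable" block.
func importsByPath(data []byte) (map[string][]string, error) {
	var doc syftDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for _, f := range doc.Files {
		if f.Executable != nil {
			out[f.Location.Path] = f.Executable.ImportedLibraries
		}
	}
	return out, nil
}

func main() {
	// Usage: syft alpine:latest -o json | go run .
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	imports, err := importsByPath(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for path, libs := range imports {
		fmt.Println(path, libs)
	}
}
```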

The second enhancement is around https://github.com/anchore/syft/pull/2396 and https://github.com/anchore/syft/pull/2715, which look for ELF notes embedded in the binary that carry package information. This elevates individual binaries or groups of binaries to packages, and additionally creates relationships between those new ELF packages and other existing packages and files based on binary imports and exports.

@mythi I'm curious, based on your +1: does this fit your needs, or are you looking for additional or different information?

mythi commented 4 months ago

@wagoodman thanks for the follow-up! I need to find the time to give it a try. At a quick glance it looks like exactly what I was thinking of, but I'd have to test it out.

My use case is rather special, but syft is a great match for it (thanks!): I have a custom template that generates Gramine-SGX trusted-files TOML tables for individual files in a container image. With this, I should be able to enhance the template to skip unnecessary image files and add table entries only for the main app executable and its library deps.