anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
6.07k stars 561 forks

Support cataloging NuGet packages #373

Open wagoodman opened 3 years ago

wagoodman commented 3 years ago

It would be useful to catalog NuGet packages. We should consider deriving this information from one or more sources:

Before diving in, we should consider whether this should be a cataloger for images, directories, or a hybrid of both.

westonsteimel commented 3 years ago

For the deployed artifacts of a .NET Core app, I think you typically end up with something like <app_name>.deps.json. This is an example of the output for TestApp.deps.json:

{
  "runtimeTarget": {
    "name": ".NETCoreApp,Version=v5.0",
    "signature": ""
  },
  "compilationOptions": {},
  "targets": {
    ".NETCoreApp,Version=v5.0": {
      "TestApp/1.0.0": {
        "dependencies": {
          "AWSSDK.S3": "3.7.0.10",
          "Newtonsoft.Json": "13.0.1"
        },
        "runtime": {
          "TestApp.dll": {}
        }
      },
      "AWSSDK.Core/3.7.0.10": {
        "runtime": {
          "lib/netcoreapp3.1/AWSSDK.Core.dll": {
            "assemblyVersion": "3.3.0.0",
            "fileVersion": "3.7.0.10"
          }
        }
      },
      "AWSSDK.S3/3.7.0.10": {
        "dependencies": {
          "AWSSDK.Core": "3.7.0.10"
        },
        "runtime": {
          "lib/netcoreapp3.1/AWSSDK.S3.dll": {
            "assemblyVersion": "3.3.0.0",
            "fileVersion": "3.7.0.10"
          }
        }
      },
      "Newtonsoft.Json/13.0.1": {
        "runtime": {
          "lib/netstandard2.0/Newtonsoft.Json.dll": {
            "assemblyVersion": "13.0.0.0",
            "fileVersion": "13.0.1.25517"
          }
        }
      }
    }
  },
  "libraries": {
    "TestApp/1.0.0": {
      "type": "project",
      "serviceable": false,
      "sha512": ""
    },
    "AWSSDK.Core/3.7.0.10": {
      "type": "package",
      "serviceable": true,
      "sha512": "sha512-XIg3tsHLQwN1k/H2M/dUZa2hc+cUDoDJBZpppxX5XJ1vhvI/MhQR9PTBDM8rNenxTNmfLflruU51vamvzrBXeQ==",
      "path": "awssdk.core/3.7.0.10",
      "hashPath": "awssdk.core.3.7.0.10.nupkg.sha512"
    },
    "AWSSDK.S3/3.7.0.10": {
      "type": "package",
      "serviceable": true,
      "sha512": "sha512-pkaBimX5l8uJfRU1wEv0JA2JFQ6IQX3y4PsL90t1QXiAtel/EZaXUhV9rN6udQPw3pI5qetjTty30apaaZnHTg==",
      "path": "awssdk.s3/3.7.0.10",
      "hashPath": "awssdk.s3.3.7.0.10.nupkg.sha512"
    },
    "Newtonsoft.Json/13.0.1": {
      "type": "package",
      "serviceable": true,
      "sha512": "sha512-ppPFpBcvxdsfUonNcvITKqLl3bqxWbDCZIzDWHzjpdAHRFfZe0Dw9HmA0+za13IdyrgJwpkDTDA9fHaxOrt20A==",
      "path": "newtonsoft.json/13.0.1",
      "hashPath": "newtonsoft.json.13.0.1.nupkg.sha512"
    }
  }
}
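As a rough illustration of how a cataloger might consume this file, here is a minimal Go sketch (Go because Syft itself is written in Go). It reads the "libraries" section and keeps the entries whose "type" is "package", splitting each "<name>/<version>" key. The names parseDepsJSON and nugetPackage are hypothetical and not part of Syft's API, and the sketch assumes every library key contains a "/".

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// depsLibrary models one entry of the "libraries" section of a *.deps.json file.
type depsLibrary struct {
	Type   string `json:"type"`
	SHA512 string `json:"sha512"`
}

// nugetPackage is a hypothetical, minimal record for one cataloged package.
type nugetPackage struct {
	Name    string
	Version string
	SHA512  string
}

// parseDepsJSON returns the entries under "libraries" whose type is
// "package", skipping "project" entries such as the app itself.
func parseDepsJSON(data []byte) ([]nugetPackage, error) {
	var doc struct {
		Libraries map[string]depsLibrary `json:"libraries"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var pkgs []nugetPackage
	for key, lib := range doc.Libraries {
		if lib.Type != "package" {
			continue
		}
		// Keys have the form "<name>/<version>", e.g. "Newtonsoft.Json/13.0.1".
		name, version, ok := strings.Cut(key, "/")
		if !ok {
			continue
		}
		pkgs = append(pkgs, nugetPackage{Name: name, Version: version, SHA512: lib.SHA512})
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(pkgs, func(i, j int) bool { return pkgs[i].Name < pkgs[j].Name })
	return pkgs, nil
}

func main() {
	// Trimmed from the TestApp.deps.json example above (hashes omitted).
	sample := []byte(`{
  "libraries": {
    "TestApp/1.0.0": {"type": "project", "serviceable": false, "sha512": ""},
    "AWSSDK.Core/3.7.0.10": {"type": "package", "path": "awssdk.core/3.7.0.10"},
    "Newtonsoft.Json/13.0.1": {"type": "package", "path": "newtonsoft.json/13.0.1"}
  }
}`)
	pkgs, err := parseDepsJSON(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Println(p.Name, p.Version)
	}
}
```

Run against the trimmed sample, this prints AWSSDK.Core 3.7.0.10 and Newtonsoft.Json 13.0.1; the TestApp entry is dropped because its type is "project".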
sophiewigmore commented 2 years ago

Hi there! I'm interested in this feature (particularly for directory scans). Are there any updates on this issue or findings so far?

luhring commented 2 years ago

Hi @sophiewigmore! No updates that I'm aware of, but we'd love to support NuGet, if anyone's interested in submitting a PR 😃

westonsteimel commented 2 years ago

Also, in case it's helpful, the NuGet scanning currently implemented in aquasecurity/trivy lives at https://github.com/aquasecurity/go-dep-parser/tree/main/pkg%2Fnuget. It's also worth considering that the .NET Core dependencies spec went through several iterations in earlier versions, so we may want to support those as well (I know for certain some organisations still use them :)). And .NET Framework is likely slightly different from all of these.

westonsteimel commented 2 years ago

I was hoping to get a chance to look at this one during my break, but so far things haven't really gone to plan. It's still possible I'll get to it, though.

westonsteimel commented 2 years ago

Another thought: it would be great to eventually support extracting the version information from the compiled DLLs themselves. I believe that information is persisted in the binaries, but I haven't looked up the specific spec for it yet. This will also likely produce some odd results, since the assembly version and file version don't always align with the published NuGet version, etc.
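The deps.json example earlier in this thread already shows that mismatch: AWSSDK.Core 3.7.0.10 ships an assembly versioned 3.3.0.0, and Newtonsoft.Json 13.0.1 ships assembly 13.0.0.0 with file version 13.0.1.25517. A minimal Go sketch that reports such cases from the "targets" section might look like this. versionMismatches is a hypothetical helper, and it compares exact strings only (no normalization of trailing .0 segments).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// runtimeAsset holds the version fields recorded for one DLL under "targets".
type runtimeAsset struct {
	AssemblyVersion string `json:"assemblyVersion"`
	FileVersion     string `json:"fileVersion"`
}

// versionMismatches lists runtime DLLs whose assemblyVersion differs from
// the package version encoded in their "<name>/<version>" target key.
func versionMismatches(data []byte) ([]string, error) {
	var doc struct {
		Targets map[string]map[string]struct {
			Runtime map[string]runtimeAsset `json:"runtime"`
		} `json:"targets"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var out []string
	for _, entries := range doc.Targets {
		for key, entry := range entries {
			_, pkgVersion, ok := strings.Cut(key, "/")
			if !ok {
				continue
			}
			for dll, a := range entry.Runtime {
				// Entries such as "TestApp.dll": {} carry no version data; skip them.
				if a.AssemblyVersion == "" || a.AssemblyVersion == pkgVersion {
					continue
				}
				out = append(out, fmt.Sprintf("%s: package %s, assembly %s, file %s",
					dll, pkgVersion, a.AssemblyVersion, a.FileVersion))
			}
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	// Trimmed from the "targets" section of the TestApp.deps.json example.
	sample := []byte(`{"targets": {".NETCoreApp,Version=v5.0": {
  "TestApp/1.0.0": {"runtime": {"TestApp.dll": {}}},
  "AWSSDK.Core/3.7.0.10": {"runtime": {"lib/netcoreapp3.1/AWSSDK.Core.dll":
    {"assemblyVersion": "3.3.0.0", "fileVersion": "3.7.0.10"}}},
  "Newtonsoft.Json/13.0.1": {"runtime": {"lib/netstandard2.0/Newtonsoft.Json.dll":
    {"assemblyVersion": "13.0.0.0", "fileVersion": "13.0.1.25517"}}}
}}}`)
	lines, err := versionMismatches(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Both packages in the sample are reported, which is why an assembly-level version is a poor stand-in for the NuGet version when matching against vulnerability data.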

westonsteimel commented 2 years ago

Also, should we eventually have two separate cataloger packages here: dotnet for the modern cross-platform version (sometimes referred to as .NET Core) and dotnetframework for the older, mostly Windows-only .NET Framework? I'd recommend focusing the initial effort on the dotnet *.deps.json file produced by dotnet publish. I'll try to build some sample project output for various versions of .NET when I (hopefully) have some time tonight.

zhill commented 2 years ago

I think we should distinguish between .NET/CLR support and NuGet support. A NuGet cataloger should look at the *.nuspec file (https://docs.microsoft.com/en-us/nuget/reference/nuspec) and, as you mentioned, the *.deps.json file to get information on installed packages. I don't think *.deps.json is NuGet-specific, so we can discuss where and how that should be added, and whether there should be one or more catalogers.

This is probably already clear, but just to make sure: Syft should be able to catalog both the declared dependencies of a not-yet-built project and the dependencies of an installed application (where no source is available).
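For the not-yet-built case, declared dependencies in an SDK-style project usually live in PackageReference items of the project file. A minimal sketch, assuming the version is given as a Version attribute (a <Version> child element or centrally managed versions are not handled), could read them like this; declaredPackages and the sample TestApp.csproj are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// csproj models only the PackageReference items of an SDK-style project.
// Assumption: versions appear as a Version attribute on each item.
type csproj struct {
	ItemGroups []struct {
		PackageReferences []struct {
			Include string `xml:"Include,attr"`
			Version string `xml:"Version,attr"`
		} `xml:"PackageReference"`
	} `xml:"ItemGroup"`
}

// declaredPackages returns {name, version} pairs declared in the project file.
func declaredPackages(data []byte) ([][2]string, error) {
	var p csproj
	if err := xml.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	var out [][2]string
	for _, g := range p.ItemGroups {
		for _, r := range g.PackageReferences {
			out = append(out, [2]string{r.Include, r.Version})
		}
	}
	return out, nil
}

func main() {
	// Hypothetical TestApp.csproj matching the dependencies in the deps.json example.
	data := []byte(`<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="AWSSDK.S3" Version="3.7.0.10" />
    <PackageReference Include="Newtonsoft.Json" Version="13.0.1" />
  </ItemGroup>
</Project>`)
	pkgs, err := declaredPackages(data)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Println(p[0], p[1])
	}
}
```

Note that this yields only direct dependencies (AWSSDK.S3, not its transitive AWSSDK.Core), whereas the built app's deps.json lists the full resolved set.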

General .NET code support is a different matter, since there isn't necessarily a package manager providing metadata. I think https://github.com/anchore/syft/issues/726 tracks general .NET support, which will require different approaches.

macsux commented 2 years ago

Nuspec files are only used for generating NuGet packages, which are predominantly libraries and are not executable artifacts on their own. When we talk about Docker images, what we actually find are .NET apps that consume NuGet packages but are not packages themselves, so they have no nuspec file. An executable artifact packed into a Docker image will usually come bundled with a deps.json file that lists all the NuGet packages it references.

macsux commented 2 years ago

@westonsteimel

Also, should we consider eventually having two separate cataloger packages here, one dotnet for the modern cross-platform version (also sometimes referred to as .NET Core) and a separate package dotnetframework for the older mostly windows-only version stuff

There is no good way to create a BOM for a .NET Framework app without access to the sources used to build it. There's a packages.config XML file, usually present at the solution level of the original source, that specifies which packages should be restored before the build starts, but nothing is added to the final output (and hence nothing is present in the container) that would help you uncover this. Even with access to packages.config, the relationship isn't obvious, because it usually contains packages for the entire solution. Its role is to populate the solution-level packages directory with restored packages (which acts like a local cache, equivalent to the .m2 folder in Java but stored with the solution), and each project then links to whichever DLLs it expects to find in that folder during the build. Unlike .NET Core, the package dependencies are indirect: packages.config can list packages that no project actually uses during the build.
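Where a packages.config is available, reading it is straightforward; the hard part is the caveat above, that a listed package may never be used by any project. A minimal Go sketch parses the file and labels every entry as declared rather than confirmed; parsePackagesConfig and the sample content are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// packagesConfig models a packages.config file: a flat list of
// <package id="..." version="..." targetFramework="..."/> elements.
type packagesConfig struct {
	Packages []struct {
		ID              string `xml:"id,attr"`
		Version         string `xml:"version,attr"`
		TargetFramework string `xml:"targetFramework,attr"`
	} `xml:"package"`
}

// parsePackagesConfig decodes the raw XML of a packages.config file.
func parsePackagesConfig(data []byte) (packagesConfig, error) {
	var cfg packagesConfig
	err := xml.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	// Hypothetical packages.config content.
	data := []byte(`<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Newtonsoft.Json" version="13.0.1" targetFramework="net48" />
  <package id="Example.Unused" version="1.0.0" targetFramework="net48" />
</packages>`)
	cfg, err := parsePackagesConfig(data)
	if err != nil {
		panic(err)
	}
	for _, p := range cfg.Packages {
		// These were restored for the solution; nothing here proves that a
		// built project actually links against them.
		fmt.Printf("%s %s (declared, usage unverified)\n", p.ID, p.Version)
	}
}
```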

macsux commented 2 years ago

Another thought I had here is that it would be great to eventually support somehow extracting the version information from the compiled DLLs themselves. I believe that information is persisted in the binaries, but I haven't looked up the specific spec for it yet. Also, this will likely get some weird stuff since the assembly version and file version don't always align with the NuGet published version, etc

@westonsteimel That's not going to work, because they don't necessarily align, and in many cases versions aren't even stamped on the DLLs; they're only present if the author chose to pass them into the compilation process. On top of that, NuGet and assembly versioning differ. Assembly versions are purely numeric (major.minor.build.revision), while NuGet has its own convention that may include tags to signify prerelease builds. For example, a package such as Pomelo.Marked 2.0.0-rtm-10044 cannot have that version stamped onto the assembly, as 2.0.0-rtm-10044 is not a valid assembly version. It's also entirely possible for an assembly inside a package to have a completely different name from the package it came from.
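To make the format difference concrete, here is a minimal Go check for whether a string could be stamped as an assembly version. It assumes the numeric major[.minor[.build[.revision]]] form with each part no larger than 65534; isAssemblyVersion is a hypothetical helper, not a .NET or Syft API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isAssemblyVersion reports whether v fits the numeric
// major[.minor[.build[.revision]]] form used for assembly versions.
// Assumption: each part is a plain decimal integer no larger than 65534.
func isAssemblyVersion(v string) bool {
	parts := strings.Split(v, ".")
	if len(parts) > 4 {
		return false
	}
	for _, p := range parts {
		// ParseUint rejects empty strings, signs, and non-digits such as "-rtm".
		n, err := strconv.ParseUint(p, 10, 16)
		if err != nil || n > 65534 {
			return false
		}
	}
	return true
}

func main() {
	for _, v := range []string{"3.3.0.0", "13.0.0.0", "2.0.0-rtm-10044"} {
		fmt.Printf("%-16s %v\n", v, isAssemblyVersion(v))
	}
}
```

With these inputs it reports true for 3.3.0.0 and 13.0.0.0 and false for 2.0.0-rtm-10044, because the third segment, 0-rtm-10044, is not a number.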

plaisted commented 1 year ago

Please note that .deps.json does not always provide the correct versions of assemblies/dependencies either. Under some circumstances the lowest compatible assembly version is listed in .deps.json, which triggers false positives when it is used for vulnerability scanning. One scenario is publishing with "--self-contained false": under certain circumstances it adds very old versions of dependencies that are not included in the deployment but are provided by the runtime (e.g. the runtime version actually used by the app is 7.0.0, but deps.json says 4.3.0).

luhring commented 1 year ago

Looks related to the problem @plaisted points out: #1799