CycloneDX / cdxgen

Creates CycloneDX Bills of Materials (BOMs) for your projects from source and container images. Supports many languages and package managers. Integrate it into your CI/CD pipeline with automatic submission to a Dependency-Track server.
https://cyclonedx.github.io/cdxgen/
Apache License 2.0

Add Support for Scanning an esbuild Metafile #1169

Open: pcbowers opened this issue 3 months ago

pcbowers commented 3 months ago

esbuild can generate a metafile: a JSON file containing metadata about the build, including which files are pulled in from node_modules. For more information about metafiles, see https://esbuild.github.io/api/#metafile.

Here is an example of a (very simple) metafile:

{
  "inputs": {
    "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js": {
      "bytes": 87593,
      "imports": [],
      "format": "cjs"
    },
    "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js": {
      "bytes": 190,
      "imports": [
        {
          "path": "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js",
          "kind": "require-call",
          "original": "./cjs/react.development.js"
        }
      ],
      "format": "cjs"
    },
    "app.js": {
      "bytes": 27,
      "imports": [],
      "format": "esm"
    },
    "index.js": {
      "bytes": 89,
      "imports": [
        {
          "path": "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js",
          "kind": "import-statement",
          "original": "react"
        },
        {
          "path": "app.js",
          "kind": "import-statement",
          "original": "./app"
        }
      ],
      "format": "esm"
    }
  },
  "outputs": {
    "out.js": {
      "imports": [],
      "exports": [],
      "entryPoint": "index.js",
      "inputs": {
        "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js": {
          "bytesInOutput": 80842
        },
        "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js": {
          "bytesInOutput": 279
        },
        "index.js": {
          "bytesInOutput": 80
        },
        "app.js": {
          "bytesInOutput": 20
        }
      },
      "bytes": 83118
    }
  }
}

It was generated with the command esbuild index.js --bundle --metafile=stats.json --outfile=out.js using a very simple file setup:

// app.js
export const name = 'bob';

// index.js
import React from 'react';
import { name } from './app';

console.info(`Hello ${name}`);

(Note: a normal build would probably recognize that React isn't being used, but I deliberately left that option off the esbuild command to illustrate how node_modules dependencies appear compared to in-app dependencies.)
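As a rough illustration of that distinction, the sketch below (plain JavaScript, not cdxgen code) walks the outputs section of the metafile above and totals bytesInOutput for in-app files versus files under node_modules. It assumes paths use "/" separators, as in the example, and treats any path containing a node_modules segment as a dependency; the function name summarizeOutputs is hypothetical.

```javascript
// Sketch only: split the bundled bytes of each output into in-app code and
// node_modules code. `metafile` is trimmed to the "outputs" section of the
// example above.
const metafile = {
  outputs: {
    "out.js": {
      entryPoint: "index.js",
      inputs: {
        "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js": { bytesInOutput: 80842 },
        "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js": { bytesInOutput: 279 },
        "index.js": { bytesInOutput: 80 },
        "app.js": { bytesInOutput: 20 },
      },
      bytes: 83118,
    },
  },
};

// Hypothetical helper: sums bytesInOutput per category across all outputs.
function summarizeOutputs(meta) {
  const summary = { app: 0, nodeModules: 0 };
  for (const output of Object.values(meta.outputs)) {
    for (const [inputPath, { bytesInOutput }] of Object.entries(output.inputs)) {
      // Assumption: any path with a node_modules segment comes from a dependency.
      if (inputPath.split("/").includes("node_modules")) {
        summary.nodeModules += bytesInOutput;
      } else {
        summary.app += bytesInOutput;
      }
    }
  }
  return summary;
}

console.log(summarizeOutputs(metafile)); // { app: 100, nodeModules: 81121 }
```

Nearly all of the bundle comes from React here, which is exactly the kind of information a lockfile or import scan cannot provide.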

Scanning this file instead of relying on babel or a lockfile would be great, because it describes the definitive bundle output and therefore makes for a more reliable scan. This is especially true for frameworks like Angular, which rely on esbuild and can generate an esbuild metafile automatically.
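A minimal sketch of the kind of extraction this would enable, assuming a scanner only needs package names and versions from the inputs section: the hypothetical helpers below map each node_modules path to an npm package and emit a package URL (purl). The version is only recovered from pnpm store directories such as .pnpm/react@18.3.1; for other layouts (e.g. a flat npm node_modules) a real implementation would presumably need to read the package's package.json, which this sketch does not do.

```javascript
// Sketch only: turn esbuild metafile input paths into npm purls.
// Assumes "/" path separators.

// Hypothetical helper: returns { name, version } or undefined for in-app files.
function packageFromPath(inputPath) {
  const parts = inputPath.split("/");
  const idx = parts.lastIndexOf("node_modules");
  // No node_modules segment (or nothing after it): an in-app source file.
  if (idx === -1 || idx + 1 >= parts.length) return undefined;
  let name = parts[idx + 1];
  if (name.startsWith(".")) return undefined; // e.g. the .pnpm directory itself
  if (name.startsWith("@") && idx + 2 < parts.length) {
    name = `${name}/${parts[idx + 2]}`; // scoped package: @scope/name
  }
  let version;
  const pnpmIdx = parts.indexOf(".pnpm");
  if (pnpmIdx !== -1 && pnpmIdx + 1 < parts.length) {
    // pnpm store dirs look like "react@18.3.1" or "@scope+name@1.0.0"; some
    // pnpm versions append a peer-dependency suffix after "_".
    const storeDir = parts[pnpmIdx + 1];
    const prefix = `${name.replace("/", "+")}@`;
    if (storeDir.startsWith(prefix)) {
      version = storeDir.slice(prefix.length).split("_")[0];
    }
  }
  return { name, version };
}

// purl spec: the npm scope is the namespace, with "@" percent-encoded.
function toPurl({ name, version }) {
  const encoded = name.startsWith("@") ? `%40${name.slice(1)}` : name;
  return version ? `pkg:npm/${encoded}@${version}` : `pkg:npm/${encoded}`;
}

// Hypothetical helper: unique purls for every bundled dependency file.
function packagesFromMetafile(meta) {
  const purls = new Set();
  for (const inputPath of Object.keys(meta.inputs)) {
    const pkg = packageFromPath(inputPath);
    if (pkg) purls.add(toPurl(pkg));
  }
  return [...purls];
}

// Input paths from the example metafile (per-file details omitted).
const inputs = {
  "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js": {},
  "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js": {},
  "app.js": {},
  "index.js": {},
};
console.log(packagesFromMetafile({ inputs })); // [ 'pkg:npm/react@18.3.1' ]
```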

prabhu commented 3 months ago

@pcbowers Would you be interested in contributing this feature? Shall we meet on Zoom sometime to discuss it?

pcbowers commented 2 months ago

@prabhu I would be, but I'm not sure I'll have the time to tackle it. As I see it, the following tasks would need to be completed:

However, it gets more complicated than that. Unfortunately, there's no designated location for an esbuild metafile. To generate one, you have to pass the --metafile option, which accepts any file name in any directory (creating the directory if it doesn't already exist). For instance, both commands below generate metafiles, but in very different locations and with different file names:

esbuild app.js --bundle --metafile=meta.json --outfile=out.js
esbuild app.js --bundle --metafile=asdf/qwer.zxcv --outfile=out.js

Because of this, I think the tool will probably need an option for specifying which esbuild metafile to scan. The file itself also has no distinguishing characteristics, such as a version or type field, that would let it be detected automatically. This becomes tricky because we don't want to bloat the API/options: if we support webpack or other build tools with similar build metafiles in the future, do we just keep adding options? We should probably design this option carefully so that it supports listing multiple files and makes adding other build tools in the future relatively simple.
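One possible shape for such an option, sketched here purely as an assumption: a single repeatable option whose values are tagged with the build tool (e.g. esbuild:asdf/qwer.zxcv), backed by a registry of per-tool handlers. The "tool:path" syntax, parseBuildMetadataSpec, and the parsers registry are all hypothetical and do not exist in cdxgen. Because the metafile has no type field, the esbuild handler falls back to a structural check on the inputs/outputs objects, which can reject obviously wrong files but cannot prove a file came from esbuild.

```javascript
// Sketch only: hypothetical registry of build-metadata handlers keyed by tool.
const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

const parsers = {
  esbuild: {
    // Heuristic shape check: esbuild metafiles carry no type or version field,
    // so look for "inputs" and "outputs" objects, each output having "inputs".
    looksValid(json) {
      return (
        isObject(json) &&
        isObject(json.inputs) &&
        isObject(json.outputs) &&
        Object.values(json.outputs).every((o) => isObject(o) && isObject(o.inputs))
      );
    },
  },
};

// Hypothetical option value parser: "<tool>:<path>". Splitting on the first
// ":" keeps Windows paths such as "esbuild:C:\\build\\meta.json" intact.
function parseBuildMetadataSpec(spec) {
  const sep = spec.indexOf(":");
  if (sep <= 0) throw new Error(`Expected <tool>:<path>, got "${spec}"`);
  const tool = spec.slice(0, sep);
  const path = spec.slice(sep + 1);
  if (!parsers[tool]) throw new Error(`Unsupported build tool "${tool}"`);
  return { tool, path };
}

console.log(parseBuildMetadataSpec("esbuild:asdf/qwer.zxcv")); // { tool: 'esbuild', path: 'asdf/qwer.zxcv' }
console.log(parsers.esbuild.looksValid({ inputs: {}, outputs: { "out.js": { inputs: {} } } })); // true
console.log(parsers.esbuild.looksValid({ lockfileVersion: 3 })); // false
```

With this design, supporting webpack later would mean registering another entry in parsers rather than introducing a new option.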

Another consideration: esbuild metafile scanning should work the way babel scanning does (i.e. recording imports rather than necessarily listing dependencies), since in theory it tells you exactly which imports were used. The metafile does not include information about transitive build dependencies unless the transitive dependency actually imports from its node_modules rather than having its code bundled directly. Scanning with babel works similarly, and this isn't necessarily a bad thing, but it does mean that if you only want required dependencies, --required-only will actually filter out transitive dependencies, because that's how the babel analysis is currently implemented (only imported packages are marked as required). The question is: should that change too? Should addEvidenceForImports walk down the tree of required packages, marking them as required as well?
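To illustrate what a metafile does record, here is a sketch (hypothetical helpers, not cdxgen code) that collapses the file-level imports in the inputs section into package-level edges. It assumes a file's package is the segment after its last node_modules directory, treats everything else as the app itself (labelled <app>), ignores versions, and skips imports that esbuild marks as external, since those are not bundled. Applied to the metafile from the issue description, the only package-level edge is from the app to react.

```javascript
// Sketch only: derive package-level dependency edges from metafile imports.

// Hypothetical helper: package name for a bundled file, or "<app>".
function packageOf(path) {
  const parts = path.split("/");
  const idx = parts.lastIndexOf("node_modules");
  if (idx === -1 || idx + 1 >= parts.length) return "<app>";
  const first = parts[idx + 1];
  // Scoped packages span two segments: @scope/name
  return first.startsWith("@") && idx + 2 < parts.length ? `${first}/${parts[idx + 2]}` : first;
}

// Hypothetical helper: Map of package -> Set of packages it imports.
function packageEdges(meta) {
  const edges = new Map();
  for (const [from, { imports = [] }] of Object.entries(meta.inputs)) {
    const src = packageOf(from);
    for (const { path, external } of imports) {
      if (external) continue; // not bundled, so not a file in "inputs"
      const dst = packageOf(path);
      if (dst === src) continue; // ignore imports within the same package
      if (!edges.has(src)) edges.set(src, new Set());
      edges.get(src).add(dst);
    }
  }
  return edges;
}

// The "inputs" section of the example metafile (bytes/format omitted).
const metafile = {
  inputs: {
    "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js": { imports: [] },
    "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js": {
      imports: [{ path: "node_modules/.pnpm/react@18.3.1/node_modules/react/cjs/react.development.js", kind: "require-call" }],
    },
    "app.js": { imports: [] },
    "index.js": {
      imports: [
        { path: "node_modules/.pnpm/react@18.3.1/node_modules/react/index.js", kind: "import-statement" },
        { path: "app.js", kind: "import-statement" },
      ],
    },
  },
};

const edges = packageEdges(metafile);
console.log(Object.fromEntries([...edges].map(([k, v]) => [k, [...v]]))); // { '<app>': [ 'react' ] }
```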

So, on top of the two tasks I mentioned earlier, I think these would probably be needed as well:

That last option could probably be its own issue, but it's something I did think about when considering how to implement this. I'm happy to discuss the design ramifications and help break this out further. I'm not sure I have the time to tackle this in its entirety right now, though (things may change, but that's where it stands for now).

prabhu commented 2 months ago

@pcbowers, cdxgen could invoke esbuild (or use the esbuild API) when -t {js,esbuild} --lifecycle post-build is set for esbuild projects. This would help solve the file name problem, and we could use this metadata instead of babel.

Regarding indirect dependency tracking with babel, I am in favour of enhancing it. Currently, in --deep mode, we store Imported and Exported modules as separate properties. Then, with the research profile, we can use atom to stitch these together and construct data flows with purls with enough precision, so it kind of works. We can definitely keep looking for any babel-based implementation that might help.

In the interim, is there a way to improve our babel analysis to retain the dependency tree belonging to required packages?

pcbowers commented 2 months ago

Hmm, I think that could be tricky. esbuild has a lot of options, many of which are configured by other build tools like Angular or Vite. esbuild doesn't scan the build afterwards to pick up what was included; it generates the metafile while building. This means it would most likely be difficult for cdxgen to invoke esbuild itself and add the proper flag without breaking parts of the build. I may be missing something here, though.

As for babel analysis: technically speaking, we could reuse some of the code I just wrote for pnpm in addEvidenceForImports. In https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L9811, we already figure out which packages are required. If we could also traverse the dependency map and mark each of their transitive dependencies as required, that could work. However, that function doesn't take a dependenciesMap.
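A minimal sketch of that traversal, under stated assumptions: components are keyed by bom-ref, dependenciesMap is assumed to be a plain object mapping each bom-ref to the bom-refs it depends on, and required status is recorded in the CycloneDX component scope field. propagateRequired is a hypothetical helper; its signature is not that of cdxgen's existing addEvidenceForImports. The walk is breadth-first and tracks visited refs, so shared dependencies and cycles are handled.

```javascript
// Hypothetical helper (not cdxgen's API): once imported packages have been
// marked scope "required", propagate that scope to all transitive dependencies.
function propagateRequired(components, dependenciesMap) {
  // Index components by bom-ref so dependency refs can be resolved quickly.
  const byRef = new Map(components.map((c) => [c["bom-ref"], c]));
  // Start the breadth-first walk from components that are already required.
  const queue = components.filter((c) => c.scope === "required").map((c) => c["bom-ref"]);
  const visited = new Set(queue);
  while (queue.length > 0) {
    const ref = queue.shift();
    for (const dep of dependenciesMap[ref] ?? []) {
      if (visited.has(dep)) continue; // shared dependency or cycle: already handled
      visited.add(dep);
      const component = byRef.get(dep);
      if (component) component.scope = "required";
      queue.push(dep);
    }
  }
  return components;
}

// Usage with hypothetical purls: a is imported (required), b depends on c,
// and d is never reached from a required package.
const components = [
  { "bom-ref": "pkg:npm/a@1.0.0", scope: "required" },
  { "bom-ref": "pkg:npm/b@1.0.0", scope: "optional" },
  { "bom-ref": "pkg:npm/c@1.0.0", scope: "optional" },
  { "bom-ref": "pkg:npm/d@1.0.0", scope: "optional" },
];
const dependenciesMap = {
  "pkg:npm/a@1.0.0": ["pkg:npm/b@1.0.0"],
  "pkg:npm/b@1.0.0": ["pkg:npm/c@1.0.0", "pkg:npm/a@1.0.0"], // cycle back to a
  "pkg:npm/d@1.0.0": [],
};
propagateRequired(components, dependenciesMap);
console.log(components.map((c) => `${c["bom-ref"]} ${c.scope}`));
```

Passing the map in explicitly, rather than having the function rebuild it, is one way the refactoring mentioned above could give addEvidenceForImports access to the dependency tree.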

prabhu commented 2 months ago

@pcbowers Please perform any refactoring required to enhance addEvidenceForImports. We can hold off on part 1 until a solution presents itself.