CycloneDX / cyclonedx-node-yarn

Create CycloneDX Software Bill of Materials (SBOM) from Node.js Yarn projects.
Apache License 2.0
21 stars 5 forks source link

Extraction of license text from files. #33 #193

Open AugustusKling opened 4 weeks ago

AugustusKling commented 4 weeks ago

Untested code for license text extraction.

jkowalleck commented 4 weeks ago

we have a basic implementation here: https://github.com/CycloneDX/cyclonedx-webpack-plugin/blob/72700f06d00eac79fa3b91fe838bd78c583346a2/src/extractor.ts#L135

I was thinking of pulling this one into the CDX library,. so it is available for every downstream user - like here... Could you review that other implementation and see if it suits your case here?

PS: I found that your implementation is basically a copy/past from the mentioned implementation. So i guess it is any good -- so better pull it over to the library, than copy/pasting it here.

AugustusKling commented 4 weeks ago

Taking a file name + a data blob and converting it to an instance of CDX.Models.License would make sense in the CDX library. Assuming specific loggers or that packages contents would be kept in ordinary files should not be part of the library.

Since you explicitly referred to the other implementation in ticket #33, the implementation here matches it. One would not want different behavior for different libraries under the CycloneDX umbrella.

Before this PR has an chance of moving forward, this repo needs to be fixed. The current ESLint setup does not work with the present TypeScript version. You can do a yarn up 'typescript@<5.4.0' to change to a compatible version by the way.

Also, the tests in the repo depend on yarn install for the test projects as shown by tests/integration/setup.js. This was not necessary in the original code and should not be necessary, now. The plugin needs to retrieve the packages to be able to read their manifests (package.json) and now the license texts. But it must not execute an yarn install ... or similar in order to produce an SBOM as the dependency resolution is retained in the yarn.lock file and installations are risky security-wise.

jkowalleck commented 4 weeks ago

Since you explicitly referred to the other implementation in ticket https://github.com/CycloneDX/cyclonedx-node-yarn/issues/33, the implementation here matches it. One would not want different behavior for different libraries under the CycloneDX umbrella.

Exactly.

here is the (WIP/draft) PR to bring the functionality to the library: https://github.com/CycloneDX/cyclonedx-javascript-library/pull/1158

jkowalleck commented 4 weeks ago

Before this PR has an chance of moving forward, this repo needs to be fixed. The current ESLint setup does not work with the present TypeScript version. You can do a yarn up 'typescript@<5.4.0' to change to a compatible version by the way.

Also, the tests in the repo depend on yarn install for the test projects as shown by tests/integration/setup.js. This was not necessary in the original code and should not be necessary, now. The plugin needs to retrieve the packages to be able to read their manifests (package.json) and now the license texts. But it must not execute an yarn install ... or similar in order to produce an SBOM as the dependency resolution is retained in the yarn.lock file and installations are risky security-wise.

Please discuss these things in individual extra tickets. Thank you in advance.

jkowalleck commented 4 weeks ago

https://github.com/CycloneDX/cyclonedx-javascript-library/pull/1158 was postponed and will not ship any soon.

Please continue your work crafting a yarn=specific implementation. This will help understand opportunities to generalize in the upstream-library later.

jkowalleck commented 3 weeks ago

i had to fix one of the github workflows. Please rebase/merge the latest main branch.

AugustusKling commented 3 weeks ago

Note that despite the build passing in the PR validations, the existing tests fail for me. This is with the yarn.lock from the latest release at https://github.com/CycloneDX/cyclonedx-node-yarn/releases/tag/v1.0.2

Example with similar changes in many test cases:

[test:node    ]       + expected - actual
[test:node    ]
[test:node    ]            </properties>
[test:node    ]          </metadata>
[test:node    ]          <components>
[test:node    ]            <component type="library" bom-ref="in-array@npm:0.1.2">
[test:node    ]       -      <author>Jon Schlinkert (https://github.com/jonschlinkert)</author>
[test:node    ]       +      <author>Jon Schlinkert</author>
[test:node    ]              <name>in-array</name>
[test:node    ]              <version>0.1.2</version>
[test:node    ]              <description>Return true if a value exists in an array. Faster than using indexOf and won't blow up on null values.</description>
[test:node    ]              <licenses>
[test:node    ] --
[test:node    ]                  <url>https://github.com/jonschlinkert/in-array/issues</url>
[test:node    ]                  <comment>as detected from PackageJson property "bugs.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="vcs">
[test:node    ]       -          <url>jonschlinkert/in-array</url>
[test:node    ]       -          <comment>as detected from PackageJson property "repository"</comment>
[test:node    ]       +          <url>git+https://github.com/jonschlinkert/in-array.git</url>
[test:node    ]       +          <comment>as detected from PackageJson property "repository.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="website">
[test:node    ]                  <url>https://github.com/jonschlinkert/in-array</url>
[test:node    ]                  <comment>as detected from PackageJson property "homepage"</comment>
[test:node    ] --
[test:node    ]                  <url>https://github.com/dcousens/is-sorted/issues</url>
[test:node    ]                  <comment>as detected from PackageJson property "bugs.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="vcs">
[test:node    ]       -          <url>https://github.com/dcousens/is-sorted.git</url>
[test:node    ]       +          <url>git+https://github.com/dcousens/is-sorted.git</url>
[test:node    ]                  <comment>as detected from PackageJson property "repository.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="website">
[test:node    ]                  <url>https://github.com/dcousens/is-sorted</url>
[test:node    ]
[test:node    ]       at runTest (tests/integration/index.test.js:156:12)
jkowalleck commented 2 weeks ago

a lot of dependencies and other things were bumped lately. I will postpone all dependency bumps until this very feature is merged :+1: please continue, all your efforts are appreciated very much.

Better rebase/merge master, delete your local .pnp.*/.yarn/yarn.lock, and run yarn install again.


regarding your failing tests - https://github.com/CycloneDX/cyclonedx-node-yarn/pull/193#issuecomment-2460652894

the underlying test beds ship own lock files, they are not affected by your project lock file. anyway, the lock file from v1.0.2 is an outdated static one that is not carries over by maintenance. Of course, it makes no sense to pull it in the project when developing new features.

Anyway, do you maybe have old build artifacts, that need to be removed before testing?

AugustusKling commented 2 weeks ago

a lot of dependencies and other things were bumped lately. I will postpone all dependency bumps until this very feature is merged 👍 please continue, all your efforts are appreciated very much.

Just for your info the part of the install where node-gyp is used to build libxmljs2 is failing in case you run with Node 22 (current LTS). It's header files are incompatible with the referenced libxml. Updating libxmljs2 to 0.35.0 in CycloneDX/cyclonedx-javascript-library should solve this.

Building on Node 20 is not showing the incompatibility and should work with libxmljs2 version 0.33.0 as well as 0.35.0.

Anyway, do you maybe have old build artifacts, that need to be removed before testing?

* `./yarn-plugin-cyclonedx.cjs`

* `dist/yarn-plugin-cyclonedx.cjs`

Thanks for the hints. It turned out that test errors were the result of a bug in my changed code. It's fixed with correcting the invocation of the normalize-package-data library in my last commit.

Please try out the changes and check if the license evidence is included as you would expect in the resulting SBOMs.

jkowalleck commented 2 weeks ago

Just for your info the part of the install where node-gyp is used to build libxmljs2 is failing in case you run with Node 22 (current LTS). It's header files are incompatible with the referenced libxml. Updating libxmljs2 to 0.35.0 in CycloneDX/cyclonedx-javascript-library should solve this.

Building on Node 20 is not showing the incompatibility and should work with libxmljs2 version 0.33.0 as well as 0.35.0

this libxmljs2 is a transitive optional dependency. I would love to omit it. Do you see a way to do so, @AugustusKling ?

AugustusKling commented 2 weeks ago

this libxmljs2 is a transitive optional dependency. I would love to omit it. Do you see a way to do so, @AugustusKling ?

I do not think this is doable from cyclonedx-node-yarn.

Yarn and probably other package managers, too, would download all optional dependencies in the dependency tree, transitively. Then they should compare the current system environment with the compatibility info from the packages' manifests; that is the fields os, cpu, libc. If these are compatible or absent, the package manager needs to invoke the packages build script. If the build succeeds, the package will be used, otherwise it will be excluded from the installation.

In the specific case here the build of libxmljs2 fails and Yarns warns about it. This failure with the libxmljs2 packages does however not abort the installation of cyclonedx-node-yarn. Instead, Yarn excludes libxmljs2 automatically since it's only part of the dependency tree as an optional dependency. You'll lose whatever optional feature or optimization libxmljs2 should have contributed to @cyclonedx/cyclonedx-library.

If you really wanted to exclude libxmljs2, you'd need to remove it from @cyclonedx/cyclonedx-library but I guess it is there for a reason. Instead, update libxmljs2 to 0.35.0 to allow for a successful build with more Node versions.

The following is Yarn's way to notify about the build failure or an optional dependency. It would be nicer if the wording was more explicit and it said something along the lines of "excluding libxmljs2 which is an optional dependency".

➤ YN0000: · Yarn 4.5.1
➤ YN0000: ┌ Resolution step
➤ YN0000: └ Completed
➤ YN0000: ┌ Post-resolution validation
➤ YN0086: │ Some peer dependencies are incorrectly met by dependencies; run yarn explain peer-requirements for details.
➤ YN0000: └ Completed
➤ YN0000: ┌ Fetch step
➤ YN0000: └ Completed in 0s 307ms
➤ YN0000: ┌ Link step
➤ YN0000: │ ESM support for PnP uses the experimental loader API and is therefore experimental
➤ YN0007: │ libxmljs2@npm:0.33.0 must be built because it never has been before or the last one failed
➤ YN0009: │ libxmljs2@npm:0.33.0 couldn't be built successfully (exit code 1, logs can be found here: /tmp/xfs-8914cf2a/build.log)
➤ YN0000: └ Completed in 18s 797ms
➤ YN0000: · Done with warnings in 19s 425ms
jkowalleck commented 2 weeks ago

re https://github.com/CycloneDX/cyclonedx-node-yarn/pull/193#issuecomment-2474161240

I expected that, since - according to the docs I've read - yarn's concept of optional is a very well-thought one. But since I am not a heavy user of yarn, I thought I should ask, before assuming something. Thank you for the detailed explanation. :+1:

The features provided by this optional dependency are not used in the code at the moment, so they are not bundled in the final product, and no related code is generated in the build artifact.

From developer experience, the warning on setup/install looks ugly, true. But it's fine, it is not a blocker. If it was, please open an issue for that.

AugustusKling commented 2 weeks ago

From developer experience, the warning on setup/install looks ugly, true. But it's fine, it is not a blocker. If it was, please open an issue for that.

The warning is easily misunderstood for a problem but can be safely ignored.