CycloneDX / cdxgen

Creates CycloneDX Bill of Materials (BOM) for your projects from source and container images. Supports many languages and package managers. Integrate in your CI/CD pipeline with automatic submission to Dependency Track server.
https://cyclonedx.github.io/cdxgen/
Apache License 2.0

Python licenses are not fetched when requirements.txt is used #757

Closed: knx-am closed this issue 8 months ago

knx-am commented 8 months ago

When requirements.txt is used for Python dependencies with install-deps active (and FETCH_LICENSE set to true), the package metadata is not fetched from PyPI and the licenses are missing from the SBOM. If install-deps is deactivated, the metadata is fetched and the licenses are present in the SBOM, but then the transitive dependencies are missing.

I debugged the issue a little, and as far as I can tell a call to getPyMetadata is missing in the cases where getPipFrozenTree is used: https://github.com/CycloneDX/cdxgen/blob/c9b28e7f52c4ade5d0a0723224530032751c6cff/index.js#L2427

prabhu commented 8 months ago

@knx-am could you investigate further and come up with a pull request?

knx-am commented 8 months ago

Thanks for the reply, @prabhu. I spent several hours debugging the problem. I have to admit I couldn't quite grasp all the steps involved in generating the package list, with its different sources and the merging and filtering, so I don't feel confident that I can find the root cause.

What I can say at this point is that, in my case, getPyMetadata is only called as part of getPyModules. getPyModules calls atom and processes slices files, and that logic finds only 2 of the 55 dependencies that exist in the final SBOM. So for most packages the licenses are not fetched, because the atom and slices logic doesn't find them.

Adding a call to getPyMetadata before generating the final SBOM solves the issue, though. So basically adding

pkgList = await getPyMetadata(pkgList, true);

before this line: https://github.com/CycloneDX/cdxgen/blob/c9b28e7f52c4ade5d0a0723224530032751c6cff/index.js#L2578

Would you accept that as a pull request?
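The gap described in this comment can be illustrated with a small, self-contained sketch. It is not cdxgen's code: `FAKE_PYPI`, `enrichWithLicenses`, and `buildComponents` are hypothetical names, and the in-memory table stands in for the network lookup that the real getPyMetadata performs. The point is only to show why enriching the subset found by static analysis leaves the remaining packages without licenses, while one extra pass over the merged list fills them in.

```javascript
// Hypothetical stand-in for a PyPI lookup (assumption: no network is used here).
const FAKE_PYPI = {
  requests: "Apache-2.0",
  urllib3: "MIT",
  idna: "BSD-3-Clause",
};

// Returns a new list in which every package with a known license is annotated.
// Packages that are not in the table keep whatever license they already had.
async function enrichWithLicenses(pkgList) {
  return pkgList.map((pkg) => ({
    ...pkg,
    license: FAKE_PYPI[pkg.name] ?? pkg.license,
  }));
}

// Hypothetical assembly step.
// `frozenTree` is the full dependency list (direct and transitive);
// `analysed` is the smaller subset found by static analysis.
async function buildComponents(frozenTree, analysed, enrichAll) {
  // Only the analysed subset is enriched at this point.
  const enrichedSubset = await enrichWithLicenses(analysed);
  const byName = new Map(enrichedSubset.map((p) => [p.name, p]));

  // Merge: prefer the enriched entry when one exists.
  let pkgList = frozenTree.map((p) => byName.get(p.name) ?? p);

  // The extra pass proposed in the comment: enrich the merged list as well.
  if (enrichAll) {
    pkgList = await enrichWithLicenses(pkgList);
  }
  return pkgList;
}
```

Calling `buildComponents(tree, subset, false)` leaves every package outside `subset` without a `license` field, whereas passing `true` annotates the whole list.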

prabhu commented 8 months ago

@knx-am, this is a brilliant investigation! Absolutely, an if condition wrapping this call is indeed the right fix.

I am looking forward to the PR!

if (FETCH_LICENSE) {
  pkgList = await getPyMetadata(pkgList, false);
}
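
The guard in the snippet above can be sketched in isolation as follows. This is an illustration under assumptions, not cdxgen's implementation: `isEnabled` and `maybeFetchLicenses` are hypothetical helpers, the rule that only "true" and "1" enable the flag is assumed for the example, and the metadata function is injected so the sketch runs without network access. The second argument is passed as `false` only to mirror the call shape shown in the snippet.

```javascript
// Parses a boolean-like environment value.
// Assumption for this sketch: only "true" and "1" (case-insensitive) enable the flag.
function isEnabled(value) {
  return ["true", "1"].includes(String(value ?? "").toLowerCase());
}

// Runs the metadata lookup only when FETCH_LICENSE is enabled in `env`.
// `fetchMetadata` is injected by the caller (hypothetical parameter).
async function maybeFetchLicenses(pkgList, env, fetchMetadata) {
  if (!isEnabled(env.FETCH_LICENSE)) {
    // Flag off: return the list untouched and perform no lookups.
    return pkgList;
  }
  return fetchMetadata(pkgList, false);
}
```

In a real script the `env` argument would typically be `process.env`; passing it explicitly keeps the function easy to test.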

knx-am commented 8 months ago

@prabhu Thanks for the quick reply. I'm somehow unable to push to the repository:

remote: Permission to CycloneDX/cdxgen.git denied to knx-am.
fatal: unable to access 'https://github.com/CycloneDX/cdxgen/': The requested URL returned error: 403

Do you maybe have to give me access? I tested the fix you proposed and it works as it should. All tests also pass when I run them locally. So maybe you could push the fix?

prabhu commented 8 months ago

@knx-am, Thank you for confirming! You can use the GitHub pull request process to contribute the fix. I've included below a link to a document, or you can ask this chat app for help.

https://www.digitalocean.com/community/tutorials/how-to-create-a-pull-request-on-github

Please ensure the commits are both signed with GPG keys and signed off (with a comment). We have a PR bot that will help with these commands should you get stuck.

Looking forward to the PR!

knx-am commented 8 months ago

Thanks. Here is the pull request; I hope I did everything right. https://github.com/CycloneDX/cdxgen/pull/766