anthonyharrison / sbom4python

A tool to generate a SBOM (Software Bill of Materials) for an installed Python module
Apache License 2.0

PyPi metadata doesn't have expected values - exception thrown #25

Open JR-Carroll opened 1 week ago

JR-Carroll commented 1 week ago

When running sbom4python against my project, I get an unhandled exception. The PyPI package in question does not have repo_metadata filled out (it is None). I considered opening a ticket in lib4sbom or lib4package, since the trace passes through both of them, but I landed here because the exception is ultimately unhandled in sbom4python. Please also consider whether the other libraries need additional hardening: applying fixes there may be worthwhile for other consumers of lib4sbom and lib4package.
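This is a common failure mode with JSON APIs. A field that is normally an object arrives as null, which Python's json module decodes as None. Indexing into it then raises a TypeError, and calling .get() on it raises an AttributeError. The following is a minimal sketch of a defensive nested lookup. It assumes the payload has already been decoded into a dict. `safe_get` is a hypothetical helper and is not part of sbom4python, lib4sbom or lib4package. The payloads are simplified stand-ins, not real API responses.

```python
from typing import Any


def safe_get(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning `default` if any level is missing or not a dict.

    Hypothetical helper: guards against a field such as repo_metadata
    being JSON null (None in Python) instead of an object.
    """
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current


# Simplified, made-up payloads (not actual ecosyste.ms responses).
good = {"name": "tomli", "repo_metadata": {"license": "mit"}}
bad = {"name": "tqdm", "repo_metadata": None}

print(safe_get(good, "repo_metadata", "license"))  # mit
print(safe_get(bad, "repo_metadata", "license"))   # None
```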

Good PyPI package information: https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages/tomli

Offending PyPI package: https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages/tqdm

Expected Behavior:

Missing data should not result in an unhandled exception. That said, the license information IS available for tqdm. It just isn't where it's expected. I suggest trying to extract it from the other location where it appears in the JSON payload.
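The suggested fallback can be sketched as an ordered list of candidate fields. The first non-empty string wins and is returned unparsed. The field names used here (repo_metadata.license and a top-level "licenses") are assumptions about the payload shape and should be checked against the actual API response. `extract_license` is a hypothetical function.

```python
from typing import Any, Dict, Optional


def extract_license(package: Dict[str, Any]) -> Optional[str]:
    """Return the first non-empty license string found, without parsing it.

    Hypothetical fallback order (field names are assumptions):
    repo_metadata.license first, then the top-level "licenses" field.
    """
    repo = package.get("repo_metadata")
    candidates = [
        repo.get("license") if isinstance(repo, dict) else None,  # tolerate null
        package.get("licenses"),
    ]
    for value in candidates:
        if isinstance(value, str) and value.strip():
            # Return the string as found: no regex or SPDX mapping applied.
            return value
    return None


# Made-up payload: repo_metadata is null but a top-level license exists.
example = {"repo_metadata": None, "licenses": "MIT"}
print(extract_license(example))  # MIT
```

The function returns None rather than raising when nothing is found. This leaves the caller to decide how an unknown license is recorded in the SBOM.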

Observed Behavior:

The SBOM is not generated when running sbom4python -r requirements.txt. With -d enabled, I can see it successfully process many packages, but it halts and exits on the tqdm package because of the missing data in repo_metadata.


Stacktrace/Breakpoint Using pdb

(Screenshots of the stack trace and pdb session.)

Thoughts on the Fix

I am happy to submit a PR for this, but I can see many ways of fixing it, and I think it's best for the repo owner to decide what suits the architecture of all the libraries involved.

My suggestion, take it or leave it, is to allow licenses to be extracted from other fields (I understand that parsing those may be more cumbersome than desired). Otherwise, take the string exactly as it appears in the field and do no parsing or regex matching. Either way, this should be handled defensively. The data coming into lib4package, lib4sbom and sbom4python is external, and it looks like either some packages don't play nicely with PyPI or there is no enforced schema for the JSON payload (no DTD equivalent).
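Treating the metadata as untrusted external input also means one bad package should not stop the whole run. The following is a minimal sketch of per-package isolation. It assumes a lookup function that returns a license string (or None) for a package name. `collect_licenses` and `fake_lookup` are hypothetical and do not reflect how sbom4python is actually structured. NOASSERTION is the value the SPDX specification uses when no license determination is made.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

NOASSERTION = "NOASSERTION"  # SPDX value: no license determination was made


def collect_licenses(
    packages: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Look up each package's license without letting one failure stop the run."""
    results: Dict[str, str] = {}
    for name in packages:
        try:
            results[name] = lookup(name) or NOASSERTION
        except Exception as exc:  # external data: treat any failure as "unknown"
            logging.warning("License lookup failed for %s: %s", name, exc)
            results[name] = NOASSERTION
    return results


def fake_lookup(name: str) -> Optional[str]:
    # Made-up data reproducing the reported shape: tqdm has null repo_metadata.
    data = {
        "tomli": {"repo_metadata": {"license": "mit"}},
        "tqdm": {"repo_metadata": None},
    }
    return data[name]["repo_metadata"]["license"]  # raises TypeError for tqdm


print(collect_licenses(["tomli", "tqdm"], fake_lookup))
# {'tomli': 'mit', 'tqdm': 'NOASSERTION'}
```

Catching a broad Exception is deliberate here. It keeps SBOM generation going and logs the failure. A stricter design could catch only the specific errors expected from malformed payloads.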

Ultimately, I yield to the wisdom of the maintainer to decide what the fix is and where it goes. I did also think about going back to the tqdm maintainers and asking them to fill in their repo_metadata, but it seems silly (and inefficient) to ask each maintainer to fill out information for sbom4python.

anthonyharrison commented 2 days ago

Thanks @JR-Carroll. This highlights one of the big challenges with the metadata associated with the Python (and other) ecosystems: inconsistency. I've added this to my backlog. I will also raise an issue with the ecosyste.ms maintainer, as the API should be more resilient.