evamaxfield opened 8 months ago
Actually, Syft seems to be for image / container specifications, and many projects won't have containers, so this is useful for those.
@andrew how are you parsing R deps? And can I hit the https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages API quite heavily?
For prototype testing, the dataset is ~2k repos; counting all the data sources we now know of, though, the total is somewhere in the 30,000 range.
Note: will need to use https://github.com/nvuillam/github-dependents-info for recursive downstream GitHub repo dependents (the ecosyste.ms API doesn't have repo dependents, only project dependents).
> Actually, Syft seems to be for image / container specifications, and many projects won't have containers, so this is useful for those.
You can run it on a plain directory as well; it doesn't need to be a container.
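A minimal sketch of scanning a plain checkout directory with Syft from Python, assuming the `syft` binary is on `PATH`, that the `dir:` source prefix and the `-o syft-json` output format are available in the installed version, and that Syft's native JSON lists packages under a top-level `artifacts` key with `name`, `version` and `type` fields. The helper names (`syft_command`, `parse_syft_json`, `scan_directory`) are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def syft_command(path: str, output_format: str = "syft-json") -> List[str]:
    # "dir:" asks Syft to treat the source as a plain directory, not an image.
    return ["syft", f"dir:{path}", "-o", output_format]


def parse_syft_json(doc: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    # Assumed layout: top-level "artifacts" list; each entry has "name",
    # "version" and "type" (the package ecosystem, e.g. python or R).
    return [
        (a.get("name", ""), a.get("version", ""), a.get("type", ""))
        for a in doc.get("artifacts", [])
    ]


def scan_directory(path: str) -> List[Tuple[str, str, str]]:
    # Runs Syft and parses its stdout; raises CalledProcessError if Syft fails.
    result = subprocess.run(
        syft_command(path), capture_output=True, text=True, check=True
    )
    return parse_syft_json(json.loads(result.stdout))
```

For example, `scan_directory("./rs-graph")` would return `(name, version, type)` tuples for everything Syft detects in that checkout, with no container image involved.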
> @andrew how are you parsing R deps? And can I hit the https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages API quite heavily?
I'm parsing R deps using https://github.com/ecosyste-ms/bibliothecary (via https://parser.ecosyste.ms)
You can, or you could use https://zenodo.org/records/10031778, which has all the data in it too.
> Note: will need to use https://github.com/nvuillam/github-dependents-info for recursive downstream GitHub repo dependents (the ecosyste.ms API doesn't have repo dependents, only project dependents).
You can get dependents via the usage API in https://repos.ecosyste.ms, for example: https://repos.ecosyste.ms/usage/pypi/movingpandas (and as JSON: https://repos.ecosyste.ms/api/v1/usage/pypi/movingpandas/dependencies)
You can also get the dependencies of a repo (rather than the dependents), from the repos service, including R projects: https://repos.ecosyste.ms/hosts/GitHub/repositories/evamaxfield%2Frs-graph (as json: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evamaxfield%2Frs-graph/manifests)
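A small sketch of building the two repos.ecosyste.ms JSON URLs from the examples above and fetching them with the standard library. The URL shapes are copied from those links; the helper names are hypothetical, percent-encoding every path segment is an assumption that only matters for unusual names, and no particular response schema is assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

REPOS_API = "https://repos.ecosyste.ms/api/v1"


def usage_dependencies_url(ecosystem: str, package: str) -> str:
    # Repositories that depend on a package (the "usage" view), e.g. pypi/movingpandas.
    return (
        f"{REPOS_API}/usage/{quote(ecosystem, safe='')}"
        f"/{quote(package, safe='')}/dependencies"
    )


def repo_manifests_url(host: str, full_name: str) -> str:
    # "owner/repo" is a single path segment here, so its slash becomes %2F.
    return (
        f"{REPOS_API}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}/manifests"
    )


def fetch_json(url: str, timeout: float = 30.0):
    # Plain GET; returns whatever JSON the service sends back.
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`fetch_json(repo_manifests_url("GitHub", "evamaxfield/rs-graph"))` would request the same manifests JSON linked above.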
Recommended by @andrew: https://github.com/anchore/syft
Will continue to use GitHub SBOM for prototyping / the early analysis dataset, as it is much faster, but Syft should be run prior to final analysis.
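For the prototyping path, a minimal sketch of pulling a repository's GitHub SBOM and listing its packages. This assumes GitHub's dependency-graph SBOM export endpoint (`GET /repos/{owner}/{repo}/dependency-graph/sbom`), which wraps an SPDX 2.x JSON document under an `sbom` key whose `packages` entries carry `name` and, when known, `versionInfo`. The function names are hypothetical, and a token may be needed for private repos or higher rate limits.

```python
import json
from typing import List, Optional, Tuple
from urllib.request import Request, urlopen


def fetch_github_sbom(owner: str, repo: str, token: Optional[str] = None) -> dict:
    # Assumed endpoint: GitHub's dependency-graph SBOM export.
    req = Request(
        f"https://api.github.com/repos/{owner}/{repo}/dependency-graph/sbom",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def sbom_packages(payload: dict) -> List[Tuple[str, str]]:
    # Unwrap the "sbom" key if present, then read SPDX package name/version.
    doc = payload.get("sbom", payload)
    return [
        (p.get("name", ""), p.get("versionInfo", ""))
        for p in doc.get("packages", [])
    ]
```

Keeping the parsing separate from the fetch makes it easy to later diff this package list against a Syft scan of the same repo before the final analysis.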