evamaxfield / rs-graph

Research Software Graph
Mozilla Public License 2.0

Use Syft for Docker instead of GitHub SBOM #17

Open evamaxfield opened 8 months ago

evamaxfield commented 8 months ago

Recommended by @andrew

https://github.com/anchore/syft

I will continue to use GitHub SBOM for the prototyping / early analysis dataset since it is much faster, but Syft should be run prior to the final analysis.

evamaxfield commented 8 months ago

Actually Syft seems to be for image / container specifications and many projects won't have containers. So this is useful for those.

@andrew how are you parsing R deps? And can I hit the https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages API quite heavily?

For prototype testing, the dataset is ~2k repos. With the data sources we now know of, though, that number goes up to somewhere in the 30,000 range.

evamaxfield commented 8 months ago

Note: will need to use https://github.com/nvuillam/github-dependents-info to recursively collect the downstream dependents of a GitHub repo (the ecosyste.ms API doesn't have repo dependents, only project dependents)

andrew commented 8 months ago

> Actually Syft seems to be for image / container specifications and many projects won't have containers. So this is useful for those.

You can use it in a simple directory as well, doesn't need to be in a container.
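
As a rough illustration of the directory case, here is a minimal sketch that shells out to Syft from Python. It assumes the `syft` binary is installed and on `PATH`; the `dir:` source prefix and the `syft-json` output format come from the Syft CLI, while the function names `build_syft_command` and `scan_directory` are made up for this example.

```python
import json
import subprocess


def build_syft_command(repo_dir: str) -> list[str]:
    """Build a Syft invocation that scans a plain directory (no container needed)."""
    # The "dir:" prefix tells Syft to treat the target as a directory, not an image.
    return ["syft", f"dir:{repo_dir}", "-o", "syft-json"]


def scan_directory(repo_dir: str) -> list[dict]:
    """Run Syft on a checked-out repo and return the packages it reports.

    Assumes `syft` is installed; raises CalledProcessError if the scan fails.
    """
    result = subprocess.run(
        build_syft_command(repo_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    sbom = json.loads(result.stdout)
    # Syft's own JSON format lists discovered packages under "artifacts".
    return sbom.get("artifacts", [])
```

Calling `scan_directory("path/to/clone")` on a local clone should then give one dict per detected package, which could be compared against the GitHub SBOM output for the same repo.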

> @andrew how are you parsing R deps? And can I hit the https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages API quite heavily?

I'm parsing R deps using https://github.com/ecosyste-ms/bibliothecary (via https://parser.ecosyste.ms)

You can, or you could use https://zenodo.org/records/10031778 which has all the data in it too
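
If going the API route, a throttled pager along these lines might be a reasonable starting point. This is only a sketch: the `page` and `per_page` query parameters are an assumption about the API's pagination (worth checking against the API docs), and `page_url`, `iter_packages`, and the one-second delay are hypothetical choices for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://packages.ecosyste.ms/api/v1/registries/pypi.org/packages"


def page_url(page: int, per_page: int = 100) -> str:
    """Build the URL for one page of results."""
    # ASSUMPTION: the API paginates with `page` and `per_page` parameters.
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return f"{BASE_URL}?{query}"


def iter_packages(max_pages: int, delay_seconds: float = 1.0, fetch=None):
    """Yield package records page by page, pausing between requests."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.load(resp)

    for page in range(1, max_pages + 1):
        records = fetch(page_url(page))
        if not records:
            # An empty page is treated as the end of the listing.
            break
        yield from records
        # Sleep between pages so a large crawl stays polite.
        time.sleep(delay_seconds)
```

The `fetch` argument is injectable so the paging logic can be tried out without touching the network. For a one-off bulk analysis, the Zenodo dump avoids the request volume entirely.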

andrew commented 8 months ago

> Note: will need to use https://github.com/nvuillam/github-dependents-info for recursive downstream of github repo dependents (ecosyste.ms API doesn't have repo dependents only project dependents)

You can get dependents via the usage api in https://repos.ecosyste.ms, for example: https://repos.ecosyste.ms/usage/pypi/movingpandas (and as json: https://repos.ecosyste.ms/api/v1/usage/pypi/movingpandas/dependencies)
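
For scripting this, a small sketch that builds the same JSON URL for any package. The URL pattern is taken from the example link above; the helper names `usage_dependencies_url` and `fetch_json` are hypothetical, and the shape of the response is not assumed here.

```python
import json
import urllib.parse
import urllib.request

USAGE_API = "https://repos.ecosyste.ms/api/v1/usage"


def usage_dependencies_url(ecosystem: str, package: str) -> str:
    """Build the usage API URL for one package, e.g. ("pypi", "movingpandas")."""
    # Percent-encode both parts so unusual package names stay a single path segment.
    eco = urllib.parse.quote(ecosystem, safe="")
    pkg = urllib.parse.quote(package, safe="")
    return f"{USAGE_API}/{eco}/{pkg}/dependencies"


def fetch_json(url: str):
    """Fetch a URL and decode the JSON body (performs a real network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch_json(usage_dependencies_url("pypi", "movingpandas"))` should return the same data as the JSON link above.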

andrew commented 8 months ago

You can also get the dependencies of a repo (rather than the dependents), from the repos service, including R projects: https://repos.ecosyste.ms/hosts/GitHub/repositories/evamaxfield%2Frs-graph (as json: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evamaxfield%2Frs-graph/manifests)
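
A sketch of building that manifests URL for an arbitrary repo. Note that the `owner/name` slash is percent-encoded as `%2F`, as in the link above. The function names `manifests_url` and `fetch_manifests` are hypothetical, and the response is returned as-is rather than assuming its fields.

```python
import json
import urllib.parse
import urllib.request

REPOS_API = "https://repos.ecosyste.ms/api/v1"


def manifests_url(host: str, full_name: str) -> str:
    """Build the manifests URL for a repo, e.g. ("GitHub", "evamaxfield/rs-graph")."""
    # safe="" forces the "/" in "owner/name" to be encoded as %2F.
    repo = urllib.parse.quote(full_name, safe="")
    return f"{REPOS_API}/hosts/{host}/repositories/{repo}/manifests"


def fetch_manifests(host: str, full_name: str):
    """Fetch the parsed manifests for a repo (performs a real network request)."""
    with urllib.request.urlopen(manifests_url(host, full_name), timeout=30) as resp:
        return json.load(resp)
```

Looping `fetch_manifests("GitHub", name)` over the ~2k prototype repos, with a pause between requests, would be one way to collect dependencies without cloning each repo.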