aboutcode-org / scancode.io

ScanCode.io is a server to script and automate software composition analysis pipelines with ScanPipe pipelines. This project is sponsored by NLnet (https://nlnet.nl/project/vulnerabilitydatabase/), Google Summer of Code, nexB, and other generous sponsors!
https://scancodeio.readthedocs.io
Apache License 2.0

Add status or details for package #880

Open pombredanne opened 1 year ago

pombredanne commented 1 year ago

When I see a package in a scan, I would like to know where we got it from.

tdruez commented 1 year ago

@pombredanne Could you provide the status values for each case?

pombredanne commented 1 year ago

> Could you provide the status values for each case?

This requires a bit of thinking. First, what would the data structure be? @DennisClark ping: your help would be very welcome in designing how we track how, and based on what clue, we created a package during a ScanCode.io pipeline, such as a scan, a purldb match, both, a manifest, or an SBOM.

mjherzog commented 1 year ago

We also need to plan ahead for when we may have status codes entered by a person. The current status codes from SCIO are already confusing because they span multiple concepts:

DennisClark commented 1 year ago

Two new fields, I think:

pkg_origin: list of values. The database or process that identified the package.

is_scanned: yes/no/unknown. Indicates whether the package code was scanned by ScanCode Toolkit.

DennisClark commented 1 year ago

Perhaps one more:

sctk_version: the version of ScanCode Toolkit used to scan the package.
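
As a rough illustration of the three fields proposed above, here is a minimal standalone sketch. The class names `ScanState` and `PackageProvenance` and the helper `mark_scanned` are hypothetical and are not part of ScanCode.io. Only the field names `pkg_origin`, `is_scanned`, and `sctk_version` come from this thread, and the `"scan"` origin value is a placeholder since no vocabulary has been agreed on.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ScanState(Enum):
    """Tri-state value for is_scanned: yes / no / unknown."""

    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


@dataclass
class PackageProvenance:
    """Hypothetical container for the proposed fields (not a ScanCode.io model)."""

    # Databases or processes that identified the package.
    # The allowed values are not settled in this thread, so plain strings are used.
    pkg_origin: List[str] = field(default_factory=list)
    # Whether the package code was scanned by ScanCode Toolkit.
    is_scanned: ScanState = ScanState.UNKNOWN
    # Version of ScanCode Toolkit used for the scan, if any.
    sctk_version: Optional[str] = None

    def mark_scanned(self, sctk_version: str) -> None:
        """Record that the package code was scanned with the given toolkit version."""
        self.is_scanned = ScanState.YES
        self.sctk_version = sctk_version


provenance = PackageProvenance(pkg_origin=["scan"])  # placeholder origin value
provenance.mark_scanned("x.y.z")  # placeholder version string
```

Keeping `is_scanned` as a three-valued enum rather than a boolean preserves the "unknown" case, which matters for packages that come from a manifest or an SBOM without a scan.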

pombredanne commented 8 months ago

I think we need to revisit this as we may have tried to pack too many things in one field:

AyanSinhaMahapatra commented 5 months ago

From a discussion with @pombredanne

This would be best implemented as a status/origin log, that is, a list of status values (similar to the detection logs we have in LicenseDetection objects).

This is a list and not a single value because a package can have multiple data sources and origins, as in the following flow (keeping future plans in mind too):

  1. A package is created from a scan
  2. The package is enriched by data from purldb/full scans in purldb
  3. Package data is added for vulnerabilities/quality

Suggested values for this (not exhaustive, please add and update), based on the list above by @DennisClark:

Suggestions on attribute name: origin/origin_log
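
To make the log idea concrete, here is a minimal sketch of the three-step flow above, assuming the attribute is named `origin_log`. The `Package` class, the `add_origin` helper, the example purl, and the three string values are all hypothetical placeholders, not agreed status values and not the ScanCode.io model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """Hypothetical stand-in for a discovered package (not the ScanCode.io model)."""

    purl: str
    # Ordered log of how the package data was created and enriched.
    origin_log: List[str] = field(default_factory=list)

    def add_origin(self, origin: str) -> None:
        """Append an origin value, skipping an immediate repeat of the last entry."""
        if not self.origin_log or self.origin_log[-1] != origin:
            self.origin_log.append(origin)


package = Package(purl="pkg:pypi/example@1.0")  # placeholder purl
package.add_origin("created-from-scan")         # step 1 (placeholder value)
package.add_origin("enriched-from-purldb")      # step 2 (placeholder value)
package.add_origin("vulnerability-data-added")  # step 3 (placeholder value)
print(package.origin_log)
```

An ordered list keeps the sequence of events, which a single status field cannot express once a package has been both scanned and enriched.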