clearlydefined / crawler

License set as Declared is not in the package #519

Open capfei opened 1 year ago

capfei commented 1 year ago

https://clearlydefined.io/definitions/pypi/pypi/-/pillow/9.5.0

After harvesting pypi/pypi/-/pillow/9.5.0 yesterday, it came back with CAL-1.0 as the declared license but I could not find this anywhere in the harvested data.

I checked the previous version, 9.4.0, and it looks like it also came back as CAL-1.0 in the harvested data. https://clearlydefined.io/definitions/pypi/pypi/-/pillow/9.4.0

qtomlinson commented 7 months ago

There are two problems identified by this issue:

  1. Extracting the declared license in pypiFetch:
    • The license is extracted from the classifiers (in the registry data) first. In _extractLicenseFromClassifiers, spdxCorrect("Historical Permission Notice and Disclaimer (HPND)") returns CAL-1.0.
    • If info.license (HPND) in the registry data were used to extract the license first instead, spdxCorrect("HPND") would return the correct license. This is related to the discussion
  2. The scancode summarizer (in service) should pick up the declared license detected in the scancode raw data.
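
To make the ordering problem in point 1 concrete, here is a minimal sketch rather than the actual pypiFetch code. The names stubCorrect, licenseNamesFromClassifiers, extractDeclaredLicense and the preferInfoLicense flag are hypothetical, and stubCorrect is only a stand-in that reproduces the two spdxCorrect results reported above; it does not call the real library. The sample registry data assumes the classifier is stored in the usual "License :: ..." form.

```javascript
// Hypothetical stand-in for spdxCorrect. It only reproduces the two
// results reported in this issue and does not call the real library.
const reportedResults = new Map([
  ['HPND', 'HPND'],
  ['Historical Permission Notice and Disclaimer (HPND)', 'CAL-1.0']
])
function stubCorrect(text) {
  return reportedResults.get(text) || null
}

// Take the last "::"-separated segment of every "License ::" classifier.
function licenseNamesFromClassifiers(classifiers = []) {
  return classifiers
    .filter(classifier => classifier.startsWith('License ::'))
    .map(classifier => classifier.split('::').pop().trim())
}

// Resolve the declared license from registry data. preferInfoLicense
// (hypothetical flag) switches between the two orderings being compared.
function extractDeclaredLicense(registryData, correct, preferInfoLicense) {
  const info = (registryData && registryData.info) || {}
  const fromInfo = info.license ? correct(info.license) : null
  const fromClassifiers =
    licenseNamesFromClassifiers(info.classifiers)
      .map(name => correct(name))
      .find(Boolean) || null
  return preferInfoLicense
    ? fromInfo || fromClassifiers
    : fromClassifiers || fromInfo
}

// Assumed shape of the relevant registry fields for pillow 9.5.0.
const pillow = {
  info: {
    license: 'HPND',
    classifiers: [
      'License :: OSI Approved :: Historical Permission Notice and Disclaimer (HPND)'
    ]
  }
}

console.log(extractDeclaredLicense(pillow, stubCorrect, false)) // CAL-1.0
console.log(extractDeclaredLicense(pillow, stubCorrect, true)) // HPND
```

With classifiers first, the long classifier name is what gets corrected, which is where CAL-1.0 comes from. With info.license first, the short identifier is used and the classifiers serve only as a fallback when info.license is missing or cannot be corrected.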