clearlydefined / crawler

A service that crawls projects and packages for information relevant to ClearlyDefined
MIT License

QUESTION: Rationale for classifiers taking precedence over license info #523

Open elrayle opened 8 months ago

elrayle commented 8 months ago

Description

It appears that for PyPI, classifiers take precedence over the license field when extracting license information. It is clear from the code how this happens, but I'm wondering about the rationale for this approach. Also, if it is determined that one is correct and the other is not, is there a process for updating the license or the classifier as needed?

Test

  it('parses the correct license information from classifiers in registry data', () => {
    const registryData = JSON.parse(fs.readFileSync('test/fixtures/pypi/registryData_lgpl2.json'))
    const declared = fetch._extractDeclaredLicense(registryData)
    expect(declared).to.be.equal('LGPL-2.0-only')
  })

Fixture Data

classifiers

    "classifiers": [
      ...
      "License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)",
      ...
    ],

license

    "license": "LGPL 2.1",

Expected

With a specific license given in info.license, I would expect the license to be either LGPL-2.0-or-later or LGPL-2.1-only.

Actual

Precedence is given to classifiers in function _extractDeclaredLicense, which produces license LGPL-2.0-only.

  _extractDeclaredLicense(registryData) {
    const licenseFromClassifiers = this._extractLicenseFromClassifiers(registryData)
    if (licenseFromClassifiers) return licenseFromClassifiers
    const license = get(registryData, 'info.license')
    return license && spdxCorrect(license)
  }

qtomlinson commented 6 months ago

@elrayle This will be a good topic to discuss in our next community meeting. Historically, the license was only parsed from classifiers. This commit added the functionality to extract the license from info.license. To avoid any breaking changes, extraction from info.license was added as a fallback, used only if no valid license is parsed from the classifiers.
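
To make the fallback order concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the crawler's code. Both lookup tables are hypothetical stand-ins (the first for the classifier parsing, the second for spdxCorrect), the value chosen for "LGPL 2.1" is only one of the two outcomes suggested in the issue, and the sketch assumes classifiers live under info.classifiers alongside info.license. The added conflict flag is likewise an illustration, not an existing feature.

```javascript
// Hypothetical stand-in for the crawler's classifier parsing: only the
// classifier from the fixture in this issue is mapped.
const CLASSIFIER_TO_SPDX = {
  'License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)': 'LGPL-2.0-only'
}

// Hypothetical stand-in for spdxCorrect: a fixed lookup, not the real library.
// The mapped value is an assumption chosen for illustration.
const LICENSE_FIELD_TO_SPDX = {
  'LGPL 2.1': 'LGPL-2.1-only'
}

// Returns the first classifier that maps to a known SPDX id, or null.
function licenseFromClassifiers(registryData) {
  const classifiers = (registryData.info && registryData.info.classifiers) || []
  for (const classifier of classifiers) {
    const spdx = CLASSIFIER_TO_SPDX[classifier]
    if (spdx) return spdx
  }
  return null
}

// Returns the SPDX id for info.license if the stand-in table knows it, or null.
function licenseFromField(registryData) {
  const license = registryData.info && registryData.info.license
  return (license && LICENSE_FIELD_TO_SPDX[license]) || null
}

// Same precedence as discussed in this issue: classifiers first, info.license
// only as a fallback. Also reports whether the two sources disagree.
function extractDeclaredLicense(registryData) {
  const fromClassifiers = licenseFromClassifiers(registryData)
  const fromField = licenseFromField(registryData)
  return {
    declared: fromClassifiers || fromField,
    conflict: Boolean(fromClassifiers && fromField && fromClassifiers !== fromField)
  }
}

// Input shaped like the fixture quoted in the issue.
const result = extractDeclaredLicense({
  info: {
    classifiers: ['License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)'],
    license: 'LGPL 2.1'
  }
})
console.log(result)
```

For the fixture-shaped input, the sketch reports LGPL-2.0-only as the declared license and sets conflict to true. Surfacing the disagreement, rather than silently preferring one source, would at least make cases like this one visible for review.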

qtomlinson commented 6 months ago

In addition to the precedence order, I think this calculation/extraction of the declared license would be better done in the service. For the reasons, see comment.

qtomlinson commented 3 months ago

Another case: https://clearlydefined.io/definitions/pypi/pypi/-/UpSetPlot/0.9.0.