clearlydefined / crawler

A service that crawls projects and packages for information relevant to ClearlyDefined
MIT License

[pypi] spdxcorrect is too aggressive for declaredLicense #273

Open: dabutvin opened this issue 5 years ago

dabutvin commented 5 years ago

In https://github.com/clearlydefined/crawler/blob/master/providers/fetch/pypiFetch.js#L77:

  _extractDeclaredLicense(registryData) {
    const classifiers = get(registryData, 'info.classifiers')
    if (!classifiers) return null
    for (const classifier in classifiers) {
      if (classifiers[classifier].includes('License :: OSI Approved ::')) {
        const lastColon = classifiers[classifier].lastIndexOf(':')
        const rawLicense = classifiers[classifier].slice(lastColon + 1)
        return spdxCorrect(rawLicense)
      }
    }
    return null
  }

For example, for https://pypi.org/pypi/Flask/json the fetcher looks at info.classifiers and pulls out

"License :: OSI Approved :: BSD License"

It then runs spdx-correct on "BSD License", which yields BSD-2-Clause when it should be BSD-3-Clause.
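To illustrate why "BSD License" cannot be corrected safely, here is a minimal sketch of a more conservative mapping. The lookup table UNAMBIGUOUS_CLASSIFIERS and the helper classifierToSpdx are hypothetical and are not part of the crawler or of spdx-correct. Only classifier names that pin down a single SPDX identifier are mapped; a family name such as "BSD License" returns null instead of being guessed as one specific BSD variant.

```javascript
// Hypothetical, deliberately small lookup: only trove classifier names that
// identify exactly one SPDX license are listed. Family names such as
// "BSD License" or "Apache Software License" are intentionally absent.
const UNAMBIGUOUS_CLASSIFIERS = new Map([
  ['MIT License', 'MIT'],
  ['ISC License (ISCL)', 'ISC'],
  ['Mozilla Public License 2.0 (MPL 2.0)', 'MPL-2.0'],
])

const OSI_PREFIX = 'License :: OSI Approved :: '

// Returns an SPDX id when the classifier is unambiguous, otherwise null, so a
// caller can fall back to other evidence instead of picking a BSD variant.
function classifierToSpdx(classifier) {
  if (!classifier.startsWith(OSI_PREFIX)) return null
  const name = classifier.slice(OSI_PREFIX.length).trim()
  return UNAMBIGUOUS_CLASSIFIERS.get(name) ?? null
}

console.log(classifierToSpdx('License :: OSI Approved :: BSD License')) // null
console.log(classifierToSpdx('License :: OSI Approved :: MIT License')) // MIT
```

Returning null for ambiguous names trades coverage for correctness: an unknown declared license is easier to curate than a confidently wrong one.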

We should probably drop spdx-correct from the crawler side entirely and let the summarizers parse the raw classifier data without it.
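A rough sketch of what the crawler side could look like under that proposal, assuming the fetcher simply reports every license classifier name verbatim. The helper extractRawLicenseClassifiers is hypothetical: it returns an array rather than the single value _extractDeclaredLicense returns today, and it uses optional chaining in place of lodash's get so the example is self-contained. The flask object below is an illustrative stand-in shaped like the info.classifiers field of the PyPI JSON response, not a copy of the real data.

```javascript
// Hypothetical crawler-side helper: report each license classifier's last
// segment as-is and leave SPDX normalization to the summarizer.
function extractRawLicenseClassifiers(registryData) {
  const classifiers = registryData?.info?.classifiers
  if (!Array.isArray(classifiers)) return []
  return classifiers
    .filter(c => c.startsWith('License :: '))
    // Trove classifiers separate levels with " :: "; keep the last level.
    .map(c => c.split(' :: ').pop().trim())
}

// Illustrative input shaped like info.classifiers from https://pypi.org/pypi/Flask/json
const flask = {
  info: {
    classifiers: ['Programming Language :: Python', 'License :: OSI Approved :: BSD License'],
  },
}
console.log(extractRawLicenseClassifiers(flask)) // [ 'BSD License' ]
```

Keeping every license classifier, not only the first OSI Approved one, would let the summarizer see packages that declare more than one license and decide how to combine them.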

dabutvin commented 5 years ago

See https://github.com/clearlydefined/curated-data/pull/1234 for another instance where this popped up.

jeffmendoza commented 5 years ago

Need to check to see if this is still a problem.

nellshamrell commented 3 years ago

Just ran a harvest in my local setup on:

pypi/pypi/-/ipywidgets/7.4.2

It shows a Declared License of BSD-2-Clause AND BSD-3-Clause, while the Discovered license is BSD-3-Clause.

So this is reproducible and warrants further investigation.