aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

unexpected LicenseRefs #3974

Open elrayle opened 2 weeks ago

elrayle commented 2 weeks ago

ClearlyDefined added support for LicenseRefs. At the moment, ScanCode is the only source whose LicenseRefs are used. I'm seeing a few results that are unexpected. Can you provide information on the following LicenseRefs? (I selected a few; there may be others that are similar.)

Not in the list of scancode-licensedb...

In the list of scancode-licensedb, but appear to be catch alls...

pombredanne commented 2 weeks ago

These two are not from ScanCode, as we always use a "LicenseRef-scancode" prefix. They are aliases found in the wild that we list here: https://scancode-licensedb.aboutcode.org/proprietary-license.html, but we should not report them as SPDX licenses on our side. If we do, this is a bug.

Do you know exactly which file they were detected in?
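As a rough illustration of the prefix convention described above, a consumer could separate ScanCode-issued LicenseRefs from other LicenseRefs by checking for the `LicenseRef-scancode-` prefix. This minimal sketch also checks the SPDX idstring syntax (letters, digits, `-` and `.` after `LicenseRef-`); the helper name `classify_license_ref` is hypothetical and not part of ScanCode.

```python
import re

# SPDX: "LicenseRef-" followed by an idstring made of letters, digits, "-" and "."
SPDX_LICENSE_REF = re.compile(r"^LicenseRef-[A-Za-z0-9.\-]+$")
# Prefix that ScanCode uses for its own license keys.
SCANCODE_PREFIX = "LicenseRef-scancode-"


def classify_license_ref(key: str) -> str:
    """Return 'invalid', 'scancode' or 'other' for a LicenseRef key (hypothetical helper)."""
    if not SPDX_LICENSE_REF.match(key):
        return "invalid"
    if key.startswith(SCANCODE_PREFIX):
        return "scancode"
    return "other"


for key in ["LicenseRef-scancode-unknown", "LicenseRef-LICENSE.md", "LicenseRef-"]:
    print(key, "->", classify_license_ref(key))
```

Anything classified as `other` did not come from ScanCode's own license keys and would be worth tracing back to the file it was detected in.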

This one is weird:

These are "generic" licenses with the "is_generic" flag set to true:

  1. They are detected using various rules, so you always want to use the --license-text option to get the exact matched license or notice text. (This is a good option to use in all cases.)

  2. unknown-license-reference matches are common, and many of them are recombined in the top-level "license_detections" results, a recently added feature.

For instance, say we have these fictitious license rules:

With license detection recombination, a. followed by b. will be reported only as gpl-2.0, and likewise a. followed by c. will be reported only as mit.
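The recombination idea can be illustrated with a deliberately simplified sketch. It assumes rule a. matches as `unknown-license-reference` and rules b. and c. match as `gpl-2.0` and `mit`. The function `recombine` is hypothetical and is NOT ScanCode's actual detection logic; it only shows why the unknown reference disappears from the combined result.

```python
from typing import List

UNKNOWN_REF = "unknown-license-reference"


def recombine(match_expressions: List[str]) -> str:
    """Simplified illustration: drop unknown-license-reference matches when the
    same detection also contains a concrete license match.
    This is NOT ScanCode's actual detection logic."""
    concrete = [e for e in match_expressions if e != UNKNOWN_REF]
    if not concrete:
        # Nothing concrete was found: keep the unknown reference (if any).
        return UNKNOWN_REF if match_expressions else ""
    # De-duplicate while preserving order, then join into one expression.
    unique: List[str] = []
    for expr in concrete:
        if expr not in unique:
            unique.append(expr)
    return " AND ".join(unique)


print(recombine([UNKNOWN_REF, "gpl-2.0"]))  # a. then b.
print(recombine([UNKNOWN_REF, "mit"]))      # a. then c.
```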

This means that 1. you should use the --license-text option to collect the matched text, and 2. you need to use the top-level detections, not only the lower-level license matches.
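To act on both points when consuming ScanCode output (for example from `scancode --license --license-text --json-pp scan.json <path>`, then loaded with `json.load`), a consumer can read the top-level `license_detections` for the recombined results and the per-file `matches` for the matched text. This is a minimal sketch: the JSON field names (`license_detections`, `license_expression_spdx`, `files`, `path`, `matches`, `matched_text`) are assumed from recent ScanCode output and should be checked against your version, and `summarize_scan` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def summarize_scan(scan: dict) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (top-level SPDX expressions, matched texts per file path).

    Field names are assumptions based on recent ScanCode JSON output.
    """
    # Top-level detections hold the recombined results.
    top_level = [
        d.get("license_expression_spdx") or d.get("license_expression", "")
        for d in scan.get("license_detections", [])
    ]
    # Per-file matches carry "matched_text" only when --license-text was used.
    texts: Dict[str, List[str]] = {}
    for resource in scan.get("files", []):
        for detection in resource.get("license_detections", []):
            for match in detection.get("matches", []):
                if "matched_text" in match:
                    texts.setdefault(resource["path"], []).append(match["matched_text"])
    return top_level, texts


# Tiny synthetic scan result, for illustration only.
sample = {
    "license_detections": [
        {"license_expression": "gpl-2.0", "license_expression_spdx": "GPL-2.0-only"},
    ],
    "files": [
        {
            "path": "src/foo.c",
            "license_detections": [
                {
                    "matches": [
                        {"license_expression": "unknown-license-reference", "matched_text": "See LICENSE"},
                        {"license_expression": "gpl-2.0", "matched_text": "GPL v2 notice"},
                    ]
                }
            ],
        }
    ],
}

print(summarize_scan(sample))
```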

elrayle commented 2 weeks ago

Thanks for all that info. I am including a single package-version for each.

Not in scancode db

license CD coordinates
LicenseRef-LICENSE npm/npmjs/-/arrow-orm/0.2.72
LicenseRef-LICENSE.md git/github/strongloop/strong-executor/209f0de764ca072008f18b414a81becbef3957f9
LicenseRef-Unspecified git/github/dlaidig/qmt/3a90d19c2fdea0b4579fb8808225bcb9862fc3ae
LicenseRef-Rakuten-Group-Proprietary-License maven/mavencentral/io.github.ec-mobile.rex/icons-compose-android/1.0

In scancode db

license CD coordinates
LicenseRef-scancode-commercial-license maven/mavencentral/com.vaadin/vaadin-upload-flow/14.12.1
LicenseRef-scancode-free-unknown crate/cratesio/-/bitwarden-core/1.0.0
LicenseRef-scancode-generic-export-compliance maven/mavencentral/org.eclipse.persistence/org.eclipse.persistence.asm/9.7.1
LicenseRef-scancode-generic-cla go/golang/github.com%2fazure%2fazure-sdk-for-go%2fsdk/azcore/v1.16.0
LicenseRef-scancode-proprietary-license npm/npmjs/-/apexcharts/3.54.1
LicenseRef-scancode-unknown-license-reference maven/mavencentral/com.melloware/commons-beanutils2/2.0.0
LicenseRef-scancode-unknown maven/mavencentral/com.melloware/commons-beanutils2/2.0.0
elrayle commented 2 weeks ago

FYI... we had data from before the update. I did an analysis on the pre-2.0 data. It already includes LicenseRefs. These are the stats from that analysis.

List of LicenseRefs in pre-v2.0 data sorted by the number of packages (ignoring versions) that they appear in:

[Image: screenshot of the LicenseRef package counts table]

pombredanne commented 2 weeks ago

@elrayle Thanks!

re:

"List of LicenseRefs in pre-v2.0 data sorted by the number of packages (ignoring versions) that they appear in:"

Would you mind attaching a text file?

(Tesseract is not too shabby at reading PNGs, but I would prefer the raw text)

Also, do you have one example of where each of these shows up?

Tesseract's OCR output:


Unique LicenseRef only Count of packages
LicenseRef-LICENSE 200
LicenseRef-NetCommons 112
LicenseRef-Slint-commercial 25

LicenseRef-LICENSE.md
LicenseRef-.amazon.com.-AmznSL-1.0
LicenseRef-jsoncpp-public-domain
LicenseRef-PdfiumThirdParty
LicenseRef-qskinny
LicenseRef-Rakuten-Group-Proprietary-License
LicenseRef-Qt-Commercial
LicenseRef-BSD-3-Clause-CMU
LicenseRef-fitsio

LicenseRef-HDF5
LicenseRef-JSONinJSPublicDomain
LicenseRef-MIT-Bootstrap
LicenseRef-mit-dmic
LicenseRef-MIT-like
LicenseRef-OpenEvidence
LicenseRef-PIL
LicenseRef-Proprietary
LicenseRef-Proprietaryintel
LicenseRef-PSF-based
LicenseRef-scancode-other-copyleft
LicenseRef-SHA1-Public-Domain
LicenseRef-SixtyFPS-commercial
LicenseRef-tzdata-PublicDomain
LicenseRef-Automake-exception-2.0
LicenseRef-Chef-EULA
LicenseRef-Custom
LicenseRef-EPL-Steward
LicenseRef-KhronosFreeUse
LicenseRef -LICENCE
LicenseRef-LICENSE. txt
LicenseRef-NextcloudTrademarks
LicenseRef-old-glib-tests
LicenseRef-PNGSuite
LicenseRef-ProprietaryMicrosoft
LicenseRef-Public-Domain
LicenseRef-PUBLIC-DOMAIN-xi2-xy
LicenseRef-tomb.v1
LicenseRef-UFL-1.0
LicenseRef-unDraw
LicenseRef-Unspecified
LicenseRef-yarn

TOTAL packages with a LicensoRef
(count does not include versions)
elrayle commented 2 weeks ago
Unique LicenseRef only Count of packages CD Coordinates
LicenseRef-LICENSE 200 npm/npmjs/-/arrow-orm/0.2.72
LicenseRef-NetCommons 112 composer/packagist/netcommons/access-counters
LicenseRef-Slint-commercial 25 crate/cratesio/-/vtable
LicenseRef-LICENSE.md 12 git/github/strongloop/strong-executor/209f0de764ca072008f18b414a81becbef3957f9
LicenseRef-.amazon.com.-AmznSL-1.0 7 npm/npmjs/@alexa-games/sfb-cli
LicenseRef-jsoncpp-public-domain 4 git/github/khronosgroup/openxr-sdk
LicenseRef-PdfiumThirdParty 4 pypi/pypi/-/pypdfium2
LicenseRef-qskinny 4 /LicenseRef-Automake-exception-2.0/LicenseRef-HDF5/3
LicenseRef-Rakuten-Group-Proprietary-License 4 maven/mavencentral/io.github.ec-mobile.rex/icons-compose-android/1.0
LicenseRef-Qt-Commercial 3 git/github/qtproject/pyside-pyside-setup
LicenseRef-BSD-3-Clause-CMU 2 pypi/pypi/-/benchexec
LicenseRef-fitsio 2 conda/conda-forge/linux-64/cfitsio
LicenseRef-HDF5 2 conda/conda-forge/linux-64/hdf5
LicenseRef-JSONinJSPublicDomain 2 git/github/sap/openui5
LicenseRef-MIT-Bootstrap 2 git/github/liferay/clay
LicenseRef-mit-drnic 2 git/github/sap/cloud-authorization-buildpack
LicenseRef-MIT-like 2 git/github/fosslight/fosslight_source_scanner
LicenseRef-OpenEvidence 2 git/github/curl/curl
LicenseRef-PIL 2 conda/conda-forge/linux-64/pillow
LicenseRef-Proprietary 2 git/github/com-posers-pit/smw_music
LicenseRef-ProprietaryIntel 2 conda/conda-forge/linux-64/mkl
LicenseRef-PSF-based 2 conda/conda-forge/win-64/matplotlib-base
LicenseRef-scancode-other-copyleft 2 pypi/pypi/-/scancode-toolkit-mini
LicenseRef-SHA1-Public-Domain 2 git/github/qt/qtbase
LicenseRef-SixtyFPS-commercial 2 crate/cratesio/-/vtable
LicenseRef-tzdata-PublicDomain 2 git/github/sap/openui5
LicenseRef-Automake-exception-2.0 1 git/github/isc-projects/bind9
LicenseRef-Chef-EULA 1 gem/rubygems/-/inspec-core
LicenseRef-Custom 1 pypi/pypi/-/salientsdk
LicenseRef-EPL-Steward 1 git/github/graphs4value/refinery
LicenseRef-KhronosFreeUse 1 git/github/khronosgroup/spirv-cross
LicenseRef-LICENCE 1 npm/npmjs/-/formally
LicenseRef-LICENSE.txt 1 npm/npmjs/-/physiojs
LicenseRef-NextcloudTrademarks 1 git/github/nextcloud/android
LicenseRef-old-glib-tests 1 git/github/gnome/glib
LicenseRef-PNGSuite 1 git/github/khronosgroup/ktx-software
LicenseRef-ProprietaryMicrosoft 1 conda/conda-forge/win-64/ucrt
LicenseRef-Public-Domain 1 conda/conda-forge/noarch/tzdata
LicenseRef-PUBLIC-DOMAIN-xi2-xy 1 git/github/gardener/gardener-extension-shoot-networking-filter
LicenseRef-tomb.v1 1 git/github/sap/cloud-authorization-buildpack
LicenseRef-UFL-1.0 1 crate/cratesio/-/epaint
LicenseRef-unDraw 1 git/github/pistacheio/pistache
LicenseRef-Unspecified 1 git/github/dlaidig/qmt
LicenseRef-yarn 1 git/github/hedgedoc/html-to-react
     
TOTAL packages with a LicenseRef 425  (count does not include versions)
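The "count does not include versions" rule can be made concrete with a small sketch. It assumes ClearlyDefined coordinates of the form `type/provider/namespace/name/revision` and drops the revision so that all versions of a package count once. The input shape (pairs of LicenseRef and coordinates) and the helper names are hypothetical, and the second `arrow-orm` version below is made up purely to show de-duplication.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def package_key(coordinates: str) -> str:
    """Drop the revision from type/provider/namespace/name/revision coordinates."""
    return "/".join(coordinates.split("/")[:4])


def count_packages(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct packages (not versions) per LicenseRef."""
    packages: Dict[str, Set[str]] = defaultdict(set)
    for license_ref, coordinates in rows:
        packages[license_ref].add(package_key(coordinates))
    return {ref: len(pkgs) for ref, pkgs in packages.items()}


def total_packages(rows: Iterable[Tuple[str, str]]) -> int:
    """Count distinct packages that have at least one LicenseRef."""
    return len({package_key(coordinates) for _, coordinates in rows})


rows = [
    ("LicenseRef-LICENSE", "npm/npmjs/-/arrow-orm/0.2.72"),
    ("LicenseRef-LICENSE", "npm/npmjs/-/arrow-orm/0.2.71"),  # hypothetical second version
    ("LicenseRef-Unspecified", "git/github/dlaidig/qmt/3a90d19c2fdea0b4579fb8808225bcb9862fc3ae"),
]
print(count_packages(rows), total_packages(rows))
```

Because one package can carry several LicenseRefs, the total is computed over distinct packages rather than by summing the per-LicenseRef counts.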
elrayle commented 2 weeks ago

@dangoor Found this related issue from 2022.

elrayle commented 2 weeks ago

These are the results comparing OLD and NEW. I can look at adding coordinates when I get a chance.

Unique LicenseRef only   Count of packages (NEW)   Count of packages (OLD)
LicenseRef-scancode-unknown-license-reference 274  
LicenseRef-LICENSE 200 200
LicenseRef-NetCommons 112 112
LicenseRef-scancode-generic-cla 69  
LicenseRef-scancode-proprietary-license 42  
LicenseRef-scancode-commercial-license 29  
LicenseRef-scancode-public-domain 26  
LicenseRef-Slint-commercial 25 25
LicenseRef-scancode-other-permissive 24  
LicenseRef-scancode-unknown 19  
LicenseRef-scancode-warranty-disclaimer 18  
LicenseRef-LICENSE.md 12 12
LicenseRef-Slint-Software-3.0 12  
LicenseRef-scancode-free-unknown 11  
LicenseRef-.amazon.com.-AmznSL-1.0 7 7
LicenseRef-scancode-protobuf 7  
LicenseRef-scancode-unicode-mappings 7  
LicenseRef-scancode-generic-export-compliance 6  
LicenseRef-qskinny 5 4
LicenseRef-jsoncpp-public-domain 4 4
LicenseRef-Rakuten-Group-Proprietary-License 4 4
LicenseRef-scancode-dco-1.1 4  
LicenseRef-scancode-ms-net-library-2018-11 4  
LicenseRef-scancode-other-copyleft 4 2
LicenseRef-scancode-public-domain-disclaimer 4  
LicenseRef-PdfiumThirdParty 3 4
LicenseRef-Qt-Commercial 3 3
LicenseRef-scancode-ms-edge-devtools-2022 3  
LicenseRef-scancode-paypal-sdk-2013-2016 3  
LicenseRef-scancode-unknown-spdx 3  
LicenseRef-scancode-w3c-docs-20021231 3  
LicenseRef-BSD-3-Clause-CMU 2 2
LicenseRef-fitsio 2 2
LicenseRef-HDF5 2 2
LicenseRef-JSONinJSPublicDomain 2 2
LicenseRef-LICENSE.txt 2 1
LicenseRef-MIT-Bootstrap 2 2
LicenseRef-mit-drnic 2 2
LicenseRef-MIT-like 2 2
LicenseRef-NextcloudTrademarks 2 1
LicenseRef-OpenEvidence 2 2
LicenseRef-PIL 2 2
LicenseRef-Proprietary 2 2
LicenseRef-ProprietaryIntel 2 2
LicenseRef-PSF-based 2 2
LicenseRef-scancode-mit-old-style 2  
LicenseRef-scancode-sunsoft 2  
LicenseRef-scancode-unicode 2  
LicenseRef-SHA1-Public-Domain 2 2
LicenseRef-SixtyFPS-commercial 2 2
LicenseRef-tzdata-PublicDomain 2 2
LicenseRef-UFL-1.0 2 1
LicenseRef-Automake-exception-2.0 1 1
LicenseRef-Chef-EULA 1 1
LicenseRef-Custom 1 1
LicenseRef-EPL-Steward 1 1
LicenseRef-KhronosFreeUse 1 1
LicenseRef-LICENCE 1 1
LicenseRef-old-glib-tests 1 1
LicenseRef-PNGSuite 1 1
LicenseRef-ProprietaryMicrosoft 1 1
LicenseRef-Public-Domain 1 1
LicenseRef-PUBLIC-DOMAIN-xi2-xy 1 1
LicenseRef-scancode-bsd-new-tcpdump 1  
LicenseRef-scancode-eclipse-sua-2014 1  
LicenseRef-scancode-facebook-patent-rights-2 1  
LicenseRef-scancode-facebook-software-license 1  
LicenseRef-scancode-fair-source-0.9 1  
LicenseRef-scancode-generic-trademark 1  
LicenseRef-scancode-ietf-trust 1  
LicenseRef-scancode-info-zip-2005-02 1  
LicenseRef-scancode-linking-exception-lgpl-2.0plus 1  
LicenseRef-scancode-microchip-products-2018 1  
LicenseRef-scancode-ms-azure-spatialanchors-2.9.0 1  
LicenseRef-scancode-ms-dxsdk-d3dx-9.29.952.3 1  
LicenseRef-scancode-ms-net-library 1  
LicenseRef-scancode-ms-patent-promise 1  
LicenseRef-scancode-mulanpsl-2.0-en 1  
LicenseRef-scancode-northwoods-sla-2021 1  
LicenseRef-scancode-python-cwi 1  
LicenseRef-scancode-secret-labs-2011 1  
LicenseRef-scancode-sun-sissl-1.0 1  
LicenseRef-scancode-us-govt-public-domain 1  
LicenseRef-scancode-vhfpl-1.1 1  
LicenseRef-tomb.v1 1 1
LicenseRef-unDraw 1 1
LicenseRef-Unspecified 1 1
LicenseRef-yarn 1 1
     
TOTAL packages with a LicenseRef 1025 425
(count does not include versions)    
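A NEW versus OLD comparison like the one above can be produced by merging two count maps, treating a missing entry as 0. This is a minimal sketch; `compare_counts` is a hypothetical helper, the ordering (NEW count descending, then name case-insensitively) is one reasonable choice, and the sample figures are taken from the table.

```python
from typing import Dict, List, Tuple


def compare_counts(new: Dict[str, int], old: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (license_ref, new_count, old_count) rows; missing entries count as 0.

    Sorted by NEW count (descending), then by name (case-insensitive).
    """
    refs = set(new) | set(old)
    rows = [(ref, new.get(ref, 0), old.get(ref, 0)) for ref in refs]
    rows.sort(key=lambda r: (-r[1], r[0].lower()))
    return rows


new = {
    "LicenseRef-scancode-unknown-license-reference": 274,
    "LicenseRef-LICENSE": 200,
    "LicenseRef-qskinny": 5,
}
old = {"LicenseRef-LICENSE": 200, "LicenseRef-qskinny": 4}

rows = compare_counts(new, old)
# LicenseRefs that only appear in the NEW data.
new_only = [ref for ref, _, old_count in rows if old_count == 0]
print(rows)
print(new_only)
```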
elrayle commented 1 week ago

@pombredanne If there are questions about licenses in ScanCode LicenseDB, is there a preferred place to ask them? I am writing a blog post announcing the support of LicenseRefs and want to include a statement like...

If you have comments on the actual LicenseRefs, you should reach out to ScanCode License DB maintainers.

pombredanne commented 1 week ago

@elrayle re:

I am writing a blog post announcing the support of LicenseRefs and want to include a statement like...

Awesome :bow: ... Please also link it here when done so we can relay and amplify!

But I would say this instead:

If you have comments on the actual LicenseRefs, you should reach out to ScanCode Toolkit maintainers of the License DB.

The license DB is entirely generated from the ScanCode Toolkit licenses for now, so this is the place to report and discuss these issues. At some point, we could either extract the license DB into its own repo or also publish it as a standalone package, but I am not sure what the benefits would be.

@AyanSinhaMahapatra @DennisClark ping, what do you think?

pombredanne commented 1 week ago

Quick side note: some (or many?) of these LicenseRefs exist in the wild. See, for instance, https://github.com/emilk/egui/pull/5361

elrayle commented 19 hours ago

ClearlyDefined v2.0 adds support for LicenseRefs (blog post)

elrayle commented 18 hours ago

@pombredanne We've been discussing internally LicenseRef-scancode-unknown-license-reference (274 instances in CD) and LicenseRef-scancode-unknown (19 instances in CD).

I'd like to understand the decision to include these LicenseRefs. We are wondering why the license doesn't remain NOASSERTION.

My concern is that if other tools created their own LicenseRefs representing "unknown", CD could end up with a license that combines several LicenseRefs from different tools, all representing unknown, instead of a single NOASSERTION.
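One possible consumer-side way to address this concern, sketched under assumptions: treat a configurable set of "unknown"-style LicenseRefs as equivalent and collapse an expression made up only of them into `NOASSERTION`. The set `UNKNOWN_REFS` and the function `normalize_unknowns` are hypothetical, and the naive splitter below only handles flat `AND`/`OR` expressions without parentheses or `WITH` clauses; a real implementation should use a proper SPDX expression parser.

```python
import re

# Hypothetical policy: keys that only say "the license is unknown".
UNKNOWN_REFS = {
    "LicenseRef-scancode-unknown",
    "LicenseRef-scancode-unknown-license-reference",
    "NOASSERTION",
}


def normalize_unknowns(expression: str) -> str:
    """Collapse an expression made only of unknown-style keys into NOASSERTION.

    Naive: splits on AND/OR and does not handle parentheses or WITH clauses.
    """
    keys = [k for k in re.split(r"\s+(?:AND|OR)\s+", expression.strip()) if k]
    if keys and all(k in UNKNOWN_REFS for k in keys):
        return "NOASSERTION"
    return expression


print(normalize_unknowns(
    "LicenseRef-scancode-unknown AND LicenseRef-scancode-unknown-license-reference"
))
print(normalize_unknowns("MIT AND LicenseRef-scancode-unknown"))
```

Whether a mixed expression such as `MIT AND LicenseRef-scancode-unknown` should keep its unknown part is a separate policy decision; this sketch leaves such expressions untouched.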