aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

Improve license detection for project strapi #3118

Open DennisClark opened 1 year ago

DennisClark commented 1 year ago

The text at https://github.com/strapi/strapi/blob/refs/tags/v4.4.0/LICENSE makes for an interesting license detection challenge: it rambles on about exceptions at the beginning before it presents the primary license for the project, mit. A recent scan resulted in a declared license value of unknown-license-reference, which is rather incomplete. I understand that work is already in progress to provide better "see LICENSE" processing, so this example of project strapi should make an excellent test case.

Download: https://github.com/strapi/strapi/archive/refs/tags/v4.4.0.tar.gz

Scan results attached. strapi-4.4.0.tar.gz_scan.json.zip

derrickmehaffy commented 1 year ago

I just happened to stumble across this. As I work for Strapi, I'm curious whether there is a better way to structure our license to make it easier for you (unless you are really interested in the challenge :sweat_smile: ). We have a bit of a hybrid license structure, since only a few sub-directories aren't covered by the normal MIT license.

pombredanne commented 1 year ago

@derrickmehaffy Hey :) Thank you for chiming in... yes, we love ways to make things easier, or rather clearer! And since ScanCode is used in many places and projects, that will also make your users' lives easier.

  • All software that resides under an "ee/" directory (the “EE Software”), if that directory exists, is licensed under the license defined in "ee/LICENSE".

... is at best an ambiguous statement of sorts IMHO.
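To make the ambiguity concrete, here is a small Python sketch (an editorial illustration, not ScanCode code) of two plausible readings of the "ee/" rule. The function names and the "ee-license" label are hypothetical. Applied to the EE controller path linked below in suggestion 2, the two readings disagree.

```python
from pathlib import PurePosixPath

# Hypothetical helpers: two plausible readings of the rule
# "All software that resides under an 'ee/' directory ... is licensed under ee/LICENSE".

def license_top_level_ee(path: str) -> str:
    """Reading A: only a top-level ee/ directory is covered."""
    parts = PurePosixPath(path).parts
    return "ee-license" if parts and parts[0] == "ee" else "mit"

def license_any_ee(path: str) -> str:
    """Reading B: an ee/ directory at any depth is covered."""
    dirs = PurePosixPath(path).parts[:-1]  # directory components only, not the file name
    return "ee-license" if "ee" in dirs else "mit"

path = "packages/core/admin/ee/server/controllers/role.js"
print(license_top_level_ee(path))  # mit
print(license_any_ee(path))        # ee-license
```

Neither reading is obviously "the" correct one from the LICENSE text alone, which is the point: a scanner has to guess.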

My suggestions:

  1. In all cases, you should clearly spell out which code is under which license directly in https://github.com/strapi/strapi/blob/main/LICENSE. Otherwise this is a source of endless confusion for your users, which is IMHO not a great thing (unless you thrive on confusion, which I very much doubt is the case ;) )

  2. You could use SPDX license identifiers and/or license notices in all your source files, so that there is no ambiguity about which file is under which license. Nothing in https://github.com/strapi/strapi/blob/main/packages/core/admin/ee/server/controllers/role.js tells me its license, beyond an implicit path. Explicit is always better than implicit wrt. licensing. Your EE license at https://github.com/strapi/strapi/blob/main/packages/core/admin/ee/LICENSE could be given a stable SPDX LicenseRef name in the scancode namespace, maybe something such as "LicenseRef-scancode-strapi-ee-license" or similar, to help there.

  3. You should ensure that this is detected correctly by ScanCode (and we can amend the detection rules as needed).

  4. To eliminate any confusion and user mistakes, you could consider splitting the ee and mit-licensed code into two repos?
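As a rough illustration of suggestion 2, the Python sketch below pulls the SPDX-License-Identifier expression out of a file header. It is an editorial sketch under stated assumptions, not how ScanCode itself detects these tags: the function name and the 20-line header limit are arbitrary choices, and "LicenseRef-scancode-strapi-ee-license" is only the name floated in suggestion 2, not an existing ScanCode license key.

```python
import re
from typing import Optional

# Matches a short-form SPDX tag such as "// SPDX-License-Identifier: MIT"
# or "/* SPDX-License-Identifier: MIT OR Apache-2.0 */" and captures the
# license expression, dropping a trailing "*/" if present.
SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:[ \t]*(.+?)[ \t]*(?:\*/)?[ \t]*$", re.MULTILINE
)

def spdx_identifier(source: str, max_lines: int = 20) -> Optional[str]:
    """Return the first SPDX license expression in the file header, or None."""
    header = "\n".join(source.splitlines()[:max_lines])
    match = SPDX_TAG.search(header)
    return match.group(1) if match else None

# Hypothetical contents for an EE file carrying an explicit tag.
role_js = "// SPDX-License-Identifier: LicenseRef-scancode-strapi-ee-license\n'use strict';\n"
print(spdx_identifier(role_js))            # LicenseRef-scancode-strapi-ee-license
print(spdx_identifier("'use strict';\n"))  # None
```

With a tag like this in every file, the answer no longer depends on how a tool interprets the directory layout.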

derrickmehaffy commented 1 year ago

We are largely following in GitLab's footsteps here, and due to the way Node module distribution works, we unfortunately can't split repos.

https://gitlab.com/gitlab-org/gitlab/-/blob/master/LICENSE

pombredanne commented 1 year ago

We are largely following in GitLab's footsteps here

Fair enough, but these are not the best "footsteps", and this is pretty bad wrt. being machine-readable and unambiguous IMHO.

Think of it this way: ScanCode could eventually treat ambiguity in licensing roughly as if it were a "syntax" or "compilation" error in a programming language.

Also check https://reuse.software for inspiration, which is essentially what I was suggesting.
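In the spirit of REUSE, and of treating licensing ambiguity like a compilation error, a lint step could reject any file that lacks an explicit identifier or that names a license with no text under LICENSES/. The sketch below is a hypothetical, simplified check loosely modelled on the REUSE layout (one LICENSES/<identifier>.txt per license). It is not the official reuse tool, the file paths in the usage example are made up, and its expression handling is deliberately crude: it only strips AND, OR, WITH and parentheses.

```python
import re
from typing import Dict, List, Set

TAG = re.compile(r"SPDX-License-Identifier:[ \t]*(.+?)[ \t]*(?:\*/)?[ \t]*$", re.MULTILINE)
OPERATORS = {"AND", "OR", "WITH"}

def license_ids(expression: str) -> Set[str]:
    """Crudely split an SPDX expression into identifiers (no real parser)."""
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return {t for t in tokens if t not in OPERATORS}

def lint(files: Dict[str, str], license_files: Set[str]) -> List[str]:
    """Return one message per problem; an empty list means nothing was flagged."""
    problems = []
    for path, text in sorted(files.items()):
        match = TAG.search(text)  # searches the whole file, not just the header
        if match is None:
            problems.append(f"{path}: no SPDX-License-Identifier")
            continue
        for lic in sorted(license_ids(match.group(1))):
            if f"LICENSES/{lic}.txt" not in license_files:
                problems.append(f"{path}: LICENSES/{lic}.txt is missing")
    return problems

# Hypothetical repository contents.
files = {
    "packages/core/admin/ee/server/controllers/role.js": "'use strict';\n",
    "packages/core/utils/index.js": "// SPDX-License-Identifier: MIT\n",
}
for problem in lint(files, {"LICENSES/MIT.txt"}):
    print(problem)
# packages/core/admin/ee/server/controllers/role.js: no SPDX-License-Identifier
```

A CI job could fail the build whenever lint() returns a non-empty list, so an implicit or missing license surfaces as early as a syntax error would.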