Closed tenzap closed 1 year ago
I think this is just a straight mis-parse. I'm not 100% sure what the spdx file should look like, but I'm pretty sure that GPL-3.0-only WITH Qt-GPL-exception-1.0
is one license, while the other is LicenseRef-Qt-Commercial
, and the "or" is between them. So I'd expected two "LicenseInfoInFile" lines.
Would be nice to have a "round-trip" test - make sure that the parsed license info turns back into an identical spdx-license-identifier line as the input, though I don't remember if there's any license-expression-formatting in the codebase yet.
Looks like the relevant code is either https://github.com/fsfe/reuse-tool/blob/master/src/reuse/_util.py#L260 or called by it
In the SPDX community it is understood that:
LicenseInfoInFile: GPL-3.0-only
LicenseInfoInFile: LicenseRef-Qt-Commercial
LicenseInfoInFile: Qt-GPL-exception-1.0
would equate to:
GPL-3.0-only AND LicenseRef-Qt-Commercial AND Qt-GPL-exception-1.0
, which would indeed be a pretty annoying bug compared to the input.
I would say, it should simply directly translate SPDX-License-Identifier
to LicenseInfoInFile
and keep the string following the tag intact. That is what was found in the file, and it’s a valid SPDX statement.
Oh, so it's AND by default, not OR, that makes sense. (Think my tool or the Rust spdx-expression crate has a bug...) Yeah, the reuse tool is definitely doing too much work here and losing the details. (I think it's parsing further so it can check for the presence of the right license files, but that code probably shouldn't be used when generating spdx output)
Yes, we're reinventing the wheel here and should instead use standard libs for that. See #394
Also got bitten by this when trying to integrate REUSE with Rust. The Rust standard license is MIT OR Apache-2.0
, and there is currently no way to get machine-readable output from REUSE that distinguishes between OR
and AND
(since we'll need NCSA AND (MIT OR Apache-2.0)
in some places, licensing is fun!).
I would say, it should simply directly translate SPDX-License-Identifier to LicenseInfoInFile and keep the string following the tag intact. That is what was found in the file, and it’s a valid SPDX statement.
Would this something that would be accepted while we wait for the next generation of output?
So, I looked into this issue more, and it turns out the behavior of REUSE adheres to the SPDX 2.1 specification:
If license information for more than one license is contained in the file or if the license information offers the package recipient a choice of licenses, then each of the choices should be listed as a separate entry.
Still, SPDX assumes someone would also populate LicenseConcluded
, which REUSE intentionally sets to NOASSERTION
:
The options I could see are:
LicenseInformationInFile
. This would not be according to the narrative part of the spec, even though the "syntax" part says SpdxExpression
for the contents of the field.LicenseConcluded
.reuse spdx --add-concluded-licenses
so that people can opt into the behavior, noting in the documentation that REUSE cannot be sure the conclusion is correct.I'm not exactly sure anymore why the behaviour is what it is.
For LicenseInfoInFile, I think it's reasonable to do the following:
# SPDX-License-Identifier: MIT OR 0BSD
# SPDX-License-Identifier: Apache-2.0
->
LicenseInfoInFile: MIT OR 0BSD
LicenseInfoInFile: Apache-2.0
I think --add-concluded-licenses
is a good idea. Maybe @silverhook has some good input on concluding licensing using technology instead of humans.
The concluded licence in the above case would probably be (MIT OR 0BSD) AND Apache-2.0
or some such.
Opened a PR populating LicenseConcluded
, with or without (by dropping the last commit) a CLI flag: https://github.com/fsfe/reuse-tool/pull/623
That would indeed be useful, and super thanks for the PR @pietroalbini.
There is just one caveat I would like us to address – namely that (even if perhaps the spec’s might not be super explicit about it; and if so, that is likely a bug in the spec), within SPDX it is understood that:
LicenseInfoInFile
and other LicenseInfoIn*
LicenseConcluded
(and any other *Concluded
fields)As such, if we implement reuse spdx
to include LicenseConcluded
, it should be both:
reuse
tool’s output is “the truth” and not just a hintsee:
8.5 Concluded license field […] 8.5.2 Intent
Here, the intent is for the SPDX document creator to analyze the License Information in File (8.6) and other objective information, e.g., “COPYING FILE,” along with the results from any scanning tools, to arrive at a reasonably objective conclusion as to what license governs the file.
https://spdx.github.io/spdx-spec/v2.3/file-information/#85-concluded-license-field https://spdx.github.io/spdx-spec/v2.3/document-creation-information/#68-creator-field
Sounds good @silverhook, thanks for the prompt feedback! I updated the PR accordingly.
The output of
reuse spdx
has theLicenseInfoInFile
field. However, with this field it is not always possible to determine the original value ofSPDX-License-Identifier
For example if the input file has
The output will be
Which is not sufficient to determine the
SPDX-License-Identifier
This information would be useful when using the
reuse spdx
output as input of another script (for example to build a DEP5 file)