fsfe / reuse-tool

reuse is a tool for compliance with the REUSE recommendations.
https://reuse.software
412 stars 150 forks source link

reuse spdx: also output the full `SPDX-License-Identifier` #586

Closed tenzap closed 1 year ago

tenzap commented 2 years ago

The output of reuse spdx has the LicenseInfoInFilefield. However, with this field it is not always possible to determine the original value of SPDX-License-Identifier

For example if the input file has

// Copyright (C) 2016 Paul Lemire <paul.lemire350@gmail.com>
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GPL-3.0-only WITH Qt-GPL-exception-1.0

The output will be

FileName: ./qt3d/tests/auto/input/utils/tst_utils.cpp
SPDXID: SPDXRef-ec9701a4a6e83e5c346e9f4377005153
FileChecksum: SHA1: 78c032c19e50ba890da9609caca80ca3edea1166
LicenseConcluded: NOASSERTION
LicenseInfoInFile: GPL-3.0-only
LicenseInfoInFile: LicenseRef-Qt-Commercial
LicenseInfoInFile: Qt-GPL-exception-1.0
FileCopyrightText: <text>Copyright (C) 2016 Paul Lemire <paul.lemire350@gmail.com></text>

Which is not sufficient to determine the SPDX-License-Identifier

This information would be useful when using the reuse spdx output as input of another script (for example to build a DEP5 file)

rpavlik commented 2 years ago

I think this is just a straight mis-parse. I'm not 100% sure what the spdx file should look like, but I'm pretty sure that GPL-3.0-only WITH Qt-GPL-exception-1.0 is one license, while the other is LicenseRef-Qt-Commercial, and the "or" is between them. So I'd expected two "LicenseInfoInFile" lines.

Would be nice to have a "round-trip" test - make sure that the parsed license info turns back into an identical spdx-license-identifier line as the input, though I don't remember if there's any license-expression-formatting in the codebase yet.

rpavlik commented 2 years ago

Looks like the relevant code is either https://github.com/fsfe/reuse-tool/blob/master/src/reuse/_util.py#L260 or called by it

silverhook commented 2 years ago

In the SPDX community it is understood that:

LicenseInfoInFile: GPL-3.0-only
LicenseInfoInFile: LicenseRef-Qt-Commercial
LicenseInfoInFile: Qt-GPL-exception-1.0

would equate to:

GPL-3.0-only AND LicenseRef-Qt-Commercial AND Qt-GPL-exception-1.0, which would indeed be a pretty annoying bug compared to the input.

I would say, it should simply directly translate SPDX-License-Identifier to LicenseInfoInFile and keep the string following the tag intact. That is what was found in the file, and it’s a valid SPDX statement.

rpavlik commented 2 years ago

Oh, so it's AND by default, not OR, that makes sense. (Think my tool or the Rust spdx-expression crate has a bug...) Yeah, the reuse tool is definitely doing too much work here and losing the details. (I think it's parsing further so it can check for the presence of the right license files, but that code probably shouldn't be used when generating spdx output)

mxmehl commented 2 years ago

Yes, we're reinventing the wheel here and should instead use standard libs for that. See #394

pietroalbini commented 2 years ago

Also got bitten by this when trying to integrate REUSE with Rust. The Rust standard license is MIT OR Apache-2.0, and there is currently no way to get machine-readable output from REUSE that distinguishes between OR and AND (since we'll need NCSA AND (MIT OR Apache-2.0) in some places, licensing is fun!).

I would say, it should simply directly translate SPDX-License-Identifier to LicenseInfoInFile and keep the string following the tag intact. That is what was found in the file, and it’s a valid SPDX statement.

Would this something that would be accepted while we wait for the next generation of output?

pietroalbini commented 2 years ago

So, I looked into this issue more, and it turns out the behavior of REUSE adheres to the SPDX 2.1 specification:

If license information for more than one license is contained in the file or if the license information offers the package recipient a choice of licenses, then each of the choices should be listed as a separate entry.

Still, SPDX assumes someone would also populate LicenseConcluded, which REUSE intentionally sets to NOASSERTION:

https://github.com/fsfe/reuse-tool/blob/2b0d470919f98625f31b64819c889b0c54fc0180/src/reuse/report.py#L148-L150

The options I could see are:

carmenbianca commented 2 years ago

I'm not exactly sure anymore why the behaviour is what it is.

For LicenseInfoInFile, I think it's reasonable to do the following:

# SPDX-License-Identifier: MIT OR 0BSD
# SPDX-License-Identifier: Apache-2.0

->

LicenseInfoInFile: MIT OR 0BSD
LicenseInfoInFile: Apache-2.0

I think --add-concluded-licenses is a good idea. Maybe @silverhook has some good input on concluding licensing using technology instead of humans.

The concluded licence in the above case would probably be (MIT OR 0BSD) AND Apache-2.0 or some such.

pietroalbini commented 2 years ago

Opened a PR populating LicenseConcluded, with or without (by dropping the last commit) a CLI flag: https://github.com/fsfe/reuse-tool/pull/623

silverhook commented 2 years ago

That would indeed be useful, and super thanks for the PR @pietroalbini.

There is just one caveat I would like us to address – namely that (even if perhaps the spec’s might not be super explicit about it; and if so, that is likely a bug in the spec), within SPDX it is understood that:

As such, if we implement reuse spdx to include LicenseConcluded, it should be both:

see:

8.5 Concluded license field […] 8.5.2 Intent

Here, the intent is for the SPDX document creator to analyze the License Information in File (8.6) and other objective information, e.g., “COPYING FILE,” along with the results from any scanning tools, to arrive at a reasonably objective conclusion as to what license governs the file.

https://spdx.github.io/spdx-spec/v2.3/file-information/#85-concluded-license-field https://spdx.github.io/spdx-spec/v2.3/document-creation-information/#68-creator-field

pietroalbini commented 2 years ago

Sounds good @silverhook, thanks for the prompt feedback! I updated the PR accordingly.