fsfe / reuse-docs

REUSE recommendations, tutorials, FAQ and specification
https://reuse.software

Documenting Licenses used when they only apply to part of the file. #132

Closed OneDeuxTriSeiGo closed 11 months ago

OneDeuxTriSeiGo commented 1 year ago

I have a project where sources are licensed under the GPL-3.0-or-later but documentation is generally licensed under CC-BY-SA-4.0. Some of that documentation is inline in the source files and a tool generates the resulting documentation.

I want to be able to express that any docs that are inline in the sources are licensed under CC-BY-SA-4.0 as it is part of the documentation but also licensed under GPL-3.0-or-later as it is part of the source code (and I'm fine with the docs being available under either license or any later revision of those licenses).

But importantly I want the source code (excluding the documentation comments) only licensed under GPL-3.0-or-later.

I can express this with a comment header/license notice in each file but it's not clear to me how to codify this in the SPDX-License-Identifier expressions.

I couldn't find any documentation regarding how to approach this with REUSE but I'd wager this is a fairly common situation and guidance on this could be useful to other people in the future.

silverhook commented 11 months ago

That is why we introduced snippets :)

https://github.com/fsfe/reuse-docs/blob/0913b0a83b36c161966be1c5e70c81bdadfb8a69/spec.md?plain=1#L167
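For readers who have not seen snippets before, here is a minimal sketch of what the in-file tags could look like for the situation described above. The Rust source, author name, year and function are made up for illustration. The small helper `snippet_licenses` is hypothetical and is not part of the REUSE tool; it only shows that the snippet carries its own license expression, separate from the file-level one.

```python
import re

# Hypothetical Rust source file. The file header declares GPL-3.0-or-later,
# while the doc comment is wrapped in snippet tags with a dual-license expression.
RUST_SOURCE = """\
// SPDX-FileCopyrightText: 2024 Jane Doe <jane@example.org>
//
// SPDX-License-Identifier: GPL-3.0-or-later

// SPDX-SnippetBegin
// SPDX-SnippetCopyrightText: 2024 Jane Doe <jane@example.org>
// SPDX-License-Identifier: GPL-3.0-or-later OR CC-BY-SA-4.0
/// Adds two numbers and returns the sum.
// SPDX-SnippetEnd
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}
"""

# Matches a plain `//` comment that carries an SPDX tag, with an optional value.
TAG = re.compile(r"^\s*//\s*(SPDX-[A-Za-z-]+)(?::\s*(.*))?$")


def snippet_licenses(source):
    """Return the license expression declared inside each snippet, in order."""
    licenses = []
    inside = False
    for line in source.splitlines():
        match = TAG.match(line)
        if not match:
            continue
        tag = match.group(1)
        value = (match.group(2) or "").strip()
        if tag == "SPDX-SnippetBegin":
            inside = True
        elif tag == "SPDX-SnippetEnd":
            inside = False
        elif tag == "SPDX-License-Identifier" and inside:
            licenses.append(value)
    return licenses


print(snippet_licenses(RUST_SOURCE))
```

The file-level identifier is deliberately ignored by the helper, since it sits outside any snippet.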

OneDeuxTriSeiGo commented 11 months ago

That is why we introduced snippets :)

Oh awesome. I was not even aware this was a thing. However, it is certainly a bit verbose and awkward from what I can tell. It'd be ideal if you could specify grammars to tag in .reuse and then use a block at the top of the file to apply whatever tags you include in that block to any sections that match the grammar for the block.

Also will I need to redeclare the copyright tag for each documentation block or will it inherit any tags I give to the file?

silverhook commented 11 months ago

Oh awesome. I was not even aware this was a thing. However, it is certainly a bit verbose and awkward from what I can tell. It'd be ideal if you could specify grammars to tag in .reuse and then use a block at the top of the file to apply whatever tags you include in that block to any sections that match the grammar for the block.

We are working on renewing the .reuse part, but defining the start and end of snippets in an external file is quite finicky, so within REUSE we’d like to avoid it.

Snippet support is already in the SPDX specification (on which REUSE is based), so check this out if you really need it:

https://spdx.github.io/spdx-spec/v2.3/file-tags/#h3-snippet-tags-format
https://spdx.github.io/spdx-spec/v2.3/snippet-information/#9.4

That said, if you rely on the snippet definition in an SPDX file, you would need to re-generate the SPDX file on every version, to make sure the tags point to the right lines …and the best way to do that so far is with REUSE snippet tags in the source code 😅

Also will I need to redeclare the copyright tag for each documentation block or will it inherit any tags I give to the file?

Good question. I don’t know what the tool does, but given the logic behind REUSE that it’s easy to re-use code, the snippets should be self-contained, so if someone just copies the snippet with the tags, they have all the licensing info with it.

OneDeuxTriSeiGo commented 11 months ago

We are working on renewing the .reuse part, but defining the start and end of snippets in an external file is quite finicky, so within REUSE we’d like to avoid it.

  • when you edit the file, the lines shift, so you’d need to re-check and re-define the snippet locations in .reuse

  • if someone copied the file without also the (hidden) .reuse the license info of the snippet would be lost

Oh yeah, I wasn't necessarily suggesting defining the start and end of each snippet, but rather defining a grammar (e.g. the regex (?(?!\R\R)(\/\/\/.*\R)|(\/\/\/.*))+ matches any contiguous block of /// Rust outer line doc comments) and then declaring a block of tags at the top of a file whose contents apply to any text in the rest of the file that matches the grammar, e.g.

SPDX-SnippetsFromGrammarBegin: RUST_DOC_COMMENT
SPDX-tagname: <value>
...
SPDX-SnippetsFromGrammarEnd

where RUST_DOC_COMMENT is a regex defined somewhere in the repo (if not in .reuse).
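No such grammar feature exists in REUSE as far as this thread shows. One way to get a similar effect while keeping the license info self-contained would be a small script that writes the verbose snippet tags for you. The sketch below is only an assumption about how that could look: `tag_doc_blocks` and `is_doc_line` are hypothetical names, the script uses a simple line-based scan rather than the regex above, and it does not try to preserve indentation or detect blocks that are already tagged.

```python
def is_doc_line(line):
    """True for a Rust outer line doc comment (`///`, but not `////`)."""
    stripped = line.lstrip()
    return stripped.startswith("///") and not stripped.startswith("////")


def tag_doc_blocks(source, copyright_text, license_expr):
    """Wrap each contiguous run of `///` lines in snippet tags.

    Hypothetical helper, not part of the REUSE tool.
    """
    header = [
        "// SPDX-SnippetBegin",
        f"// SPDX-SnippetCopyrightText: {copyright_text}",
        f"// SPDX-License-Identifier: {license_expr}",
    ]
    out = []
    in_block = False
    for line in source.splitlines():
        doc = is_doc_line(line)
        if doc and not in_block:
            out.extend(header)  # a new doc block starts here
        elif not doc and in_block:
            out.append("// SPDX-SnippetEnd")  # the doc block just ended
        in_block = doc
        out.append(line)
    if in_block:
        out.append("// SPDX-SnippetEnd")  # file ended inside a doc block
    return "\n".join(out) + "\n"


example = "/// Doc line one.\n/// Doc line two.\nfn f() {}\n"
print(tag_doc_blocks(example, "2024 Jane Doe", "GPL-3.0-or-later OR CC-BY-SA-4.0"))
```

Because the tags end up in the source file itself, copying a doc block together with its surrounding tags still carries the licensing info along.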

I include a license header that states that the documentation in the file is dual licensed but the code is not. It'd be awesome if I could get the SPDX file and associated automated tooling to properly reflect that without an egregious amount of repetition/line noise.

Snippet support is already in the SPDX specification (on which REUSE is based), so check this out if you really need it:

https://spdx.github.io/spdx-spec/v2.3/file-tags/#h3-snippet-tags-format
https://spdx.github.io/spdx-spec/v2.3/snippet-information/#9.4

That said, if you rely on the snippet definition in an SPDX file, you would need to re-generate the SPDX file on every version, to make sure the tags point to the right lines …and the best way to do that so far is with REUSE snippet tags in the source code 😅

Yeah I figure if I can't find a more succinct way to handle it that's what I'll have to do.

Also will I need to redeclare the copyright tag for each documentation block or will it inherit any tags I give to the file?

Good question. I don’t know what the tool does, but given the logic behind REUSE that it’s easy to re-use code, the snippets should be self-contained, so if someone just copies the snippet with the tags, they have all the licensing info with it.

Noted.

silverhook commented 11 months ago

Unless I misunderstood you severely, I would still caution against what you are trying to do, because whatever practical benefit it might bring, if the license info is not self-contained it will eventually get lost.

And if you know a repo / package is set up so that a part of its license info is unreliable, you cannot trust the whole thing.

The harder it is for a random person or machine to figure out what license governs a specific file or line of code, the bigger this problem becomes.

OneDeuxTriSeiGo commented 11 months ago

Hmmm yeah it's less than ideal for sure.

What I might be able to do instead is leave the license header text at the top of each file as I have it now, mark the source files as SPDX-License-Identifier: GPL-3.0-or-later OR LicenseRef-OnlyDocs-CC-BY-SA-4.0, and tag the generated documentation files as SPDX-License-Identifier: GPL-3.0-or-later OR CC-BY-SA-4.0.

That way it's clear:

and then the SBOMs should capture that when we run reuse spdx.

It's not the best solution, but I don't think I'm making any glaring omissions that would break compliance?
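One thing worth checking with this approach: REUSE expects a license text in the LICENSES/ directory for every identifier used, including custom LicenseRef- ones. The sketch below lists the files an expression would call for. `license_files` is a hypothetical helper, the `.txt` extension is an assumption based on common practice, and the tokenizer is deliberately naive.

```python
import re

# SPDX expression operators; everything else is treated as an identifier.
OPERATORS = {"AND", "OR", "WITH"}


def license_files(expression, directory="LICENSES"):
    """List the license text files an SPDX expression would call for."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    seen = []
    for token in tokens:
        if token in OPERATORS:
            continue
        identifier = token.rstrip("+")  # "GPL-2.0+" refers to the GPL-2.0 text
        if identifier not in seen:
            seen.append(identifier)
    return [f"{directory}/{identifier}.txt" for identifier in seen]


for path in license_files("GPL-3.0-or-later OR LicenseRef-OnlyDocs-CC-BY-SA-4.0"):
    print(path)
```

Under these assumptions the source files would call for both LICENSES/GPL-3.0-or-later.txt and LICENSES/LicenseRef-OnlyDocs-CC-BY-SA-4.0.txt, the latter being a text you would have to write yourself.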