fsfe / reuse-docs

REUSE recommendations, tutorials, FAQ and specification
https://reuse.software

Spec: How to handle submodules #36

Closed carmenbianca closed 1 year ago

carmenbianca commented 5 years ago

See also: https://github.com/fsfe/reuse-tool/issues/29

The spec currently doesn't really cover the scenario of having a Git submodule. My instinct is to ignore them, but I'm not sure if this is the correct approach: the submodule could initially be REUSE-compliant and later stop being compliant, which might indirectly mean that your project is no longer compliant either.

Alternatively, if someone included a carbon copy of a REUSE-compliant project in a subdirectory (i.e., not a submodule), the LICENSES directory and .reuse/dep5 file of that carbon copy would not be detected. I am not sure if we should support this scenario, though, because it's kind of esoteric.
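To make the carbon-copy scenario concrete, here is a minimal sketch of how nested `LICENSES` directories could be spotted. The function name `nested_license_dirs` is hypothetical and this is not how the reuse tool works; it only illustrates that such copies are detectable in principle.

```python
from pathlib import Path


def nested_license_dirs(project_root):
    """Return LICENSES directories found below (not at) the project root.

    Hypothetical helper for illustration; not part of the reuse tool.
    """
    root = Path(project_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("LICENSES")
        # Skip the project's own top-level LICENSES directory.
        if path.is_dir() and path.parent != root
    )
```

A non-empty result would suggest that a copy of another REUSE-style project lives in a subdirectory.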

silverhook commented 5 years ago

This is why I think we need to think outside the repo box, and why SPDX’s definition of a “Package” could come in handy.

IMO, we should concentrate on what’s being published by the person in charge of the package/repo/tarball/…

In the case a repo wants to be REUSE compliant, if the Git submodule in question is a repository under control of the person who is in charge of the “main” repo, they should fix it in both repos. If it’s not, they should (ask) upstream (to) fix. (i.e. ignore the subrepo)

If a source code tarball is created from that repo and its submodule(s), and that tarball should be REUSE compliant, the repo itself is not of direct importance, and its submodules even less so. But then they should make sure that everything in the tarball is REUSE compliant – what got in there from the original repo, all its submodules and also all extra added files of whichever origin (1st or 3rd party). (i.e. ignore the subrepo)

carmenbianca commented 5 years ago

> But then they should make sure that everything in the tarball is REUSE compliant – what got in there from the original repo, all its submodules and also all extra added files of whichever origin

I principally like this, but I think that this complicates REUSE by a lot. It's the difference between:

and

Moreover, it's much more technically challenging to alter the structure of the resulting tarball than the structure of the repo. Anybody with very slight technical skill knows how to manipulate files in a directory (the repo). Not everybody knows how to alter the build process to change what files go into a tarball and how.

So I'm kind of stuck on this, because REUSE needs to actually be easy to implement if it wants to gain any traction. And the easiest target is the repo, not the tarball.

silverhook commented 5 years ago

@carmenbianca I see your concern, but there is a reason why SPDX is looking at different types of Packages.

If we want to limit ourselves to VCS repositories (and their forges), we should acknowledge we are tackling the issue from only one specific front (and Git is just one of the VCSs in use). If we do that, we should make doubly sure to explicitly state this limitation and work with SPDX and (other) tooling projects, including build systems, on how to reuse REUSE data in the final source code distribution.

If, though, we want to cover source code regardless of which stage of development it is in and which system is used to develop it (e.g. some still contribute .patch files through mailing lists, or just don’t use a VCS), then we should take that into account.

In any case, I would rather not have a helper tool direct and limit the scope of the specification. As cool as the reuse CLI tool is, it is just a tool to help people comply with the spec.

IMHO we should keep the Spec wide-reaching, but limit the Tutorial and Tool (initially) to just repos.

carmenbianca commented 5 years ago

> If we want to limit ourselves to VCS repositories (and their forges), we should acknowledge we are tackling the issue from only one specific front (and Git is just one of the VCSs in use).

I don't think this characterisation is entirely accurate. It's not exactly limiting oneself to VCS, but to "the canonical source", or "the development source", instead of the tarball or package, which is a derivative of the aforementioned source.

> IMHO we should keep the Spec wide-reaching, but limit the Tutorial and Tool (initially) to just repos.

This makes some sense to me. Technically there is nothing that prevents the spec from applying to tarballs and repos alike. But even though the mechanics of REUSE compliance are the same, the way to get there differs a lot between the two.

But I suppose I have two unrelated questions:

  1. What, exactly, should the REUSE spec apply to before one can consider oneself compliant? The repo or the tarball? Or both?

  2. How do we convey the idea of making a tarball compliant when there are a thousand ways in which to generate tarballs?

If the answer to question 1 is either the repo or both, then we probably still need to handle submodules somehow. Possibly.

silverhook commented 5 years ago

I think that if someone wants to make both their development repo and their source code tarballs REUSE-compliant, we should not prevent them from doing so.

One use case I see is making packages in NPM, PyPI and other scripting-language repositories REUSE-compliant as well.

> 1. What, exactly, should the REUSE spec apply to before one can consider oneself compliant? The repo or the tarball? Or both?

I would take the molecular approach, where for a “project” each repo/tarball/package/… (i.e. Package according to SPDX) is a separate object to be considered REUSE compliant or not.

That would also ease adoption: if a Git repo is REUSE compliant but its source tarball or its NPM package is not, that does not take away the REUSE compliance of the project’s repo.

Also, if someone were to e.g. create source packages (tarballs or otherwise) and made sure that they were REUSE compliant, but the upstream repository was not (perhaps the project is dead or not interested), those packages/tarballs would still be compliant and provide additional value to their downstream.

> 2. How do we convey the idea of making a tarball compliant when there are a thousand ways in which to generate tarballs?

I do not see the problem here. The same rules apply to files whether they are in a repository or in an archive. In the end, when you clone it to your disk or unarchive it, you are left with files (and directories) which need to include appropriate license and copyright info.
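As a rough illustration of "same rules, different container", the sketch below runs a deliberately simplified check directly on a tar archive. It only looks for an `SPDX-License-Identifier:` tag near the top of each file; real REUSE compliance also involves copyright notices, `.license` sidecar files, `.reuse/dep5` and the `LICENSES` directory, none of which are handled here. The function name is hypothetical.

```python
import io
import tarfile

TAG = b"SPDX-License-Identifier:"


def files_missing_license_tag(archive):
    """Return names of regular files in a tar archive that lack an SPDX tag.

    `archive` is a seekable binary file object. Only the first 4 KiB of each
    member is inspected; this is a simplification, not the full REUSE check.
    """
    missing = []
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories, symlinks, etc. carry no header
            handle = tar.extractfile(member)
            head = handle.read(4096) if handle is not None else b""
            if TAG not in head:
                missing.append(member.name)
    return missing
```

The same loop body could be applied to files read from a checked-out repository, which is the point being made: only the way of enumerating files differs.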

carmenbianca commented 5 years ago

> I would take the molecular approach, where for a “project” each repo/tarball/package/… (i.e. Package according to SPDX) is a separate object to be considered REUSE compliant or not.

+1. Does this need mentioning somewhere?

> > 2. How do we convey the idea of making a tarball compliant when there are a thousand ways in which to generate tarballs?
>
> I do not see the problem here. The same rules apply to files whether they are in a repository or in an archive. In the end, when you clone it to your disk or unarchive it, you are left with files (and directories) which need to include appropriate license and copyright info.

Technically yes, but the steps to get there are more difficult. Is this just a case of "figure it out yourself"?

silverhook commented 5 years ago

> > I would take the molecular approach, where for a “project” each repo/tarball/package/… (i.e. Package according to SPDX) is a separate object to be considered REUSE compliant or not.
>
> +1. Does this need mentioning somewhere?

I think the FAQ would be a good target. IMHO it can already be understood that way in the Spec, especially if we take on the SPDX Package definition.

> Technically yes, but the steps to get there are more difficult. Is this just a case of "figure it out yourself"?

I think it’s more a case of “figure it out yourself”. Or rather, the Tutorial can easily concentrate on repos, and if people come ask, we can write a separate short tutorial or question in the FAQ to tackle this and other use cases.

carmenbianca commented 5 years ago

Seems good to me. Then back on topic to the thread: Should the spec deal with submodules somehow, or just ignore the topic and leave it to the tool to deal with them?

The tool could do three things, I think:

One of these could be the default behaviour, and the other two could be added as optional flags.

carmenbianca commented 5 years ago

Correct, the second option is the most logical behaviour, and the current behaviour of the tool.

silverhook commented 5 years ago

As I said, I would say that submodules are not something the Spec itself needs to handle.

mxmehl commented 5 years ago

+1 to the SPDX approach of defining a package and what @silverhook wrote in his first post here.

Regarding the default treatment of submodules, I tend towards ignoring them, but mentioning them in the output explicitly. Otherwise, it would be highly demotivating for larger projects using a lot of submodules to become REUSE compliant. However, I like the optional flags, especially for the recursive check.

arcturus140 commented 5 years ago

I also think that the Spec should not handle submodules. If an external project is part of a project, it may contain its own licensing information. It need not be a VCS submodule. Can dep5 be used to locate the license text?

carmenbianca commented 5 years ago

Not really. DEP5 only really marks the copyright of files.

As far as the spec is concerned, the project directory that contains submodules is a single Project/Package, and should be checked as such. A project does not support having multiple LICENSES/ directories or .reuse/dep5 files. And the spec very likely won't be altered to support such a configuration either, because it's a bit of an esoteric requirement.

Instead, you can do the following things:

mxmehl commented 4 years ago

From recent discussions, the summary seems to be:

  1. Let the spec ignore submodules, along the lines of "each submodule is understood as a separate project"
  2. Add to the tool: a) information about ignored submodules (number, path), b) optional --recursive flag if a project owner wants to check whether the submodules themselves are REUSE compliant
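The behaviour summarised above (skip submodules, but report which ones were skipped) could be sketched roughly as follows. This is an illustration only, not the reuse tool's implementation: the function names are hypothetical, and the `.gitmodules` parsing is a naive line-based match on `path = …` entries.

```python
import re
from pathlib import PurePosixPath


def submodule_paths(gitmodules_text):
    """Return the `path` values declared in the text of a .gitmodules file."""
    paths = []
    for line in gitmodules_text.splitlines():
        match = re.match(r"\s*path\s*=\s*(.+?)\s*$", line)
        if match:
            paths.append(match.group(1))
    return paths


def split_by_submodule(files, submodules):
    """Partition file names into (checked, ignored) by submodule membership."""
    roots = [PurePosixPath(p) for p in submodules]
    checked, ignored = [], []
    for name in files:
        path = PurePosixPath(name)
        # A file is ignored if a submodule root is one of its ancestors.
        inside = any(root == path or root in path.parents for root in roots)
        (ignored if inside else checked).append(name)
    return checked, ignored
```

A recursive mode would then amount to running the same compliance check separately inside each path returned by `submodule_paths`, treating each submodule as its own project.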

mxmehl commented 3 years ago

The tool command is already there, and it automatically excludes submodules by default.

However, we need to add this in the spec, ideally under this part:

> Each file in the Project MUST have Copyright and Licensing Information associated with it, except the following files:

carmenbianca commented 2 years ago

The chosen language should be broad enough to also include Meson subprojects → https://github.com/fsfe/reuse-tool/pull/496

mxmehl commented 2 years ago

@carmenbianca Hm, so you think the current proposal in #99 does not meet this criterion? If not, how should we express it?