fsfe / reuse-tool

reuse is a tool for compliance with the REUSE recommendations.
https://reuse.software

Generated files with code snippets including copyright information causing problems. #834

Open schmidtw opened 1 year ago

schmidtw commented 1 year ago

Hello, this is similar to #278, but in my case I have project documentation that includes automatically generated examples that we want copied verbatim, and those examples automatically include copyright information in SPDX form. Adding the wrappers can be tedious and, in some cases, complex to maintain effectively. Marking the file as excluded via .gitignore is also problematic: we want the file to be easy to update and maintain in git.

Since I have the file marked in the dep5 file with the accurate information, it would be extremely handy to be able to also tag it in that file as 'do not examine'.

mxmehl commented 1 year ago

I'm not sure I fully understood your case. Do this FAQ item and the following one (about ignoring parts of a file) help?

schmidtw commented 1 year ago

This is a very simple example of a scenario that can cause trouble: https://github.com/xmidt-org/shared-go/#golang-ci-workflow-sample

Effectively, we have actions that generate HTML code that we specifically want to check in and not exclude. In that code, we generate examples that include suggested SPDX headers to make compliance easy and encourage it. But needing to add labels to each file that requires this runs into a few different problems:

  1. For some formats (like JSON) this can be impossible.
  2. For any sizable amount of documentation that is auto-generated and committed back as part of a repo, the need to manually restore the tags is tedious and error-prone.
  3. Automating the insertion of these tags may pose challenges of its own.
  4. From a compliance perspective, I want my org to embrace this workflow to the fullest; but the more tedious work there is, the larger the push-back on the entire effort. Friction yields failure to adopt, unfortunately.
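
To make point 3 concrete, here is a minimal sketch of what such automation might look like for Markdown output. It assumes the generated examples live in fenced code blocks and that wrapping a block in HTML comments carrying the `REUSE-IgnoreStart` / `REUSE-IgnoreEnd` markers is acceptable for the file type. The helper name `wrap_spdx_snippets` and the regular expressions are hypothetical and are not part of reuse itself.

```python
import re

# Marker strings used by reuse to skip a region of a file. Placing them inside
# HTML comments is an assumption that suits Markdown/HTML output only.
IGNORE_START = "<!-- REUSE-IgnoreStart -->"
IGNORE_END = "<!-- REUSE-IgnoreEnd -->"

# A fenced Markdown code block: opening fence line, body, closing fence line.
FENCE_RE = re.compile(r"^```[^\n]*\n.*?^```[ \t]*$", re.DOTALL | re.MULTILINE)

# Tags that a linter would otherwise read as real licensing information.
SPDX_RE = re.compile(r"SPDX-(License-Identifier|FileCopyrightText):")


def wrap_spdx_snippets(markdown: str) -> str:
    """Wrap fenced blocks containing SPDX tags in ignore markers (hypothetical helper)."""
    # Drop markers left by a previous run so that regenerating is idempotent.
    # Note: this also drops markers of the same form that were placed by hand.
    markdown = markdown.replace(IGNORE_START + "\n", "")
    markdown = markdown.replace("\n" + IGNORE_END, "")

    def wrap(match: re.Match) -> str:
        block = match.group(0)
        if not SPDX_RE.search(block):
            return block  # nothing the linter could misread
        return f"{IGNORE_START}\n{block}\n{IGNORE_END}"

    return FENCE_RE.sub(wrap, markdown)
```

Even a small script like this has to know each output format's comment syntax, cannot help with formats that have no comments (point 1), and must run after every regeneration, which is the kind of friction described here.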

We already have a pretty well thought-out dep5 file that describes the majority of how a project should be governed. The problem isn't a lack of copyright information in the file; the information in the file is simply confusing the tool. Providing a number of ways to handle difficult areas is one of the strengths of reuse. Extending it a bit to make dealing with generated code as easy as dealing with an image would be valuable.

I'm aware that I can list all the files in the .gitignore file, but that dilutes the protection it brings, because people get in the habit of adding -f to force everything and then don't think twice about the mistake they're likely to make.

schmidtw commented 1 year ago

This is another example of where it would be ideal to be able to use dep5 to ignore the contents: https://github.com/xmidt-org/retry/blob/main/NOTICE, since we already have an entry for it: https://github.com/xmidt-org/retry/blob/47b368ed0b4df48b9d5586270aca593d52140c5b/.reuse/dep5#L6

(Just documenting use cases of where this is handy.)