fsfe / reuse-tool

reuse is a tool for compliance with the REUSE recommendations.
https://reuse.software

Incorrect SPDX license parsing #1105

Closed: szymonmaszke closed this issue 1 week ago.

szymonmaszke commented 1 week ago

Bug

Parsing produces false positives when SPDX-License-Identifier appears twice in a file (or so it seems), even when the second occurrence is not part of the header and could not reasonably be considered licensing information.

Minimal example from a GitHub Actions workflow:

# SPDX-FileCopyrightText: © 2024 foo <https://github.com/bar>
#
# SPDX-License-Identifier: Apache-2.0

---
run: |
  echo "# SPDX-FileCopyrightText: © 2024 GitHub <https://github.com>" > .gitignore
  echo "#" >> .gitignore
  echo "# SPDX-License-Identifier: CC0-1.0" >> .gitignore
  echo "" >> .gitignore

which gives:

reuse.extract - ERROR - Could not parse 'CC0-1.0"'
reuse.extract - ERROR - 'FILE_XYZ.yml' holds an SPDX expression that cannot be parsed, skipping the file

reuse --version reports 5.0.2.
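To make the failure mode concrete, here is a deliberately simplified, hypothetical scanner. It is not reuse's actual implementation. It only assumes what the report describes: every line containing the SPDX-License-Identifier tag is treated as licensing information, whether or not it sits in a comment. The regex TAG and the function naive_license_tags are illustrative names. The scanner takes the first whitespace-delimited token after the tag as the expression.

```python
import re

# Hypothetical, simplified tag pattern: the first whitespace-delimited token
# after the tag is taken as the license expression. reuse's real logic differs.
TAG = re.compile(r"SPDX-License-Identifier:[ \t]+(\S+)")


def naive_license_tags(text: str) -> list[str]:
    """Return the token after every SPDX-License-Identifier tag, line by line."""
    found = []
    for line in text.splitlines():
        match = TAG.search(line)
        if match:
            found.append(match.group(1))
    return found


# The workflow snippet from the report, verbatim.
WORKFLOW = '''\
# SPDX-FileCopyrightText: © 2024 foo <https://github.com/bar>
#
# SPDX-License-Identifier: Apache-2.0

---
run: |
  echo "# SPDX-FileCopyrightText: © 2024 GitHub <https://github.com>" > .gitignore
  echo "#" >> .gitignore
  echo "# SPDX-License-Identifier: CC0-1.0" >> .gitignore
  echo "" >> .gitignore
'''

print(naive_license_tags(WORKFLOW))  # → ['Apache-2.0', 'CC0-1.0"']
```

In this toy model the second value carries the closing double quote of the echo argument. That is the same stray character visible in the "Could not parse 'CC0-1.0"'" message above, and it is why the expression cannot be parsed.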

Additional info

Workaround

If possible, do not put anything after SPDX-License-Identifier: <LICENSE> on the same line, e.g.:

# SPDX-FileCopyrightText: © 2024 foo <https://github.com/bar>
#
# SPDX-License-Identifier: Apache-2.0

run: |
  echo '# SPDX-FileCopyrightText: © 2024 GitHub <https://github.com>
  #
  # SPDX-License-Identifier: CC0-1.0
  #' > .gitignore

or (probably) license the file through an adjacent .license file instead (e.g. FILE_XYZ.yml.license).
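A .license file is a plain-text companion that carries the SPDX tags the file itself would otherwise hold. It is named after the original file with .license appended. The sketch below writes one from Python. The helper write_license_sidecar is hypothetical. As the reporter says, it is only probable that reuse then stops reading tags from inside the YAML, so confirm the result with reuse lint.

```python
import tempfile
from pathlib import Path


def write_license_sidecar(target: Path, copyright_text: str, license_id: str) -> Path:
    """Write '<target>.license' next to target with REUSE-style tags.

    Hypothetical helper for illustration. It never opens target itself:
    the point is that the tags no longer have to live inside the YAML.
    """
    sidecar = target.with_name(target.name + ".license")
    sidecar.write_text(
        f"SPDX-FileCopyrightText: {copyright_text}\n"
        "\n"
        f"SPDX-License-Identifier: {license_id}\n",
        encoding="utf-8",
    )
    return sidecar


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_license_sidecar(
        Path(tmp) / "FILE_XYZ.yml",
        "© 2024 foo <https://github.com/bar>",
        "Apache-2.0",
    )
    print(sidecar.name)  # → FILE_XYZ.yml.license
    print(sidecar.read_text(encoding="utf-8"))
```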

carmenbianca commented 1 week ago

Hi @szymonmaszke! This is a known issue. The workaround is documented here:

https://reuse.software/faq/#override-info

In your case:

# SPDX-FileCopyrightText: © 2024 foo <https://github.com/bar>
#
# SPDX-License-Identifier: Apache-2.0

# REUSE-IgnoreStart
---
run: |
  echo "# SPDX-FileCopyrightText: © 2024 GitHub <https://github.com>" > .gitignore
  echo "#" >> .gitignore
  echo "# SPDX-License-Identifier: CC0-1.0" >> .gitignore
  echo "" >> .gitignore
# REUSE-IgnoreEnd

I hope that helps. The tool is not smart enough to separate code from comments, because it needs to handle effectively every possible file type.
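The ignore markers can be modelled in the same simplified way: discard everything between REUSE-IgnoreStart and REUSE-IgnoreEnd before looking for tags. This is a hypothetical sketch, not reuse's code. The names TAG, IGNORE_BLOCK, strip_ignore_blocks and license_tags are assumptions, as is the exact marker handling. It illustrates why the echoed header stops being picked up once it is wrapped in the markers.

```python
import re

# Hypothetical, simplified patterns; reuse's real implementation differs.
TAG = re.compile(r"SPDX-License-Identifier:[ \t]+(\S+)")
# Non-greedy and DOTALL, so each start marker pairs with the nearest end marker
# even when the block spans several lines.
IGNORE_BLOCK = re.compile(r"REUSE-IgnoreStart.*?REUSE-IgnoreEnd", re.DOTALL)


def strip_ignore_blocks(text: str) -> str:
    """Remove every REUSE-IgnoreStart ... REUSE-IgnoreEnd span."""
    return IGNORE_BLOCK.sub("", text)


def license_tags(text: str) -> list[str]:
    """Collect the token after each tag, skipping ignored regions."""
    return [m.group(1) for m in TAG.finditer(strip_ignore_blocks(text))]


# The maintainer's suggested file, verbatim.
WORKFLOW = '''\
# SPDX-FileCopyrightText: © 2024 foo <https://github.com/bar>
#
# SPDX-License-Identifier: Apache-2.0

# REUSE-IgnoreStart
---
run: |
  echo "# SPDX-FileCopyrightText: © 2024 GitHub <https://github.com>" > .gitignore
  echo "#" >> .gitignore
  echo "# SPDX-License-Identifier: CC0-1.0" >> .gitignore
  echo "" >> .gitignore
# REUSE-IgnoreEnd
'''

print(license_tags(WORKFLOW))  # → ['Apache-2.0']
```

Without the two markers, the same function returns both Apache-2.0 and CC0-1.0" (the false positive from the report). With the markers, only the real header survives.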