Open thbde opened 2 years ago
With the latest code, running reuse spdx,
I still get this:
FileName: ./gradlew
SPDXID: SPDXRef-dc243f792038baeebbca36717e0d2288
FileChecksum: SHA1: 0e59ccf04f8db22729ebef7ee39517a9e3a80c9d
LicenseConcluded: NOASSERTION
LicenseInfoInFile: Apache-2.0
FileCopyrightText: <text>Copyright © 2015-2021 the original authors.
Copyright © 2015-2021 the original authors.</text>
The correct one is from the file itself and the broken one is from .reuse/dep5.
Note that if I add a UTF-8 BOM to .reuse/dep5, it fails with:
.reuse/dep5 has syntax errors
Traceback (most recent call last):
File "c:\program files\python\lib\site-packages\reuse\project.py", line 219, in _copyright
self._copyright_val = Copyright(fp)
File "c:\program files\python\lib\site-packages\debian\copyright.py", line 156, in __init__
self.__header = Header(paragraphs[0])
File "c:\program files\python\lib\site-packages\debian\copyright.py", line 666, in __init__
'input is not a machine-readable debian/copyright')
debian.copyright.NotMachineReadableError: input is not a machine-readable debian/copyright
.reuse/dep5 has syntax errors
Traceback (most recent call last):
File "c:\program files\python\lib\site-packages\reuse\project.py", line 219, in _copyright
self._copyright_val = Copyright(fp)
File "c:\program files\python\lib\site-packages\debian\copyright.py", line 156, in __init__
self.__header = Header(paragraphs[0])
File "c:\program files\python\lib\site-packages\debian\copyright.py", line 666, in __init__
'input is not a machine-readable debian/copyright')
debian.copyright.NotMachineReadableError: input is not a machine-readable debian/copyright
.reuse/dep5 has syntax errors
Traceback (most recent call last):
File "c:\program files\python\lib\site-packages\reuse\project.py", line 219, in _copyright
self._copyright_val = Copyright(fp)
File "c:\program files\python\lib\site-packages\debian\copyright.py", line 156, in __init__
self.__header = Header(paragraphs[0])
File "c:\program files\python\lib\site-packages\debian\copyright.py", line 666, in __init__
'input is not a machine-readable debian/copyright')
debian.copyright.NotMachineReadableError: input is not a machine-readable debian/copyright
The following description can be reproduced via: https://github.com/thbde/reuse-utf8-dep5-issue
Assume that we use the dep5 file to declare the license state for a repository. Furthermore, the dep5 file contains non-ASCII characters (or code points) encoded as UTF-8.
In that case, reuse will fail if we execute:
reuse download --all
And the reported errors are rather confusing:
The errors want to tell us:
The root cause (after quite some debugging) is the open call for the dep5 file: https://github.com/fsfe/reuse-tool/blob/60c0986bb24a3b482ee0527e9195f7c23cadb003/src/reuse/project.py#L216-L219
The call does not pass an encoding, so Python falls back to locale.getpreferredencoding() (which is cp1252 on Windows).
The Copyright class then, deep inside several nested functions, iterates over the file object (more precisely, here: https://salsa.debian.org/python-debian-team/python-debian/-/blob/278e016f5ed8d3ed4fa17d7c30b54c149d428808/lib/debian/deb822.py#L759 ).
This implicitly reads the file descriptor (via an iterator that fd implements) and therefore we encounter a decoding issue. You can also reproduce that part via https://github.com/thbde/reuse-utf8-dep5-issue/blob/90f323b6b2af6b94fd3a461a986898b48ec5c6c9/call_open.py
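To illustrate the decoding step in isolation, here is a minimal sketch. It is not the code from the linked repository; the helper names read_lines and try_read and the sample strings are made up for this example. It writes a small file as UTF-8 and reads it back by iterating over the file object, once as UTF-8 and once as cp1252.

```python
import os
import tempfile


def read_lines(path, encoding):
    """Open the file with the given encoding and iterate over it line by line."""
    with open(path, encoding=encoding) as fd:
        # Decoding happens lazily, while the file object is iterated.
        return [line.rstrip("\n") for line in fd]


def try_read(text, encoding):
    """Write text as UTF-8, read it back with `encoding`, report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dep5")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as fd:
            fd.write(text)
        try:
            return read_lines(path, encoding)
        except UnicodeDecodeError as err:
            return f"UnicodeDecodeError: {err.reason}"


# Explicit UTF-8: the text round-trips unchanged.
print(try_read("Copyright: © 2021\n", "utf-8"))
# cp1252: the two UTF-8 bytes of "©" are decoded as two separate characters.
print(try_read("Copyright: © 2021\n", "cp1252"))
# cp1252: "č" is encoded as 0xC4 0x8D, and 0x8D is undefined in cp1252.
print(try_read("Copyright: č\n", "cp1252"))
```

Depending on the characters involved, the wrong codec either silently produces mojibake ("©" comes back as "Â©") or raises a UnicodeDecodeError during iteration. Passing encoding="utf-8" to open avoids both, independent of the platform default.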
Now, this error is not reported as such due to how reuse deals with the file analysis.
The shown error comes from here: https://github.com/fsfe/reuse-tool/blob/60c0986bb24a3b482ee0527e9195f7c23cadb003/src/reuse/report.py#L207-L210
result is created by mapping a method to all files: https://github.com/fsfe/reuse-tool/blob/60c0986bb24a3b482ee0527e9195f7c23cadb003/src/reuse/report.py#L197
And that method, a call to the constructor of _MultiprocessingContainer, just swallows the full error message in a _MultiprocessingResult: https://github.com/fsfe/reuse-tool/blob/60c0986bb24a3b482ee0527e9195f7c23cadb003/src/reuse/report.py#L37-L46
which uses the file path of the file that is currently analyzed.
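The effect of this pattern can be sketched without reuse itself. Everything below is a hypothetical stand-in (Result, load_shared_metadata, analyse, summarise and the message text are invented for illustration, not taken from report.py): a per-file worker catches every exception and stores it next to the path of the file it was analysing, so a failure in a shared file ends up attributed to each analysed file.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    """Hypothetical stand-in for a per-file result object."""
    path: str
    report: Optional[str]
    error: Optional[Exception]


def load_shared_metadata():
    """Simulate decoding a shared metadata file with the wrong codec."""
    # 0xC4 0x8D is UTF-8 for "č"; 0x8D is undefined in cp1252, so this raises.
    return b"Copyright: \xc4\x8d".decode("cp1252")


def analyse(path):
    """Catch every exception and attach it to the file being analysed."""
    try:
        load_shared_metadata()
        return Result(path, f"report for {path}", None)
    except Exception as exc:
        return Result(path, None, exc)


def summarise(results):
    """Build one message per failed result, naming only the analysed file."""
    messages = []
    for result in results:
        if isinstance(result.error, (OSError, UnicodeError)):
            messages.append(f"Could not read '{result.path}'")
    return messages


messages = summarise(map(analyse, ["gradlew", "README.md"]))
for message in messages:
    print(message)
```

In this sketch every analysed file is reported as unreadable, while the shared file that actually failed to decode is never named. That matches the confusing output described above.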
In conclusion, I see two issues here: