Open stefan6419846 opened 1 week ago
Stefan: Thanks for the report! This is intentional because there is a lot of emptiness out there! Returning None for empty files has been a design choice to avoid creating dummy checksums.
FWIW, this predates SPDX and this library is not SPDX specific .... With this said, this has not been documented correctly and should be documented. I will keep this open until we document this alright.
And for the SPDX compatibility, this is handled when creating SPDX documents in ScanCode Toolkit at https://github.com/aboutcode-org/scancode-toolkit/blob/498467c1e5f677edc20e92b12910f16d771ccad3/src/formattedcode/output_spdx.py#L255 where we effectively return a SHA1 for empty files there. (And in the future, other checksums, hardcoded in a similar fashion).
Thanks for the explanation. Then this apparently is just a matter of missing/improper documentation.
Generating hashes for empty files will always return
None
, which is not documented and different from the usual hashing algorithms as well as contradicting the SPDX standard.Example:
The reason seems to be that https://github.com/aboutcode-org/commoncode/blob/878be6140deac30e2b95fb0fad9eb8feca015fc8/src/commoncode/hash.py#L38 does not use
msg is not None
, but basicallybool(msg)
, which isFalse
for empty inputs as well.Replacing the line with
(as well as replacing the same pattern in
sha1_git_hasher
) seems to fix this issue.