We were trying to use the tool for directory-level scans (using --dir) over a bunch of cloned repositories. For instance, we tried scanning gitea, which results in the following:
$ license-scanner --dir gitea/
Error: failed to normalize data: invalid input text with control characters
We had a similar observation on a few more directories containing some non-textual files such as UI assets, binaries, etc.
Would it be possible to emit a warning for such files, skip them, and continue scanning the remaining files? Or perhaps add a command-line argument to enable this behavior?
Hey @markstur, thanks for the prompt reply.
I tested your workaround, and it seems to resolve the issue for now. I also ran into another issue with a similar outcome:
Error: file too large (4986500 > 1000000)
I tried changes similar to what you suggested for the earlier issue, like so:
diff --git a/identifier/identifier.go b/identifier/identifier.go
index 4750fa7..7bb47bd 100644
--- a/identifier/identifier.go
+++ b/identifier/identifier.go
@@ -109,7 +109,8 @@ func IdentifyLicensesInFile(filePath string, options Options, licenseLibrary *li
 		return IdentifierResults{}, err
 	}
 	if fi.Size() > 1000000 {
-		return IdentifierResults{}, fmt.Errorf("file too large (%v > 1000000)", fi.Size())
+		Logger.Errorf("file too large (%v > 1000000)", fi.Size())
+		return IdentifierResults{}, nil
 	}
 	b, err := ioutil.ReadFile(filePath)
Could you confirm whether this is the right way to handle the problem, or should it be done differently? Also, would it be possible to incorporate this change as well?