CycloneDX / license-scanner

Utility that provides an API and CLI to identify licenses and legal terms
Apache License 2.0

directory scans error on non-textual files and file too large #5

Closed markstur closed 1 year ago

markstur commented 1 year ago

Copied over from https://github.com/IBM/license-scanner/issues/30 issue by atharv-phadnis

Hello,

We were trying to use the tool for directory-level scans (using --dir) over a set of cloned repositories. For instance, scanning gitea results in the following:

$ license-scanner --dir gitea/
Error: failed to normalize data: invalid input text with control characters

We observed the same error on a few more directories containing non-textual files such as UI assets and binaries.

Would it be possible to emit a warning for such files, skip them, and continue scanning the remaining files? Or perhaps add a command-line argument that enables this behavior?


Hey @markstur, thanks for the prompt reply.

I tested your workaround, and it seems to resolve the issue for now. I also ran into another issue with a similar outcome: Error: file too large (4986500 > 1000000)

I tried changes similar to what you suggested for the earlier issue, like so:

diff --git a/identifier/identifier.go b/identifier/identifier.go
index 4750fa7..7bb47bd 100644
--- a/identifier/identifier.go
+++ b/identifier/identifier.go
@@ -109,7 +109,8 @@ func IdentifyLicensesInFile(filePath string, options Options, licenseLibrary *li
                return IdentifierResults{}, err
        }
        if fi.Size() > 1000000 {
-               return IdentifierResults{}, fmt.Errorf("file too large (%v > 1000000)", fi.Size())
+               Logger.Errorf("file too large (%v > 1000000)", fi.Size())
+               return IdentifierResults{}, nil
        }

        b, err := ioutil.ReadFile(filePath)

Could you confirm whether this is the right way to handle the problem, or should it be done differently? Also, would it be possible to incorporate this change as well?