aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and others generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

Scan freezes when scanning SharpVectors-1.8.2 library #3614

Open pddurr opened 10 months ago

pddurr commented 10 months ago

Description

I downloaded the package from GitHub to a local directory.

I ran the scan several times with several excludes and with the -n -1, -n 1 and -n 2 options:

E:\scancode-toolkit-v32.0.8\scancode -clpeui -v -n -1 --ignore ".java", --ignore ".settings" --ignore ".csproj" --ignore ".Designer.cs" --ignore "*.resx" --json-pp e:\Libraries\SharpVectors-1.8.2.json e:\Libraries\SharpVectors-1.8.2\Source

The scan freezes when scanning an assembly.cs file or another .cs file.

How To Reproduce

Download SharpVectors-1.8.2 library from https://github.com/ElinamLLC/SharpVectors

Scan with:

E:\scancode-toolkit-v32.0.8\scancode -clpeui -v -n -1 --ignore ".java", --ignore ".settings" --ignore ".csproj" --ignore ".Designer.cs" --ignore "*.resx" --json-pp e:\Libraries\SharpVectors-1.8.2.json e:\Libraries\SharpVectors-1.8.2\Source
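For anyone trying to reproduce this without leaving a terminal hung, here is a minimal Python sketch that builds the same scan command as an argument list and runs it under a wall-clock deadline. The helper names `build_scan_command` and `run_with_deadline` are made up for illustration and are not part of ScanCode. The only ScanCode options used are the ones already shown in the command above. It assumes the `scancode` launcher path can be started directly by `subprocess`, which on Windows may require the full path including the script's extension.

```python
import subprocess
from typing import List, Optional, Sequence


def build_scan_command(
    scancode: str,
    source_dir: str,
    output_json: str,
    ignores: Sequence[str],
    processes: int = -1,
) -> List[str]:
    """Build the argument list for the scan described in this report.

    Hypothetical helper: it only reuses the options shown in the report
    (-clpeui, -v, -n, --ignore, --json-pp).
    """
    cmd = [scancode, "-clpeui", "-v", "-n", str(processes)]
    for pattern in ignores:
        # One --ignore per pattern; passing a list avoids shell quoting issues.
        cmd += ["--ignore", pattern]
    cmd += ["--json-pp", output_json, source_dir]
    return cmd


def run_with_deadline(cmd: Sequence[str], deadline_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded deadline_s.

    subprocess.run kills the started process when the timeout expires.
    Worker processes started by that process may survive (assumption: this
    depends on how the launcher spawns them), so check the task manager.
    """
    try:
        return subprocess.run(list(cmd), timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

A call such as `run_with_deadline(build_scan_command(r"E:\scancode-toolkit-v32.0.8\scancode", r"e:\Libraries\SharpVectors-1.8.2\Source", r"e:\Libraries\SharpVectors-1.8.2.json", [".java", ".settings", ".csproj", ".Designer.cs", "*.resx"]), 3600)` would return `None` if the scan is still running after an hour. The one-hour deadline is an arbitrary choice, and the ignore patterns are copied as they appear in the report. Trying `processes=1` as well may make the verbose output easier to follow, since a single process handles files one at a time.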

System configuration

Windows 10, ScanCode Toolkit 32.0.8

pombredanne commented 10 months ago

Thanks for the report. This is most bizarre but a bug alright.

pombredanne commented 10 months ago

One bizarre thing about the repo is that it contains pre-built binaries too, and it is therefore rather large (120MB) for a single NuGet, but this should not be an issue. Which version of Python do you use?

pddurr commented 10 months ago

Python 3.11.3150.0 is installed

pombredanne commented 10 months ago

The data array in https://raw.githubusercontent.com/ElinamLLC/SharpVectors/master/Source/SharpVectorModel/Compressions/Brotli/Dictionary.cs looks really suspicious FWIW, or at least weird. @paulushub would you know why it is this way?

paulushub commented 10 months ago

@pombredanne That is the implementation provided by the Google engineers, and the reason is well documented. See the comments: https://github.com/google/brotli/blob/master/csharp/org/brotli/dec/Dictionary.cs