GitGuardian / ggshield

Find and fix 400+ types of hardcoded secrets and 70+ types of infrastructure-as-code misconfigurations.
https://gitguardian.com
MIT License

Garancegourdel/refacto output handlers to remove scanable from result #905

Closed fnareoh closed 3 months ago

fnareoh commented 4 months ago

Context

Currently, every Result keeps the entire file (more precisely, the Scannable object) in which the secret was found, and uses it only at the very end to display the secret and a bit of its context. This can cause ggshield to run out of memory if a scan finds many secrets in large files.

What has been done

In #890 we first worked on unifying the make_matches functions between the JSON and text outputs. Unfortunately, due to a lack of tests on the JSON output, a breaking change was introduced and that MR was reverted. This MR adds it back, with tests and a fix (thank you so much @agateau-gg)!

Then the last 4 commits gradually replace the need for the full content when displaying, and ultimately remove it. In the process, they remove a few inconsistencies in the tests. Best reviewed commit by commit!
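To illustrate the idea for reviewers, here is a minimal sketch, not the actual ggshield code. The names `Match`, `LightResult`, `extract_context` and `make_result` are hypothetical, as is the `padding` parameter. It assumes the display only needs the lines around each match, so those lines can be copied out when the result is built and the full content can then be released.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Match:
    """Hypothetical match: character offsets of a secret in the content."""
    start: int  # offset of the first character of the secret
    end: int    # offset one past the last character of the secret


@dataclass
class LightResult:
    """Hypothetical result that stores only what the display needs."""
    filename: str
    first_line_number: int      # 1-based number of the first context line
    context_lines: List[str]    # the matched line(s) plus surrounding padding


def extract_context(content: str, match: Match, padding: int = 1) -> Tuple[int, List[str]]:
    """Return (1-based first line number, lines) around the match."""
    lines = content.split("\n")
    # 0-based indices of the lines where the match starts and ends.
    first = content.count("\n", 0, match.start)
    last = content.count("\n", 0, match.end)
    lo = max(0, first - padding)
    hi = min(len(lines), last + padding + 1)
    return lo + 1, lines[lo:hi]


def make_result(filename: str, content: str, match: Match, padding: int = 1) -> LightResult:
    """Build a result that does not hold a reference to the full content."""
    first_line_number, context = extract_context(content, match, padding)
    return LightResult(filename, first_line_number, context)
```

With this shape, the memory held per result is bounded by the number of context lines rather than by the size of the file.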

Validation

Tests should still pass. When a scan finds a lot of secrets in large files, memory consumption should grow much less than before.

PR check list

fnareoh commented 4 months ago

@agateau-gg Only the last commit is left for you to review (it completely removes the Scannable from the Result). I'm not sure what's going on with the benchmark test.

fnareoh commented 4 months ago

Sorry for the branch confusion and thank you for your patience! #908 addresses your first comment.