Garancegourdel/refacto output handlers to remove scanable from result

fnareoh commented 4 months ago

Context

Currently, for every Result we keep the entire file (more precisely, the Scannable object) in which the secret was found, and only use it at the very end to display the secrets and a bit of their context. This can cause ggshield to run out of memory if a scan finds many secrets in large files.

What has been done

In #890 we first worked on unifying the make_matches functions between the JSON and text outputs. Unfortunately, due to a lack of tests on the JSON output, a breaking change was introduced and that MR was reverted. This MR adds it back, with tests and a fix (thank you so much @agateau-gg)!

Then the last 4 commits gradually replace the need for the full content when displaying and ultimately remove it. In the process they remove a few inconsistencies in the tests. Best reviewed commit by commit!
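
To illustrate the idea, here is a minimal sketch. It is not the actual ggshield code: `Match`, `ResultSketch`, `build_result` and `padding` are made-up names for this example. The point is that a result only needs the few lines around each match in order to display it, so the full file content can be dropped as soon as the result is built.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Match:
    # Hypothetical match: 0-based index of the line holding the secret.
    line_index: int
    value: str


@dataclass
class ResultSketch:
    # Hypothetical result: no reference to the full file content.
    filename: str
    matches: List[Match]
    # Only the lines needed for display, keyed by 0-based line index.
    context_lines: Dict[int, str] = field(default_factory=dict)


def build_result(
    filename: str, content: str, matches: List[Match], padding: int = 1
) -> ResultSketch:
    """Keep `padding` lines before and after each match, nothing else."""
    lines = content.splitlines()
    context: Dict[int, str] = {}
    for match in matches:
        start = max(0, match.line_index - padding)
        end = min(len(lines), match.line_index + padding + 1)
        for index in range(start, end):
            context[index] = lines[index]
    return ResultSketch(filename, matches, context)
```

With this shape, the memory held per result depends on the number of matches and the padding, not on the size of the file.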

Validation

Tests should still pass. When many secrets are found in large files, memory consumption should grow much less than before.

PR check list

[ ] As much as possible, the changes include tests (unit and/or functional)
[ ] If the changes affect the end user (new feature, behavior change, bug fix) then the PR has a changelog entry (see doc/dev/getting-started.md). If the changes do not affect the end user, then the skip-changelog label has been added to the PR.

fnareoh commented 4 months ago

@agateau-gg Only the last commit is left for you to review (it completely removes the Scannable from the Result). I'm not sure what's going on with the benchmark test.

fnareoh commented 4 months ago

Sorry for the confusion with the branches and thank you for your patience! #908 addresses your first comment.

GitGuardian / ggshield