GitGuardian / ggshield

Find and fix 400+ types of hardcoded secrets and 70+ types of infrastructure-as-code misconfigurations.
https://gitguardian.com
MIT License

Make patch parser truly incremental #799

Closed: agateau-gg closed this 9 months ago

agateau-gg commented 9 months ago

Description

This PR is the last one in the patch parser refactoring series. It changes the way the patch parser reads commits: instead of reading all the modifications at once, it processes them in batches of 20 files by default, using `git show --raw -z -m --patch $sha -- $file1 $file2 (...)`.
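A minimal Python sketch of the batching idea, assuming the list of changed paths is already known. `BATCH_SIZE`, `batched` and `git_show_args` are hypothetical names used for illustration, not ggshield's actual API, and the code only builds the argument lists; it does not run git.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical default, matching the "20 files" mentioned above.
BATCH_SIZE = 20


def batched(paths: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while batch := list(islice(it, size)):
        yield batch


def git_show_args(sha: str, paths: List[str]) -> List[str]:
    """Build the argv for one `git show` call restricted to `paths`."""
    # `--` separates the revision from the pathspecs, so paths that look
    # like options or revisions are not misinterpreted by git.
    return ["git", "show", "--raw", "-z", "-m", "--patch", sha, "--", *paths]


paths = [f"file{i}.py" for i in range(45)]
commands = [git_show_args("abc123", batch) for batch in batched(paths)]
print(len(commands))  # 3 calls: 20 + 20 + 5 paths
```

Each command could then be passed to `subprocess.run`, so memory use grows with the batch size rather than with the size of the whole commit. On Python 3.12+, `itertools.batched` could replace the helper (it yields tuples instead of lists).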

Subtlety

There is one subtlety, though: if a file is renamed, the call to `git show` must include both the old and the new paths. Otherwise the file is not correctly categorized: it is identified as new instead of renamed, which causes issues further down the line. This is why some tests fail after the first commit of the branch.
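A sketch of how the pathspecs for each batch could be built so that a rename always keeps both of its paths together, assuming each change already carries its status and, for renames, its old path. `FileChange`, `pathspecs` and `batched_pathspecs` are hypothetical names, not ggshield's actual classes.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Optional


@dataclass
class FileChange:
    """Hypothetical record for one entry of a commit's file list."""
    status: str  # status letter, e.g. "A", "M", "D" or "R"
    path: str  # new path (or the only path)
    old_path: Optional[str] = None  # set only for renames and copies


def pathspecs(change: FileChange) -> List[str]:
    """Return every path git must see to categorize this change correctly."""
    # For a rename, passing only the new path makes git report an
    # added file, so the old path has to be included as well.
    if change.old_path is not None:
        return [change.old_path, change.path]
    return [change.path]


def batched_pathspecs(
    changes: Iterable[FileChange], size: int = 20
) -> Iterator[List[str]]:
    """Group changes by `size`, keeping both paths of a rename in one batch."""
    it = iter(changes)
    while batch := list(islice(it, size)):
        yield [p for change in batch for p in pathspecs(change)]
```

Batching whole changes rather than individual paths means the two paths of a rename can never end up in different `git show` calls, at the cost of a batch occasionally holding more than 20 paths.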

This subtlety required reworking the way we list files. In the previous iteration, `CommitInformation` listed only each file's path and status, using the `--name-status` option of `git show`, but that did not give us the information we needed about renames. With this branch, `CommitInformation` now gets the file headers in the same format as `Commit` does, which gives it the required information. This also allows `Commit` and `CommitInformation` to share the parsing code. To make this easier, both now use a new module, `commit_utils`, which contains the parsing code.
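The headers mentioned above follow git's documented raw diff format: with `-z`, each record is a colon-prefixed field of modes, blob hashes and status, followed by NUL-separated paths, with a second path only for renames (`R`) and copies (`C`). Below is a minimal parser sketch, assuming the input holds only these records (no commit header or patch text) and single-parent output with one leading colon. `RawHeader` and `parse_raw_headers` are hypothetical names, not the actual contents of `commit_utils`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawHeader:
    """One record of `git show --raw -z` output (hypothetical name)."""
    old_mode: str
    new_mode: str
    old_sha: str
    new_sha: str
    status: str  # status letter, possibly followed by a score, e.g. "R086"
    path: str  # destination path
    old_path: Optional[str]  # source path, only for renames and copies


def parse_raw_headers(output: str) -> List[RawHeader]:
    """Parse the NUL-separated raw records produced by `git show --raw -z`.

    Assumes `output` contains only raw records, each starting with a
    single ":" (no combined-diff "::" records).
    """
    tokens = output.split("\0")
    headers: List[RawHeader] = []
    i = 0
    while i < len(tokens) and tokens[i].startswith(":"):
        # ":<old_mode> <new_mode> <old_sha> <new_sha> <status>"
        old_mode, new_mode, old_sha, new_sha, status = tokens[i][1:].split(" ")
        if status[0] in "RC":
            # Renames and copies carry two paths: source, then destination.
            old_path, path = tokens[i + 1], tokens[i + 2]
            i += 3
        else:
            old_path, path = None, tokens[i + 1]
            i += 2
        headers.append(
            RawHeader(old_mode, new_mode, old_sha, new_sha, status, path, old_path)
        )
    return headers
```

Because renames are recognized from the status letter, a caller can derive both paths of a renamed file from the same records it uses for everything else, which is the information the `--name-status` listing was missing in our case.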

Review

Another large PR, sorry about that. Again, it is best reviewed commit by commit.

codecov-commenter commented 9 months ago

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (2b86fe2) 91.97% compared to head (6afd8d4) 92.01%.

| Files | Patch % | Lines |
|---|---|---|
| ggshield/core/scan/commit_utils.py | 93.20% | 7 Missing :warning: |


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
+ Coverage   91.97%   92.01%   +0.03%
==========================================
  Files         156      157       +1
  Lines        6553     6585      +32
==========================================
+ Hits         6027     6059      +32
  Misses        526      526
```

| [Flag](https://app.codecov.io/gh/GitGuardian/ggshield/pull/799/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GitGuardian) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/GitGuardian/ggshield/pull/799/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GitGuardian) | `92.01% <95.23%> (+0.03%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GitGuardian#carryforward-flags-in-the-pull-request-comment) to find out more.
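As a sanity check on the coverage diff above: Codecov documents its coverage ratio as hits divided by the sum of hits, partials and misses. The sketch below assumes no partial lines (none are listed in the diff) and recomputes both headline figures. `coverage_percent` is an illustrative helper, not part of Codecov's API.

```python
def coverage_percent(hits: int, misses: int, partials: int = 0) -> float:
    """Coverage ratio as a percentage: hits / (hits + misses + partials)."""
    total = hits + misses + partials
    return 100 * hits / total


# Figures taken from the coverage diff (no partial lines are reported).
base = coverage_percent(hits=6027, misses=526)  # 6553 tracked lines
head = coverage_percent(hits=6059, misses=526)  # 6585 tracked lines
print(f"{base:.2f}% -> {head:.2f}%")  # 91.97% -> 92.01%
```

The 32 added lines are all hits, so the overall ratio goes up slightly even though the number of misses stays at 526.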
