Closed carla-gitguardian closed 5 months ago
Attention: 13 lines
in your changes are missing coverage. Please review.
Comparison is base (
70053de
) 91.89% compared to head (c717e25
) 91.81%.
Files | Patch % | Lines |
---|---|---|
ggshield/verticals/sca/output/json_handler.py | 23.52% | 13 Missing :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi @agateau-gg, In this MR I wanted to exclude some attributes from the dataclasses returned by py-gitguardian. I did not find how to do it however, so I ended up redefining the classes in ggshield without the unwanted attributes to handle the output schemas. Would you think of a better way to do it?
Hi @agateau-gg, In this MR I wanted to exclude some attributes from the dataclasses returned by py-gitguardian. I did not find how to do it however, so I ended up redefining the classes in ggshield without the unwanted attributes to handle the output schemas. Would you think of a better way to do it?
Just read the existing code and I have questions :)
In SCAJsonOutputHandler._process_scan_*_impl()
methods: why do we deserialize the dict and then dump the object as JSON? It seems to me it would be simpler to run json.dumps()
on the dict.
If there is an interest in serializing then deserializing, can you add a comment explaining it?
Assuming there is an interest in doing it, I think we can avoid basically duplicating all models by changing the code to use schema.to_dict()
instead of schema.dumps()
so that we have dictionaries and then remove the keys we don't want from said dictionaries.
Another point: something similar has already be done on the IAC side, have a look at ggshield/verticals/iac/output/schemas.py.
I'm not sure actually why it is serialized and then deserialized here, maybe @amascia-gg would have more clue?
As for IAC side, it had been done differently because schemas are exported from py-gitguardian for IaC whereas there are dataclasses for SCA. Maybe the change should directly occur in py-gitguardian to maintain better consistency? I don't know how much code would be impacted though.
As for IAC side, it had been done differently because schemas are exported from py-gitguardian for IaC whereas there are dataclasses for SCA. Maybe the change should directly occur in py-gitguardian to maintain better consistency? I don't know how much code would be impacted though.
I am not sure I understand what difference you are referring to. Is it because in py-gitguardian IaC schemas are defined like this:
IaCVulnerabilitySchema = cast(
Type[BaseSchema], marshmallow_dataclass.class_schema(IaCVulnerability, BaseSchema)
)
IaCVulnerability.SCHEMA = IaCVulnerabilitySchema()
Whereas SCA schemas are defined like this?
SCAIgnoredVulnerability.SCHEMA = cast(
BaseSchema,
marshmallow_dataclass.class_schema(
SCAIgnoredVulnerability, base_schema=BaseSchema
)(),
)
ie, if all that's missing is the SCAIgnoredVulnerabilitySchema
class, then yes, let's define them in py-gitguardian.
I guess it is yes! I'll define them in py-gitguardian and use them the same way we did in IaC. I'll also look into this potentially unnecessary serialization. Thanks for your help!
Hi, some news:
created_at
datetime. Removing it resulted in an error (as dump
expected a datetime, while the dict provided it as a string).load
method fails because of the additional fields (such as source_found
) that are not part of the schema, instead of ignoring themThere is no such issue for IAC because of the absence of datetime fields in the IAC schemas.
After spending some time trying to make it work, I decided to do as you suggested: discard the additional schemas and work with the dictionaries.
One minor nitpick. Not a blocker.
One thing to consider in the future is using JSON schemas to document and validate sca JSON outputs. I started doing so for secret outputs. It works like this:
* Schemas are defined in doc/schemas * They are loaded as fixtures here: https://github.com/GitGuardian/ggshield/blob/main/tests/conftest.py#L312-L337 * They are checked by tests, for example here: https://github.com/GitGuardian/ggshield/blob/main/tests/functional/secret/test_scan_path.py#L89
Ok that's great, I've made schemas for SCA, can you take a look?
I couldn't manage to import schema elements from other files, so I had to duplicate the locations_vulnerabilities
block in both schemas. It looks like additional setup is required for that: https://python-jsonschema.readthedocs.io/en/latest/referencing/
The autoignore feature modified SCA scan classes returned by py-gitguardian. The added attributes should not be kept in the output schemas of ggshield.