anchore / vunnel

Tool for collecting vulnerability data from various sources (used to build the grype database)
Apache License 2.0
69 stars 25 forks source link

chore: move debian parser from vulnerability dict to dataclass #645

Open willmurphyscode opened 1 month ago

willmurphyscode commented 1 month ago

Previously, the parser was using a deep copy of a dict literal to make each new instance of the vulnerability it was emitting. Vunnel has since added a dataclass to represent vulnerabilities. Switch to using that dataclass.

See #644

willmurphyscode commented 1 month ago

It looks like the quality gate is failing because somehow there are a mix of Vulnerability dataclasses and dicts in vuln_records:

   File "/home/runner/work/vunnel/vunnel/src/vunnel/providers/debian/parser.py", line 553, in get
    vuln_record["Vulnerability"].Severity = "Unknown"
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'Severity'
westonsteimel commented 1 month ago

It looks like the quality gate is failing because somehow there are a mix of Vulnerability dataclasses and dicts in vuln_records:

   File "/home/runner/work/vunnel/vunnel/src/vunnel/providers/debian/parser.py", line 553, in get
    vuln_record["Vulnerability"].Severity = "Unknown"
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'Severity'

@willmurphyscode , could that be due to the "legacy" record path?

willmurphyscode commented 1 month ago

@westonsteimel that seems likely. I knew the Debian provider was downloading cached data. There must be a place where it's parsing that as dict instead of Vulnerability.

Do you think it's safe to change that path? I know there's some information we care about that's only available in the legacy cache, so I'm wondering if I should abandon this for the Debian provider in particular. But I can't think of a reason that writing those records back to the cache as Vulnerability would matter - the cache is just JSON if I recall correctly.

westonsteimel commented 1 month ago

It should be safe to do. Alex recently changed it so that the legacy cache is a sqlite db with exactly the vunnel result shape

willmurphyscode commented 1 month ago

This is waiting on #647 so that any differences in snapshots will be possible to review.

westonsteimel commented 1 month ago

Oh, one danger with python dataclass that we ran into with enterprise is that it can't handle new properties being added to the json. So, for instance, if we added a new property to the os schema (non breaking change) and published that and then consumed in proxy mode, old vunnel clients using dataclass would actually fail due to an additional unknown property being present.

So for enterprise specifically we've replaced most of the uses of dataclass with pydantic models; however, that is a rather weighty dependency to bring in to vunnel, and enterprise is stuck on v1 of pydantic for the foreseeable future. That won't be a hindrance for vunnel much longer though since vunnel will cease to be a dependency of enterprise in the next release

willmurphyscode commented 1 month ago

@westonsteimel does that mean we should hold off on this type of change until the next release?