DataDog / guarddog

:snake: :mag: GuardDog is a CLI tool to Identify malicious PyPI and npm packages
https://securitylabs.datadoghq.com/articles/guarddog-identify-malicious-pypi-packages/
Apache License 2.0

"failed to run rule potentially_compromised_email_domain: Invalid version [...]" when using Recent "packaging" Package #389

Open cedricvanrompay-datadog opened 4 months ago

cedricvanrompay-datadog commented 4 months ago

The rule potentially_compromised_email_domain uses version.parse (with version coming from https://github.com/pypa/packaging/ ) on all versions of a PyPI package:

https://github.com/DataDog/guarddog/blob/dcc98d70cc357b0d7e68485e2df4d8404605f300/guarddog/analyzer/metadata/pypi/potentially_compromised_email_domain.py#L35

However, packaging 22.0 ( https://github.com/pypa/packaging/releases/tag/22.0 ) removed support for legacy version identifiers (see changelog). As a result, version.parse raises an error when the rule tries to sort the versions of a package that has non-PEP 440 versions such as https://pypi.org/project/pytz/2004d/:

Python 3.12.2 (main, May 17 2024, 12:48:02) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import packaging
>>> packaging.__version__
'24.0'
>>> packaging.version.parse("2004d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cedric.vanrompay/.pyenv/versions/3.12.2/lib/python3.12/site-packages/packaging/version.py", line 54, in parse
    return Version(version)
           ^^^^^^^^^^^^^^^^
  File "/Users/cedric.vanrompay/.pyenv/versions/3.12.2/lib/python3.12/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '2004d'
>>>

Right now GuardDog's poetry.lock pins packaging to 21.3, so there is no error:

➜  guarddog git:(v1.10.0) ✗ poetry run python
Python 3.11.1 (main, Apr  9 2023, 11:26:24) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import packaging
>>> packaging.__version__
'21.3'
>>> packaging.version.parse("2004d")
<LegacyVersion('2004d')>
>>>

But this version is not constrained by pyproject.toml, so the next time someone runs poetry update, GuardDog is going to break. Also, for some reason my team ended up with an installation of GuardDog using a recent version of packaging, which is how we got hit by this bug.
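
One possible direction, besides constraining packaging in pyproject.toml, is to make the sort tolerate legacy versions. The following is only a minimal sketch, not GuardDog's actual code: the helper name version_sort_key and the sample release list are hypothetical, and placing unparseable versions before all valid ones is an assumption that mirrors how LegacyVersion used to sort in packaging 21.x.

```python
from packaging.version import InvalidVersion, Version


def version_sort_key(raw: str):
    """Hypothetical sort key that tolerates non-PEP 440 strings like '2004d'."""
    try:
        # Valid PEP 440 versions go in the second group, in PEP 440 order.
        return (1, Version(raw))
    except InvalidVersion:
        # Assumption: legacy versions sort before every valid version,
        # ordered among themselves as plain strings.
        return (0, raw)


# Hypothetical sample mixing valid and legacy version strings.
releases = ["2024.1", "2004d", "2013.7", "2003d"]
ordered = sorted(releases, key=version_sort_key)
print(ordered)
```

Because the sketch uses Version directly instead of version.parse, it should behave the same whether or not the installed packaging still falls back to LegacyVersion.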