DataDog / guarddog

:snake: :mag: GuardDog is a CLI tool to Identify malicious PyPI and npm packages
https://securitylabs.datadoghq.com/articles/guarddog-identify-malicious-pypi-packages/
Apache License 2.0
585 stars 43 forks source link

Null result messages are dropped by metadata analysis #415

Closed ikretz closed 2 months ago

ikretz commented 2 months ago

Some metadata scanners may return messages even in the case of a null result. DeceptiveAuthorDetector is an example of this behavior:

# guarddog/analyzer/metadata/deceptive_author.py:69

if len(emails) == 0:
    # No e-mail is set for this package, hence no risk
    return False, "No e-mail found for this package"

However, Analyzer.analyze_metadata() drops all such messages:

# guarddog/analyzer/analyzer.py:121

rule_matches, message = self.metadata_detectors[rule].detect(info, path, name, version)
results[rule] = None
if rule_matches:
    issues += 1
    results[rule] = message

We should keep all scanner messages regardless of findings.

Alternately, we could explicitly adopt a convention that scanners should only return messages with non-null results. This would let us simplify the Analyzer.analyze_metadata() interface: a return value of None means no findings and a str means findings with the given message.

sobregosodd commented 2 months ago

Alternately, we could explicitly adopt a convention that scanners should only return messages with non-null results @ikretz this is the default behaviour. But instead of using the sole string as you suggest, it uses an extra boolean flag ruleMatches for that purpose, if false then message is discarded.

in: guarddog/analyzer/metadata/detector.py, it reads:

# returns (ruleMatches, message)

I don't see it as a bug TBH

ikretz commented 2 months ago

Agreed, as discussed, this is a case of a scanner returning a message that it knows will be dropped by the analyzer. It is not a bug. We might consider explicitly documenting this behavior of Analyzer somewhere.