Closed HugooB closed 1 year ago
Hi Hugo,
Thanks for reporting! It does sound abnormal, likely a confusion with the -
and the .
indeed. We'll look into fixing it
same applies to
Found 1 potentially malicious indicators in pdfminer.six version 20221105
typosquatting: This package closely ressembles the following package names, and might be a typosquatting attempt: pdfminer-six, pdfminer-six
Looked a bit into it: the problem arises outside this project. In the used PyPi top list, these false positives are mentioned with a -
instead of a .
(see ruamel.yaml
). You might fix that by making changes to either the original list or by including a separated list of known false positives..
Thanks! I think the easiest way is to normalize the package name passed as an input, and consider that scanning a.b.c
is actually going to scan a-b-c
. Sounds good to you?
Seems like the easiest way to fix it indeed! However, would that not increase the number of missed typosquats? E.g., with pdfminer.six
published on PyPi, a malicious actor can publish pdfminer-six
. With your proposed fix, this malicious package will not be detected, right?
Hi! I don't think that should be an issue since PEP 503 states:
This PEP references the concept of a “normalized” project name. As per PEP 426 the only valid characters in a name are the ASCII alphabet, ASCII numbers, ., -, and . The name should be lowercased with all runs of the characters ., -, or replaced with a single - character.
Additionally:
Repositories MAY redirect unnormalized URLs to the canonical normalized URL (e.g. /Foobar/ may redirect to /foobar/), however clients MUST NOT rely on this redirection and MUST request the normalized URL.
You'll notice PyPi does redirect all of these to the same site:
But more importantly, PIP does not rely on this redirection and gets the normalized URL thanks to pip._internal.utils.packaging
. These utilities can be reused by importing packaging
and normalizing the name before running the typosquat algorithm.
Great job @QuinceyJames , thanks for the detailed information!
Will be released shortly, thanks all!
Released as part of v0.1.10 https://github.com/DataDog/guarddog/releases/tag/v0.1.10
Hi there, first off all: awesome tools you guys made! :tada: Second, I encountered the following output when scanning a
requirements.txt
file:I do get why the second indicator is found, but the first one confuses me:
The package name (also installed on my machine) is
ruamel.yaml
. There is no package namedruamel-yaml
in either my requirements nor on PyPi. Did something went from with the dots in the package name? Or is it because this package is listed in your typosquatting list asruamel-yaml
?Thanks!