crocs-muni / sec-certs

Tool for analysis of security certificates and their security targets (Common Criteria, NIST FIPS140-2...).
https://sec-certs.org
MIT License

Improve keywords search properties #255

Closed J08nY closed 1 year ago

J08nY commented 2 years ago

I am worried that keyword extraction currently overmatches in some cases. For example, we might match the DH keyword as well as the ECDH keyword for a document that contains the string "This thing implements ECDH as well as some other stuff."

This is due to the way we do regex matching. It is done depth-first over all of the rules, where each rule has the separator regex REGEXEC_SEP = r"[ ,;\]”)(]" appended to it (but not prepended). Because the separator constrains only the character that follows the keyword, DH will not match in "This thing has DHE in it.", but it will match in "This thing has ECDH in it.", which is an issue.
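A minimal reproduction of the behaviour described above. This is only a sketch: the helper `matches` is hypothetical and stands in for the real extraction code, and it assumes each rule is simply concatenated with the separator before searching.

```python
import re

# Separator as quoted in this issue; it is appended to every rule.
REGEXEC_SEP = r"[ ,;\]”)(]"


def matches(rule: str, text: str) -> bool:
    """Hypothetical helper: search for `rule` followed by one separator character."""
    return re.search(rule + REGEXEC_SEP, text) is not None


# "DH" is followed by "E", which is not a separator, so there is no match.
print(matches("DH", "This thing has DHE in it."))   # False

# "DH" at the end of "ECDH" is followed by a space, so it matches.
print(matches("DH", "This thing has ECDH in it."))  # True
```

Nothing in the pattern looks at the character before the keyword, so any keyword that is a suffix of another token gets matched.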

We need to think a bit more about the keyword extraction strategy, about what we are trying to achieve with REGEXEC_SEP, and about whether a better solution exists.
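One possible direction, shown purely as a sketch and not as a decided fix: wrap each rule in a negative lookbehind and a negative lookahead so that the keyword cannot be glued to an alphanumeric character on either side. The names `compile_rule` and `extract_keywords` are hypothetical, and the choice of `[A-Za-z0-9]` as the "word character" set is an assumption that would need checking against the real rules.

```python
import re
from typing import Iterable, Set


def compile_rule(rule: str) -> "re.Pattern[str]":
    """Hypothetical helper: forbid an alphanumeric character directly before or after the rule."""
    return re.compile(r"(?<![A-Za-z0-9])(?:" + rule + r")(?![A-Za-z0-9])")


def extract_keywords(rules: Iterable[str], text: str) -> Set[str]:
    """Return the rules that match `text` as standalone tokens."""
    return {rule for rule in rules if compile_rule(rule).search(text)}


text = "This thing implements ECDH as well as some other stuff."
print(extract_keywords(["DH", "ECDH"], text))  # {'ECDH'}
```

Unlike the appended separator, this variant also matches a keyword at the very end of the text or directly before a period, so it changes behaviour in more than the ECDH case and would need to be evaluated against existing results.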