enthec / webappanalyzer

This project aims to maintain Wappalyzer technologies
GNU General Public License v3.0

Actual pattern validation? #9

Open kingthorin opened 1 year ago

kingthorin commented 1 year ago

Is your feature request related to a problem? Please describe.

As part of the CI workflow for PRs (etc.), would it be possible to validate the regex patterns or DOM selectors?

In the past we've found that upstream of AliasIO we encountered invalid regex patterns added to the technology files, or invalid selectors.

The two "normal" cases seemed to be:

There are plenty of other things that can make a regex or DOM selector invalid; it would be good to catch and fix these early.

Describe the solution you'd like

I believe this could be added to the existing Python-based validation. In Java a pattern can be compiled (Pattern.compile(String)), at which point an exception is thrown if it is invalid. We also came up with something similar for DOM selectors. I assume something similar can be done with Python.
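For illustration, a minimal Python sketch of the compile-and-catch idea described above. The helper name `is_valid_regex` is hypothetical and not part of the project. Note the assumption that Python's `re` syntax is a close enough stand-in for the regex dialect the technology files target; the two dialects differ in places, so this can only approximate the check.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's `re` module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # re.compile raises re.error for malformed patterns.
        return False
    return True


print(is_valid_regex(r"jquery[.-]([\d.]+)"))  # True
print(is_valid_regex(r"jquery-([\d.]+"))      # False (unclosed group)
```

This mirrors the Java approach: compilation is the validation, and the exception is the failure signal.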

Describe alternatives you've considered

Additional context

Not sure what else to say here. Mainly I was thinking that catching potential errors as close to introduction as possible would be the easiest way to address/prevent them.

enthec-opensource commented 1 year ago

I believe this could be achieved by recursively iterating over JSON objects and lists until you reach a string, then doing a regex compile as you say. I'm going to go with that, because that's as much validation as you can do for those fields.
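A rough sketch of that recursive walk, assuming every string reached is a pattern and that any tags after `\;` should be stripped before compiling. The function names and the sample field names (`headers`, `scriptSrc`) are illustrative assumptions, not the project's actual validator.

```python
import re
from typing import Any, Iterator


def iter_strings(value: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, string) for every string nested in dicts and lists."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(item, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from iter_strings(item, f"{path}[{index}]")


def find_invalid_patterns(value: Any) -> list[tuple[str, str]]:
    """Compile each string as a regex and collect (path, error message)."""
    errors = []
    for path, text in iter_strings(value):
        # Assumption: everything after the first "\;" is a tag, not regex.
        pattern = text.split("\\;", 1)[0]
        try:
            re.compile(pattern)
        except re.error as exc:
            errors.append((path, str(exc)))
    return errors


sample = {
    "headers": {"Server": "nginx(?:/([\\d.]+))?\\;version:\\1"},
    "scriptSrc": ["jquery-([\\d.]+", "bootstrap\\.js"],
}
for path, message in find_invalid_patterns(sample):
    print(f"{path}: {message}")
```

Reporting the path alongside the error makes a CI failure point directly at the offending entry.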

There is actually more validation we can do, as specified in the schema: cpe has a very clear pattern; implies, requires and excludes could be matched as well by a simple lookup in the specified JSON; and pricing has limited string options.
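The three checks mentioned above could be sketched roughly as follows. Everything here is an assumption to be verified against the repository's schema: the simplified CPE 2.3 regex, the set of pricing values, the case-sensitive name lookup, and the helper names are illustrative only.

```python
import re

# Assumption: simplified CPE 2.3 shape, "cpe:2.3:<part>" plus 10 components.
CPE_RE = re.compile(r"^cpe:2\.3:[aho](?::[^:]+){10}$")
# Assumption: allowed pricing values; check the schema for the real list.
PRICING = {"low", "mid", "high", "freemium", "onetime", "recurring", "poa", "payg"}
REFERENCE_FIELDS = ("implies", "requires", "excludes")


def as_list(value):
    """Normalize a missing, single, or list value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_technology(name, tech, known_names):
    """Return a list of human-readable problems for one technology entry."""
    problems = []
    cpe = tech.get("cpe")
    if cpe is not None and not CPE_RE.match(cpe):
        problems.append(f"{name}: malformed cpe {cpe!r}")
    for option in as_list(tech.get("pricing")):
        if option not in PRICING:
            problems.append(f"{name}: unknown pricing {option!r}")
    for field in REFERENCE_FIELDS:
        for entry in as_list(tech.get(field)):
            # Strip tags such as "\;confidence:50" before the lookup.
            target = entry.split("\\;", 1)[0]
            if target not in known_names:
                problems.append(f"{name}: {field} refers to unknown {target!r}")
    return problems


technologies = {
    "PHP": {"cpe": "cpe:2.3:a:php:php:*:*:*:*:*:*:*:*"},
    "Example CMS": {
        "cpe": "cpe:example",
        "pricing": ["low", "cheap"],
        "implies": ["PHP\\;confidence:50", "MySQL"],
    },
}
for tech_name, tech in technologies.items():
    for problem in check_technology(tech_name, tech, set(technologies)):
        print(problem)
```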

kingthorin commented 1 year ago

Great!! 👍 Thanks for tackling this.

enthec-opensource commented 4 months ago

validating version & confidence tags

image

Is this reliable? That fixed version doesn't seem to exist, but it looks like it would be useful to have (I understand it's a fixed version, like "if this matches, force this version").

image

kingthorin commented 4 months ago

I believe the first undocumented examples are meant to be just that: on any match of the pattern, assume that version string. IIRC I asked about that on the original project once upon a time.
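One possible way to validate version and confidence tags is sketched below, under several assumptions: tags are appended to a pattern with `\;`, confidence is an integer from 0 to 100, a version value may reference capture groups as `\1`, `\2`, and so on, and a version value with no back-reference is accepted as a fixed version, per the reading above. The function names are hypothetical, and the pattern itself is assumed to have already passed regex validation.

```python
import re


def parse_pattern(value):
    """Split "pattern\\;key:value\\;..." into (pattern, {key: value})."""
    pattern, *tags = value.split("\\;")
    parsed = {}
    for tag in tags:
        key, _, tag_value = tag.partition(":")
        parsed[key] = tag_value
    return pattern, parsed


def check_tags(value):
    """Return a list of problems found in the tags of one pattern string."""
    pattern, tags = parse_pattern(value)
    problems = []
    # Number of capture groups available for back-references.
    group_count = re.compile(pattern).groups
    for key, tag_value in tags.items():
        if key == "confidence":
            if not re.fullmatch(r"[0-9]{1,3}", tag_value) or int(tag_value) > 100:
                problems.append(f"confidence out of range: {tag_value!r}")
        elif key == "version":
            # A value without back-references is treated as a fixed version.
            for ref in re.findall(r"\\(\d)", tag_value):
                if not 1 <= int(ref) <= group_count:
                    problems.append(f"version refers to missing group \\{ref}")
        else:
            problems.append(f"unknown tag {key!r}")
    return problems


print(check_tags("nginx/([\\d.]+)\\;version:\\1"))
print(check_tags("nginx\\;version:\\1"))
print(check_tags("legacy-widget\\;version:2.0"))
```

The second call reports a problem because the pattern has no capture group for `\1` to refer to; the third is accepted as a fixed version.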