chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0
309 stars 122 forks

dom key in technologies.json #63

Closed ruNickStone closed 2 years ago

ruNickStone commented 3 years ago

Hello,

technologies.json now has a new key, "dom". When will you support it?

tristanlatr commented 3 years ago

As soon as someone makes a pull request adding support for that new feature.

Please feel free to do so :-)

tristanlatr commented 3 years ago

Does it currently fail to parse the technologies.json file?

ruNickStone commented 3 years ago

> Does it currently fail to parse the technologies.json file?

https://github.com/AliasIO/wappalyzer/issues/4419

tristanlatr commented 2 years ago

I don't think python-Wappalyzer currently fails to parse the latest technologies.json file. I don't understand why you are referring to this issue.

I think python-Wappalyzer is missing some detections because it simply ignores the dom key.

If I understand correctly, the dom key is a CSS selector, a list of selectors, or a dict that maps each selector to a dict whose keys are exists, text, properties or attributes and whose values are the strings to match.

It's defined by the following json schema:

"dom": {
        "oneOf": [
          {
            "type": "array",
            "items": {
              "$ref": "#/definitions/non-empty-non-blank-string"
            }
          },
          {
            "$ref": "#/definitions/non-empty-non-blank-string"
          },
          {
            "type": "object",
            "additionalProperties": false,
            "patternProperties": {
              "^.+$": {
              }
            }
          }
        ]
      },
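
As a rough illustration of the three shapes the schema allows (a single selector, a list of selectors, or a dict keyed by selector), here is a minimal Python sketch that normalizes any of them into one dict form. The name normalize_dom is hypothetical and not part of python-Wappalyzer, and writing "only check that the element exists" as {"exists": ""} is an assumption of this sketch rather than something the schema above specifies.

```python
from typing import Any, Dict, List, Union

# The three shapes described above: selector, list of selectors, or dict.
DomSpec = Union[str, List[str], Dict[str, Dict[str, Any]]]


def normalize_dom(dom: DomSpec) -> Dict[str, Dict[str, Any]]:
    """Return a mapping of CSS selector -> checks, whatever shape `dom` had.

    A bare selector (or a list of selectors) only asserts that a matching
    element exists; this sketch writes that as {"exists": ""} (an assumption).
    """
    if isinstance(dom, str):
        return {dom: {"exists": ""}}
    if isinstance(dom, list):
        return {selector: {"exists": ""} for selector in dom}
    if isinstance(dom, dict):
        # Copy the inner dicts so callers cannot mutate the parsed JSON.
        return {selector: dict(checks) for selector, checks in dom.items()}
    raise TypeError(f"unsupported dom value: {type(dom).__name__}")
```

With a single normalized form, the matching code would only need to handle the dict case; running the selectors against a page would still need an HTML parser, which is left out here.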

This looks like an important thing to add to python-Wappalyzer. Any contributions would be very welcome. The JS code can be found here: https://github.com/AliasIO/wappalyzer/blob/master/src/drivers/npm/driver.js#L212 .

tristanlatr commented 2 years ago

python-Wappalyzer currently parses the technology fields 'headers', 'meta', 'url', 'html', 'scripts' and 'implies'. I see that a lot of the new technologies are added or updated using this 'dom' key. Does anyone want to help add this feature?
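
To get a feel for how many entries are affected, something like the sketch below could list the technologies that define a 'dom' key. It assumes the technologies have already been loaded into a dict mapping each technology name to its entry; technologies_using is a hypothetical helper name, and the sample entries are made up for illustration.

```python
from typing import Any, Dict, List


def technologies_using(
    technologies: Dict[str, Dict[str, Any]], key: str = "dom"
) -> List[str]:
    """Return the sorted names of technologies whose entry defines `key`."""
    return sorted(name for name, entry in technologies.items() if key in entry)


# Made-up sample entries, for illustration only (not real technologies.json data).
sample = {
    "ExampleCMS": {"html": "<div id=\"example-cms\"", "implies": "PHP"},
    "ExampleWidget": {"dom": "div.example-widget"},
    "ExampleShop": {
        "dom": ["#example-shop", "form.example-cart"],
        "scripts": "example-shop\\.js",
    },
}

print(technologies_using(sample))  # ['ExampleShop', 'ExampleWidget']
```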