chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0
309 stars, 122 forks

Fixed Technologies Update #66

shaunpud closed this 1 year ago

shaunpud commented 2 years ago

Supporting new format and resolving #65

tristanlatr commented 2 years ago

Thanks a lot @shaunpud !

tristanlatr commented 2 years ago

It looks like there are fewer results with the technologies.json updated via the --update argument than without it. I think this is linked to #63. I'm also not sure about the performance: it looks slower with --update, no? Maybe the file is much larger and there are many more regular expressions to compile.
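One way to narrow down the "fewer results" question is to diff the technology names in the two files. This is a minimal sketch that assumes both files use the top-level {"categories": ..., "technologies": ...} layout; the helper names are hypothetical, and it only compares names, so a technology whose patterns changed will not show up here.

```python
import json


def load_technologies(path):
    """Return the name -> definition mapping from a technologies.json file.

    Assumes the top-level layout {"categories": {...}, "technologies": {...}}.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["technologies"]


def compare_technologies(old, new):
    """Compare two name -> definition mappings by technology name only."""
    old_names, new_names = set(old), set(new)
    return {
        "old_count": len(old_names),
        "new_count": len(new_names),
        "only_in_old": sorted(old_names - new_names),  # candidates for lost detections
        "only_in_new": sorted(new_names - old_names),
    }
```

For example, compare_technologies(load_technologies("Wappalyzer/data/technologies.json"), load_technologies("/home/user/.python-Wappalyzer/technologies.json")) lists which names disappeared after the update.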

shaunpud commented 2 years ago

The update is going to be much slower considering it needs to iterate over 27 individual files now.

$ jq . Wappalyzer/data/technologies.json | wc -l
18448
$ jq . ~/.python-Wappalyzer/technologies.json | wc -l
41031

I've never used this library before, but I can see that it always falls back to the old bundled technologies.json file unless you specify where the updated file is stored, e.g.: wappalyzer = Wappalyzer.latest(technologies_file='/home/user/.python-Wappalyzer/technologies.json').

I may look into this when I have some free time, but the workaround above works for now: with the updated file I can see new technology detections on https://example.com, for example, compared to the old technologies file.
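A minimal sketch of the technologies_file workaround, assuming the updated file lives at ~/.python-Wappalyzer/technologies.json as in the jq output earlier. The helper updated_technologies_file is hypothetical: it returns the updated path only when the file exists, so the caller can otherwise let python-Wappalyzer use its bundled file.

```python
from pathlib import Path
from typing import Optional


def updated_technologies_file(home: Optional[Path] = None) -> Optional[str]:
    """Return the updated technologies.json path if it exists, else None.

    Hypothetical helper; the ~/.python-Wappalyzer location is taken from the
    thread, not from any documented python-Wappalyzer API.
    """
    base = Path.home() if home is None else home
    candidate = base / ".python-Wappalyzer" / "technologies.json"
    return str(candidate) if candidate.is_file() else None


# Hypothetical usage with python-Wappalyzer:
# path = updated_technologies_file()
# wappalyzer = (Wappalyzer.latest(technologies_file=path) if path
#               else Wappalyzer.latest())
```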

tristanlatr commented 2 years ago

The update is going to be much slower considering it needs to iterate over 27 individual files now.

I'm not talking about the update itself but about the analysis process, which looks slower.
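To tell loading cost apart from per-page analysis cost, each phase can be timed on its own. This is a small sketch using the standard timeit module; best_time is a hypothetical helper, and taking the minimum over several runs is a common way to reduce noise, not a rigorous benchmark.

```python
import timeit
from typing import Any, Callable


def best_time(fn: Callable[..., Any], *args: Any, repeat: int = 5, **kwargs: Any) -> float:
    """Return the fastest wall-clock time, in seconds, of fn(*args, **kwargs)."""
    # number=1: each measurement is a single call; repeat: how many measurements.
    return min(timeit.repeat(lambda: fn(*args, **kwargs), number=1, repeat=repeat))


# Hypothetical usage, comparing load time with analysis time:
# load_s = best_time(Wappalyzer.latest, technologies_file=path)
# analyze_s = best_time(wappalyzer.analyze, webpage)
```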

tristanlatr commented 2 years ago

I'm working on fixing #63, then I'll look into merging this pull request. If JSON loading is what takes so much time, we could consider adopting https://github.com/ijl/orjson and falling back to the standard library when it fails.
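A sketch of the orjson idea with a standard-library fallback. It falls back both when orjson is not installed and when orjson rejects the input (orjson is stricter than json, e.g. about NaN); orjson.JSONDecodeError subclasses ValueError, which is what the except clause relies on. The function names are hypothetical.

```python
import json

try:
    import orjson  # optional third-party parser
except ImportError:  # not installed: use the standard library only
    orjson = None


def loads(data):
    """Parse JSON from bytes or str, preferring orjson when it is available."""
    if orjson is not None:
        try:
            return orjson.loads(data)
        except ValueError:  # orjson.JSONDecodeError is a ValueError subclass
            pass
    return json.loads(data)


def load_json_file(path):
    """Read a JSON file as raw bytes and parse it with loads()."""
    with open(path, "rb") as fh:
        return loads(fh.read())
```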

tfbecker commented 2 years ago

Hi tristanlatr, it would be great if you could fix this. I'm trying to merge the files manually myself, but even that is hard.
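A sketch of the manual merge, under two assumptions not confirmed in this thread: each of the split files maps technology names to definitions, and the categories live in a separate categories.json. The output reuses the old {"categories": ..., "technologies": ...} layout. merge_technologies and merge_directory are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterable


def merge_technologies(per_file: Iterable[dict], categories: dict) -> dict:
    """Combine per-file name -> definition mappings into one technologies.json dict."""
    merged = {}
    for chunk in per_file:
        for name, definition in chunk.items():
            if name in merged:  # the same name in two files would silently overwrite
                raise ValueError(f"duplicate technology: {name}")
            merged[name] = definition
    return {"categories": categories, "technologies": merged}


def merge_directory(tech_dir, categories_path, out_path) -> int:
    """Merge every *.json file in tech_dir and write the result to out_path."""
    files = sorted(Path(tech_dir).glob("*.json"))
    chunks = (json.loads(p.read_text(encoding="utf-8")) for p in files)
    categories = json.loads(Path(categories_path).read_text(encoding="utf-8"))
    merged = merge_technologies(chunks, categories)
    Path(out_path).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return len(merged["technologies"])  # number of technologies written
```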

brandonscholet commented 1 year ago

bump

brandonscholet commented 1 year ago

In my project I implemented pulling the newest technologies from the release. One could even pull the current file and, if it doesn't parse as JSON, download the latest release file instead.

https://github.com/brandonscholet/wappybird
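The "parse the current file, else use the release file" idea could look like the sketch below. To keep it testable offline, the download is a hypothetical fetch_release callable rather than a real URL; the check is only "is this valid JSON", not whether the content has the expected keys.

```python
import json
from typing import Callable


def load_or_fallback(text: str, fetch_release: Callable[[], str]) -> dict:
    """Parse text as JSON; if that fails, parse whatever fetch_release() returns.

    fetch_release is a hypothetical stand-in for downloading the latest
    release file.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(fetch_release())
```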

tristanlatr commented 1 year ago

I'll look at fixing this issue and publish a new version in the coming weeks.