chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0

Fix typo #36

Closed RomaLash closed 3 years ago

RomaLash commented 3 years ago

Fix typo ("script" becomes "scripts", because that is the name used in the original technologies.json). This small typo prevented some software (for example, jQuery) from being parsed.

coveralls commented 3 years ago

Coverage Status

Coverage decreased (-2.0%) to 84.774% when pulling 579606479a4f1ee78a3b6a6176fa540524b04587 on RomaLash:master into 7716333a25f3aff2b7a24097ed90dfbb7bbfc8b8 on chorsley:master.

RomaLash commented 3 years ago

Also, I suggest ignoring implied techs with "\;confidence:50". Reason: if you try to analyze a site running IIS (for example "https://arqaamcapital.com/english.aspx"), you get an exception: "Caught exception KeyError: 'IIS\;confidence:50'", raised by the function get_versions, because _get_implied_technologies adds a spurious tech: 'IIS\;confidence:50'.
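To make the failure concrete, here is a minimal sketch of how an unstripped confidence suffix can end in a KeyError. The `technologies` dict and the `get_implied_raw` helper are hypothetical stand-ins written for illustration; they are not the library's actual data or code.

```python
# Hypothetical, trimmed-down stand-in for the parsed technologies.json data.
technologies = {
    "Microsoft ASP.NET": {"implies": ["IIS\\;confidence:50"]},
    "IIS": {"implies": []},
}


def get_implied_raw(detected):
    """Collect implied names verbatim, without stripping the confidence suffix."""
    implied = set()
    for tech in detected:
        implied.update(technologies[tech]["implies"])
    return implied


implied = get_implied_raw({"Microsoft ASP.NET"})

error = None
try:
    for name in implied:
        # Same kind of dictionary lookup that fails later on: the key
        # 'IIS\;confidence:50' does not exist, only 'IIS' does.
        technologies[name]
except KeyError as exc:
    error = exc

print(repr(error))
```

The lookup fails because the implied name still carries the `\;confidence:50` suffix, so it no longer matches the `IIS` key.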

tristanlatr commented 3 years ago

Thanks a lot, I did not see that in the tests.

tristanlatr commented 3 years ago

I'm not sure we want to totally ignore the implied applications if they carry a confidence number... Maybe we can add the technology only if the confidence is greater than or equal to 50?

The underlying issue is that _get_implied_technologies works together with analyze(), and both return sets of strings, so we cannot easily attach any extra information to the entries.

So we'll have to make a choice. You propose to completely ignore the implied apps if a confidence is present. What if we parse the confidence and add the technology only if it's greater than or equal to 50?

Here is how we can reuse the version-parsing regex to parse the confidence:

>>> re.search('\\\\?([^:]+):(.*)$', 'IIS\\;confidence:50').groups()
('IIS\\;confidence', '50')
>>> re.sub( '\\\\;.*', '', 'IIS\\;confidence:50')
'IIS'

Do you think you could change your code to something like this? (untested)

                    for implie in self.technologies[tech]['implies']:
                        if 'confidence' not in implie:
                            _implied_technologies.add(implie)
                        else:
                            try:
                                m = re.search('\\\\?([^:]+):(.*)$', implie)
                                if m and len(m.groups()) == 2:
                                    confidence = int(m.groups()[1])
                                    app_name = re.sub( '\\\\;.*', '', implie)
                                    if confidence >= 50:
                                        _implied_technologies.add(app_name)
                            except ValueError:
                                pass
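
The fragment above depends on the surrounding method, so here is a self-contained sketch of the same idea that can be run on its own. The function name `filter_implied` and the `threshold` parameter are assumptions made for illustration and are not part of the library; only the two regexes and the cutoff of 50 come from this thread.

```python
import re

CONFIDENCE_THRESHOLD = 50  # cutoff discussed in this thread


def filter_implied(implies, threshold=CONFIDENCE_THRESHOLD):
    """Return the implied names worth keeping (hypothetical helper).

    Entries without a confidence suffix are kept as-is. Entries such as
    'IIS\\;confidence:50' are kept, with the suffix stripped, only when the
    parsed confidence is at least `threshold`.
    """
    kept = set()
    for implie in implies:
        if 'confidence' not in implie:
            kept.add(implie)
            continue
        m = re.search(r'\\?([^:]+):(.*)$', implie)
        if m is None:
            continue  # no "name:value" part to parse
        try:
            confidence = int(m.group(2))
        except ValueError:
            continue  # confidence is not a number
        if confidence >= threshold:
            # Drop everything from the escaped semicolon onwards.
            kept.add(re.sub(r'\\;.*', '', implie))
    return kept


result = filter_implied(
    ['PHP', 'IIS\\;confidence:50', 'Apache\\;confidence:25', 'Foo\\;confidence:abc']
)
print(sorted(result))
```

Returning a new set keeps the helper independent of `self.technologies`, which makes it easy to test in isolation.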
RomaLash commented 3 years ago

Something like this? As you suggested, the tech is added if the confidence is greater than or equal to 50.

Maybe we could add a flag, like "ignore confidence", or maybe that is unnecessary. Not sure.

tristanlatr commented 3 years ago

Thanks, great! Indeed better than what I proposed. One last thing: can you also catch AttributeError at the same time as ValueError? A call to re.search().groups() can raise AttributeError: 'NoneType' object has no attribute 'groups' if the regex didn't match.
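
As a small illustration of why both exceptions matter, the sketch below chains `.groups()` directly onto `re.search()`. The helper name `parse_confidence` is hypothetical and only serves the example.

```python
import re


def parse_confidence(implie):
    """Return the confidence as an int, or None when it cannot be parsed."""
    try:
        # re.search() returns None when nothing matches, so calling
        # .groups() on the result raises AttributeError in that case.
        return int(re.search(r'\\?([^:]+):(.*)$', implie).groups()[1])
    except (ValueError, AttributeError):
        # ValueError: the value after ':' is not an integer.
        # AttributeError: the regex did not match at all.
        return None


print(parse_confidence('IIS\\;confidence:50'))    # numeric confidence
print(parse_confidence('IIS\\;confidence'))       # no ':' so no match
print(parse_confidence('IIS\\;confidence:high'))  # non-numeric value
```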

RomaLash commented 3 years ago

Done.

tristanlatr commented 3 years ago

Thanks!