chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0

Recognize scripts which are inline HTML and not "active" due to GDPR #48

Open bch80 opened 3 years ago

bch80 commented 3 years ago

Hello,

Some websites include scripts inline, like <script type="text/plain" ...>, without a src attribute.

In Wappalyzer.py I found:

self.scripts = [script['src'] for script in soup.findAll('script', src=True)]

so inline scripts are ignored. When I change it to:

self.scripts = soup.findAll('script', type='text/plain', src=True)

no JavaScript technologies are recognized at all.
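Two things probably explain the empty result. First, src=True still restricts the match to tags that carry a src attribute, which inline scripts by definition lack. Second, findAll returns BeautifulSoup Tag objects rather than the URL strings the original list comprehension produced. Below is a minimal sketch that collects external script URLs and inline script bodies separately. It uses the standard library's html.parser.HTMLParser in place of BeautifulSoup so that it runs without dependencies. The class name ScriptCollector and its script_srcs/inline_scripts attributes are hypothetical and not part of python-Wappalyzer.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect external script URLs and inline script bodies separately.

    Hypothetical stdlib stand-in for the BeautifulSoup call in Wappalyzer.py;
    none of these names exist in python-Wappalyzer itself.
    """

    def __init__(self):
        super().__init__()
        self.script_srcs = []     # values of src="..." (what Wappalyzer.py keeps today)
        self.inline_scripts = []  # text of <script> tags that have no usable src
        self._in_inline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.script_srcs.append(src)
        else:
            # Inline script of any type (text/javascript, text/plain, ...)
            self._in_inline = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_inline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_inline:
            self.inline_scripts.append("".join(self._buffer).strip())
            self._in_inline = False


html = """
<script src="https://cdn.example.com/jquery.min.js"></script>
<script type="text/plain" data-cli-block="true">
  var _paq = window._paq || [];
</script>
"""
collector = ScriptCollector()
collector.feed(html)
print(collector.script_srcs)     # ['https://cdn.example.com/jquery.min.js']
print(collector.inline_scripts)  # ['var _paq = window._paq || [];']
```

The inline list is kept separate from the src list on purpose. The existing matching code expects URLs, so mixing script bodies into self.scripts would likely produce misleading matches.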

How can I detect technologies that only appear in inline scripts?

Script examples (Mautic + Matomo):

<script type="text/plain" data-cli-class="cli-blocker-script"  data-cli-script-type="non-necessary" data-cli-block="true"  data-cli-element-position="body">
    (function(w,d,t,u,n,a,m){w['MauticTrackingObject']=n;
        w[n]=w[n]||function(){(w[n].q=w[n].q||[]).push(arguments)},a=d.createElement(t),
        m=d.getElementsByTagName(t)[0];a.async=1;a.src=u;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.domain.de/m/mtc.js','mt');

    mt('send', 'pageview');
</script>

<!-- Matomo -->
<script type="text/plain" data-cli-class="cli-blocker-script"  data-cli-script-type="non-necessary" data-cli-block="true"  data-cli-element-position="body">
  var _paq = window._paq || [];
  /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="//stats.domain.de/";
    _paq.push(['setTrackerUrl', u+'matomo.php']);
    _paq.push(['setSiteId', '1']);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
  })();
</script>
<!-- End Matomo Code -->
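Once the inline bodies are collected, one possible approach is to run content regexes over them. As a separate step, URLs embedded in JavaScript string literals can be pulled out so they could be checked against the existing script-URL patterns. The sketch below assumes hypothetical patterns written only for the Mautic and Matomo snippets above. They are not Wappalyzer's own technology definitions, and detect_inline is an illustrative name.

```python
import re

# Hypothetical content patterns, written for the two snippets above only.
# They are NOT taken from Wappalyzer's technology definitions.
INLINE_PATTERNS = {
    "Mautic": re.compile(r"MauticTrackingObject|/mtc\.js"),
    "Matomo": re.compile(r"_paq\.push|matomo\.(?:js|php)"),
}

# Rough matcher for absolute or protocol-relative URLs in JS string literals.
URL_IN_JS = re.compile(r"""["']((?:https?:)?//[^"'\s]+)["']""")


def detect_inline(inline_scripts):
    """Return (technologies, urls) found in a list of inline script bodies."""
    found = set()
    urls = []
    for body in inline_scripts:
        for name, pattern in INLINE_PATTERNS.items():
            if pattern.search(body):
                found.add(name)
        urls.extend(URL_IN_JS.findall(body))
    return found, urls


# Shortened fragments of the two snippets above
mautic = "})(window,document,'script','https://www.domain.de/m/mtc.js','mt');"
matomo = """var u="//stats.domain.de/"; _paq.push(['setTrackerUrl', u+'matomo.php']);"""

techs, urls = detect_inline([mautic, matomo])
print(sorted(techs))  # ['Matomo', 'Mautic']
print(urls)           # ['https://www.domain.de/m/mtc.js', '//stats.domain.de/']
```

The URL extraction is deliberately rough. For Matomo it only recovers the base //stats.domain.de/, because the script appends matomo.js at runtime. A content pattern such as the _paq.push check is therefore the more reliable signal for that case.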
tristanlatr commented 3 years ago

Hello, thanks for the report.