HTTPArchive / httparchive.org

The HTTP Archive website hosted on App Engine
https://httparchive.org
Apache License 2.0

Technology data smaller than previous months #919

Closed: tunetheweb closed this issue 2 months ago

tunetheweb commented 2 months ago

https://lookerstudio.google.com/reporting/1jh_ScPlCIbSYTf2r2Y6EftqmX9SQy4Gn/page/2JBdB

Investigating the data, it looks like WebMail, Email, Hosting, and CRM are among the categories most affected.

Looking at the first two, it looks like Wappalyzer uses MX DNS records to detect those, and SOA DNS records can be used for the others.

@pmeenan you said you changed some of the DNS lookups, so you might know what caused this?

tunetheweb commented 2 months ago

Looks like NS and TXT records are also used by Wappalyzer.

pmeenan commented 2 months ago

Looks like this broke when I changed the metrics collection to run in parallel: Wappalyzer relied on one of those metrics to find out what the document hostname was (which it needs for the DNS lookups). It should be a quick fix to get the hostname synchronously first if it isn't set yet.
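
For illustration only, here is a minimal sketch of the kind of ordering problem described above and the proposed fix. It is not the actual agent code: the function names (`collect_hostname_metric`, `run_detection`, `collect_all`), the result keys, and the `ensure_hostname` flag are all hypothetical, and the DNS lookups themselves are left out. It only shows how a detection step that reads the hostname from a metric collected in parallel can end up with nothing, and how resolving the hostname synchronously when it is missing avoids that.

```python
import asyncio
from urllib.parse import urlparse


async def collect_hostname_metric(page_url, results):
    """Hypothetical metric that records the document hostname."""
    # Simulate a metric that takes some time to complete.
    await asyncio.sleep(0.01)
    results["document_hostname"] = urlparse(page_url).hostname


async def run_detection(page_url, results, ensure_hostname):
    """Hypothetical detection step that needs the hostname for DNS lookups."""
    hostname = results.get("document_hostname")
    if hostname is None and ensure_hostname:
        # The fix: get the hostname synchronously if it isn't set yet.
        hostname = urlparse(page_url).hostname
        results["document_hostname"] = hostname
    # With no hostname there is nothing to query, so DNS-based
    # detections (MX, SOA, NS, TXT) would silently find nothing.
    results["dns_lookup_host"] = hostname


async def collect_all(page_url, ensure_hostname):
    """Run both steps in parallel, as in the parallelized collection."""
    results = {}
    await asyncio.gather(
        collect_hostname_metric(page_url, results),
        run_detection(page_url, results, ensure_hostname),
    )
    return results


if __name__ == "__main__":
    url = "https://example.com/page"
    broken = asyncio.run(collect_all(url, ensure_hostname=False))
    fixed = asyncio.run(collect_all(url, ensure_hostname=True))
    print(broken["dns_lookup_host"])  # None
    print(fixed["dns_lookup_host"])   # example.com
```

In this sketch the detection step runs before the hostname metric has finished, so without the fallback it has no host to look up, which would match the drop in DNS-dependent categories reported above.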