rviscomi closed this issue 1 year ago.
No, the DNS code path isn't wired up in the agent (and wiring it up would probably mean a fairly big change to the order in which the agent does things).
WPT does do a DNS pass for the base page, though, and logs the authoritative DNS server for the origin:
base_page_dns_server: "any1.hostinger.com",
It also does a reverse-IP lookup on the origin IP and records any CNAME the origin uses:
base_page_ip_ptr: "",
base_page_cname: "",
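For reference, a reverse-IP lookup queries the PTR record under the `in-addr.arpa` (IPv4) or `ip6.arpa` (IPv6) zone, so an empty `base_page_ip_ptr` presumably means no PTR name came back for that IP. Below is a minimal Python sketch of the idea, not WPT's actual code. The function names are hypothetical, and `192.0.2.10` is a reserved documentation address.

```python
import ipaddress
import socket


def ptr_query_name(ip: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> str:
    """Resolve the PTR hostname for `ip`, or return "" if none is found.

    socket.herror and socket.gaierror are both subclasses of OSError,
    so catching OSError covers "no PTR record" as well as resolver errors.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return ""


if __name__ == "__main__":
    print(ptr_query_name("192.0.2.10"))  # the name a PTR query would ask for
```

Note that `reverse_lookup` performs a real network query and can block until the resolver times out.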
If you're looking for hosting information, that's probably the more reliable way to do it.
I can open a WPT agent issue to do the DNS work before running Wappalyzer, but I'm not sure it will fly: the two currently run in parallel, and serializing them would slow down tests.
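One hypothetical way to limit that slowdown is to start the DNS queries as soon as the test begins, so they overlap with the page load, and have only the detection step wait for them. The asyncio sketch below illustrates that ordering. Every function in it is a made-up stand-in (with simulated latency), not WPT agent code.

```python
import asyncio


async def resolve_origin_dns(origin: str) -> dict:
    """Hypothetical stand-in for the agent's DNS queries (SOA, NS, MX, TXT, ...)."""
    await asyncio.sleep(0.05)  # simulated network latency
    return {"origin": origin, "ns": []}


async def load_page(url: str) -> dict:
    """Hypothetical stand-in for navigating to the page and recording the HAR."""
    await asyncio.sleep(0.1)  # simulated page load
    return {"url": url}


def run_detection(page: dict, dns: dict) -> dict:
    """Hypothetical stand-in for Wappalyzer, which needs both inputs."""
    return {"url": page["url"], "dns_origin": dns["origin"]}


async def run_test(url: str, origin: str) -> dict:
    # Kick off DNS immediately so it runs concurrently with the page load.
    dns_task = asyncio.create_task(resolve_origin_dns(origin))
    page = await load_page(url)
    # Only detection waits on DNS: wall-clock time before detection is
    # roughly max(dns_time, load_time) instead of dns_time + load_time.
    dns = await dns_task
    return run_detection(page, dns)


if __name__ == "__main__":
    print(asyncio.run(run_test("https://www.example.com/", "example.com")))
```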
Thanks for looking into it. I suppose it can't hurt to open the issue on the WPT side to at least track the limitation and explore alternatives.
It's gross, but we could probably backstop some of these missing detections on the HA side using that host metadata. For example, the Dataflow pipeline could test each page against Wappalyzer's DNS rules and emulate the Wappalyzer detections in the HAR.
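Here is a rough sketch of that backstop. It matches a page's DNS records against per-technology regex rules, in the spirit of Wappalyzer's `dns` patterns. The rule table is illustrative only and is not copied from Wappalyzer's technology files. The record layout (lower-case record type mapped to a list of record strings), the function name, and the case-insensitive matching are all assumptions.

```python
import re

# Hypothetical rules: technology -> {DNS record type -> regex patterns}.
# Illustrative only; the real patterns live in Wappalyzer's technology files.
DNS_RULES = {
    "Hostinger": {"SOA": [r"\.hostinger\.com"], "NS": [r"\.hostinger\.com"]},
    "Google Workspace": {"MX": [r"aspmx\.l\.google\.com"]},
    "Amazon SES": {"TXT": [r"include:amazonses\.com"]},
}


def detect_from_dns(records: dict[str, list[str]], rules: dict = DNS_RULES) -> list[str]:
    """Return technologies with at least one pattern matching a record of its type.

    `records` maps lower-case record types ("soa", "mx", "txt", ...) to lists
    of record strings.
    """
    detected = []
    for tech, by_type in rules.items():
        for record_type, patterns in by_type.items():
            values = records.get(record_type.lower(), [])
            # Wappalyzer-style patterns can append "\;"-separated tags
            # (version, confidence); this sketch keeps only the regex part.
            regexes = [re.compile(p.split("\\;")[0], re.IGNORECASE) for p in patterns]
            if any(r.search(v) for r in regexes for v in values):
                detected.append(tech)
                break  # one matching record type is enough for this technology
    return detected
```

In a pipeline, this would run once per page over whatever host metadata is available, and its output would be merged into the page's detections.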
I have most of the DNS logic implemented and hooked up, but I'm still not getting detections from Wappalyzer. I filed an issue to hopefully figure out whether I'm holding it wrong.
This should be fixed with the next crawl; the change was just merged. Here is a sample test. It uses Hostinger, Google mail, and Amazon SES, which are detected through DNS SOA, MX, and TXT records.
```json
"_detected": {
  "Ecommerce": "Cart Functionality",
  "Programming languages": "PHP,Java",
  "UI frameworks": "Bootstrap 5",
  "PaaS": "Amazon Web Services",
  "JavaScript frameworks": "Vue.js 6995",
  "Analytics": "Pinterest Conversion Tag,Microsoft Clarity 0.6.36,Google Analytics,Google Ads Conversion Tracking,Facebook Pixel 2.9.66,Cloudflare Browser Insights",
  "RUM": "New Relic,Cloudflare Browser Insights",
  "JavaScript libraries": "core-js 3.6.5",
  "Reviews": "Trustpilot",
  "Advertising": "Microsoft Advertising",
  "Hosting": "Hostinger",
  "Webmail": "Google Workspace",
  "Email": "Google Workspace,Amazon SES",
  "Tag managers": "Google Tag Manager",
  "A\/B Testing": "Google Optimize",
  "CDN": "Google Hosted Libraries,Cloudflare",
  "Font scripts": "Google Font API"
},
"_detected_apps": {
  "Cart Functionality": "",
  "PHP": "",
  "Java": "",
  "Bootstrap": "5",
  "Amazon Web Services": "",
  "Vue.js": "6995",
  "Pinterest Conversion Tag": "",
  "New Relic": "",
  "core-js": "3.6.5",
  "Trustpilot": "",
  "Microsoft Clarity": "0.6.36",
  "Microsoft Advertising": "",
  "Hostinger": "",
  "Google Workspace": "",
  "Google Tag Manager": "",
  "Google Optimize": "",
  "Google Hosted Libraries": "",
  "Google Font API": "",
  "Google Analytics": "",
  "Google Ads Conversion Tracking": "",
  "Facebook Pixel": "2.9.66",
  "Cloudflare Browser Insights": "",
  "Cloudflare": "",
  "Amazon SES": ""
},
```
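In this sample, `_detected` maps each category to a comma-separated list of labels (the app name plus its version, when known), while `_detected_apps` maps each app to its version string. The small helper below inverts the two fields into "app → categories". It is a hypothetical sketch that assumes this label format holds and that app names never contain commas.

```python
def categories_by_app(detected: dict[str, str], detected_apps: dict[str, str]) -> dict[str, list[str]]:
    """Map each detected app to the categories it appears under in `_detected`."""
    # A `_detected` label is "<app> <version>" when a version is known, else "<app>".
    label_to_app = {
        (f"{app} {version}" if version else app): app
        for app, version in detected_apps.items()
    }
    result: dict[str, list[str]] = {}
    for category, joined in detected.items():
        for label in joined.split(","):  # assumes no commas inside app names
            label = label.strip()
            app = label_to_app.get(label, label)  # fall back to the raw label
            result.setdefault(app, []).append(category)
    return result
```

For example, in the sample above, Cloudflare Browser Insights would map to both Analytics and RUM, and Google Workspace to both Webmail and Email.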
I also added the raw DNS for the origin to the har:
```json
"_origin_dns": {
  "cname": [
    "www.hostinger.com.cdn.cloudflare.net."
  ],
  "ns": [
    "any2.hostinger.com.",
    "any1.hostinger.com."
  ],
  "mx": [
    "1 aspmx.l.google.com.",
    "10 aspmx3.googlemail.com.",
    "10 aspmx2.googlemail.com.",
    "5 alt2.aspmx.l.google.com.",
    "5 alt1.aspmx.l.google.com."
  ],
  "txt": [
    "\"v=spf1 ip4:31.220.23.4 include:_spf.google.com include:amazonses.com include:_spf.hostedemail.com include:_spf.psm.knowbe4.com -all\"",
    "\"apple-domain-verification=IyFbOUpTx9DUOFwL\"",
    "\"mailru-verification: a8a9886e0072b036\"",
    "\"google-site-verification=4EfGmYRIEIPWA_ACJsA5zFGUzzY1pa8Du2tiHb8EKuI\"",
    "\"google-site-verification=MOjKs17dYrFXyEPndU4bK505my3D0dyC63-c5mvaNGU\"",
    "\"nordpass-domain-verification=6b627232b00e4e9ea70693c7994f2d50\""
  ],
  "soa": [
    "any1.hostinger.com. dns.hostinger.com. 2021102522 10800 3600 604800 3600"
  ]
},
```
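The raw records make it easy to see where the email detections come from. The MX hosts point at Google, and the SPF TXT record includes `amazonses.com`, which is presumably what the MX and TXT rules key on. Below is a minimal Python sketch that picks these fields apart. It assumes the `_origin_dns` format above, where each MX value is a "priority host." string and TXT values keep their surrounding quotes. It does not handle TXT records split into several quoted strings, and the function names are hypothetical.

```python
def mx_hosts(mx_records: list[str]) -> list[tuple[int, str]]:
    """Parse "priority host." MX strings, sorted by preference (lowest first)."""
    parsed = []
    for record in mx_records:
        priority, host = record.split(maxsplit=1)
        parsed.append((int(priority), host.rstrip(".")))  # drop the trailing root dot
    return sorted(parsed)


def spf_includes(txt_records: list[str]) -> list[str]:
    """Return the include: domains of the first SPF (v=spf1) TXT record, if any."""
    for record in txt_records:
        value = record.strip('"')  # TXT values are stored with their quotes
        if value.startswith("v=spf1"):
            return [term[len("include:"):] for term in value.split() if term.startswith("include:")]
    return []
```

Applied to the record above, `spf_includes` would return `_spf.google.com`, `amazonses.com`, `_spf.hostedemail.com`, and `_spf.psm.knowbe4.com`, and `mx_hosts` would put `aspmx.l.google.com` (priority 1) first.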
I'll leave this open until we can verify after the crawl, but DNS-based detections should be working now.
Great, thanks!
I work at Hostinger. A quick comment: you can tell whether a site is hosted on Hostinger from its HTTP response headers:
platform: hostinger
or
server: hcdn
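As an illustration, here is a hypothetical Python check for those two header signals in a HAR entry. In a HAR, `response.headers` is a list of `{"name": ..., "value": ...}` objects. This is not how Wappalyzer encodes its header rules. Treating names and values case-insensitively and requiring exact value matches are assumptions.

```python
def hosted_on_hostinger(headers: list[dict[str, str]]) -> bool:
    """Return True if HAR-style response headers carry either Hostinger signal."""
    for header in headers:
        name = header["name"].lower()
        value = header["value"].strip().lower()
        if (name == "platform" and value == "hostinger") or (name == "server" and value == "hcdn"):
            return True
    return False
```

Usage: `hosted_on_hostinger(entry["response"]["headers"])` for the base page's HAR entry.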
Thanks @ton31337. Once those changes land in Wappalyzer, we should be able to pick them up automatically in our reporting.
To close out this thread: it looks like @pmeenan's change worked, and we started seeing Hostinger data in the CWV Tech Report in August.
@rviscomi, is it possible to extract the actual website addresses (URLs) for a specific technology? We'd like to identify the slowest websites and do some performance analysis and improvements.
Yeah, it's possible using BigQuery. For example, here are the top 10 Hostinger sites with the slowest p75 TTFB:
```sql
DECLARE _YYYYMMDD DATE DEFAULT '2023-02-01';

WITH pages AS (
  SELECT DISTINCT
    root_page
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t
  WHERE
    date = _YYYYMMDD AND
    t.technology = 'Hostinger'
),

crux AS (
  SELECT
    CONCAT(origin, '/') AS root_page,
    p75_ttfb
  FROM
    `chrome-ux-report.materialized.metrics_summary`
  WHERE
    date = _YYYYMMDD
)

SELECT
  root_page,
  p75_ttfb
FROM
  pages
JOIN
  crux
USING
  (root_page)
ORDER BY
  p75_ttfb DESC
LIMIT
  10
```
(12 GB processed)
Results:
| root_page | p75_ttfb (ms) |
|---|---:|
| https://medpress.az/ | 42300 |
| https://aljens.info/ | 29100 |
| https://boletimdopaddock.com.br/ | 22400 |
| https://pvst.com.br/ | 20900 |
| https://www.travelnthrill.com/ | 20300 |
| https://mejora2.online/ | 19200 |
| https://www.delsolinmobiliaria.com.ar/ | 18600 |
| https://nimt.in/ | 17900 |
| https://duplaimagemgastro.com.br/ | 17600 |
| https://surveyandoffers.com/ | 17200 |
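The only subtle step in the query is `CONCAT(origin, '/')`. CrUX origins have no trailing slash, while HTTP Archive root pages do, so the origin must be normalized before the join. The inner join then drops any Hostinger page that has no CrUX data. The same logic can be written as a small in-memory Python sketch. The function name is hypothetical, and the `.example` data is made up purely for illustration.

```python
def slowest_pages(root_pages: set[str], crux_rows: list[tuple[str, int]], limit: int = 10) -> list[tuple[str, int]]:
    """Join root pages to CrUX p75 TTFB by origin and return the slowest first."""
    # "https://example.com" (CrUX origin) -> "https://example.com/" (root_page)
    ttfb_by_root = {f"{origin}/": p75 for origin, p75 in crux_rows}
    # Inner join: pages without CrUX data are dropped, as with SQL JOIN ... USING.
    joined = [(page, ttfb_by_root[page]) for page in root_pages if page in ttfb_by_root]
    joined.sort(key=lambda row: row[1], reverse=True)  # ORDER BY p75_ttfb DESC
    return joined[:limit]  # LIMIT
```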
I'd be really curious to hear if this leads you to identifying any actionable issues. Keep me posted!
Thanks @rviscomi!
Hostinger exists as a supported technology in Wappalyzer, but we're not detecting any pages that use it.
Looking at the Wappalyzer source code, this technology seems to be using an unusual detection method (DNS):
https://github.com/wappalyzer/wappalyzer/blob/c4704aa76175a98d4212bcaa126ceb1473e51e8e/src/technologies/h.json#L888-L902
@pmeenan is that something that is supported by the WPT agent's driver code?
cc @ThierryA