Closed AlexMoening closed 1 month ago
Before I plumb the additional lookups, would this be most useful for the main page (where we have things like base_page_ip_ptr) to see if the site itself is available over V6, or do you need it for every request?
We currently query for cname, ns, mx, txt, soa, https, and svcb records for the document domain as part of running Wappalyzer, and adding A and AAAA to those would be trivial (and I could add the raw DNS records for those to the all.pages records).
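For illustration only, here is a minimal Python sketch of what an added A/AAAA probe for the document domain might look like. It is not the agent's actual code: the name probe_a_aaaa and the injectable resolve parameter are hypothetical, and it uses the standard library's socket.getaddrinfo, which goes through the system resolver (so hosts-file entries and the like can influence the result) rather than returning raw DNS records.

```python
import socket


def probe_a_aaaa(host, resolve=socket.getaddrinfo):
    """Return {'a': [...], 'aaaa': [...]} for host.

    Hypothetical helper: `resolve` defaults to socket.getaddrinfo and can be
    swapped for a stub in tests. A failed lookup leaves that list empty.
    """
    records = {"a": [], "aaaa": []}
    for key, family in (("a", socket.AF_INET), ("aaaa", socket.AF_INET6)):
        try:
            infos = resolve(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no address of this family, or resolution failed
        for info in infos:
            addr = info[4][0]  # sockaddr tuple: the address is the first item
            if addr not in records[key]:
                records[key].append(addr)
    return records
```

Calling probe_a_aaaa("ip6only.me") on a machine with working DNS would populate whichever lists the resolver can fill; a non-empty aaaa list is the signal that the host could be served over V6.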
Adding it to every request is also possible but will probably extend the crawl time a bit since we're not currently doing lookups for all of the resource origins on the page. Doable but I don't want to add it if it's not helpful.
OK, I think I got a dual-stack VPC set up for the agents to test from and it looks like IPv6 (and 4) are both working. I'll switch over to the new networks (and updated agents) for the October crawl.
I replaced the one-off agent that we use at https://webpagetest.httparchive.org/ so you can test and see what the results look like.
Here is a sample test I just ran for ip6only.me (which shows the correct routable v6 address for the agent).
Independent of actual connectivity, at the page level (all.pages) the payload will include AAAA record results:
"_origin_dns": {
  "aaaa": [
    "2001:4810:0:3::78"
  ],
  "ns": [
    "ns2.hotnic.net.",
    "ns.hotnic.net."
  ],
  "soa": [
    "ns.hotnic.net. hostmaster.hotnic.net. 2023112801 10800 3600 604800 10800"
  ],
  "a": [],
  "cname": [],
  "mx": [],
  "txt": [],
  "https": [],
  "svcb": []
},
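As a rough sketch of how the page-level records could be used, the following Python function classifies an _origin_dns object by which address records are present. The function name classify_origin and the label strings are hypothetical; only the "a" and "aaaa" keys come from the payload above.

```python
def classify_origin(origin_dns):
    """Classify a page by the address records in its _origin_dns object.

    Labels are illustrative, not part of the HTTP Archive schema.
    """
    has_v4 = bool(origin_dns.get("a"))
    has_v6 = bool(origin_dns.get("aaaa"))
    if has_v4 and has_v6:
        return "dual-stack"
    if has_v6:
        return "ipv6-only"
    if has_v4:
        return "ipv4-only"
    return "no-address-records"


# Trimmed copy of the sample payload for ip6only.me shown above.
sample = {"aaaa": ["2001:4810:0:3::78"], "a": [], "cname": []}
label = classify_origin(sample)
```

For the ip6only.me sample, which has one AAAA record and an empty A list, this yields "ipv6-only". Note that this reflects DNS records only, not whether a connection over V6 actually succeeded.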
At the request level (all.requests), it will have the actual DNS lookup information from Chrome and, if IPv6 was used, v6 addresses for the connection information (it looks like it is all parsed OK, which was a concern).
"_ip_addr": "[2001:4810:0:3::78]",
"_dns_info": {
  "secure": false,
  "transactions_needed": [
    {
      "dns_query_type": "AAAA"
    },
    {
      "dns_query_type": "A"
    },
    {
      "dns_query_type": "HTTPS"
    }
  ],
  "results": {
    "aliases": [
      "ip6only.me"
    ],
    "canonical_names": [
      "ip6only.me"
    ],
    "endpoint_metadatas": [],
    "expiration": "13372361655791695",
    "host_ports": [],
    "hostname_results": [],
    "ip_endpoints": [
      {
        "endpoint_address": "2001:4810:0:3::78",
        "endpoint_port": 0
      }
    ],
    "text_records": []
  }
},
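To show how the request-level field could feed a v6 adoption count, here is a small Python sketch that reads _ip_addr and reports the IP version of the connection. The function name connection_ip_version is hypothetical, and the sketch assumes v6 addresses are bracketed as in the sample above while v4 addresses are bare.

```python
import ipaddress


def connection_ip_version(request):
    """Return 4 or 6 for the request's _ip_addr, or None if absent/unparseable."""
    raw = request.get("_ip_addr", "")
    try:
        # Strip the brackets used around v6 literals before parsing.
        return ipaddress.ip_address(raw.strip("[]")).version
    except ValueError:
        return None


version = connection_ip_version({"_ip_addr": "[2001:4810:0:3::78]"})
```

For the sample request this gives 6. Grouping requests by this value (for example per server or CDN) is one possible way to derive the adoption numbers asked for in the original request.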
I'll leave this open until we confirm that the crawl runs OK with the changes.
The October crawl completed successfully on the new dual-stack network and with the new records so closing this out.
Today the WPT agent does not allow for easy IPv6 reporting.
Can we consider probing DNS for a few more record types, specifically A and AAAA records, to at least see if a given host could be served over V6?
Ideally we would also allow agents to connect over v4 and v6 to derive v6 adoption across server/CDN types.