Closed rviscomi closed 4 years ago
There's lhr.audits['network-requests']
, which will at least give you a structured array of the network requests.
If the artifacts could be added to HTTPArchive, there is the LinkElements
artifact:
"LinkElements": [
{
"rel": "manifest",
"href": "https://100.exterkamp.codes/manifest.json",
"hrefRaw": "https://100.exterkamp.codes/manifest.json",
"hreflang": "",
"as": "",
"crossOrigin": null,
"source": "head"
}
],
and the ServiceWorker
artifact:
"ServiceWorker": {
"versions": [
{
"versionId": "0",
"registrationId": "0",
"scriptURL": "https://100.exterkamp.codes/sw.js",
"runningStatus": "running",
"status": "activated",
"scriptLastModified": 1547540209,
"scriptResponseTime": 1568661692.977256,
"controlledClients": [],
"targetId": "B1249BF04CA22F295084955F7F8AE1E2"
}
],
"registrations": [
{
"registrationId": "0",
"scopeURL": "https://100.exterkamp.codes/",
"isDeleted": false
}
]
},
check out report.audits['network-requests'].details.items
:)
The manifest should even be resourceType: 'Manifest'
.
that doesn't include the service worker URL, though, since the page itself isn't doing the network request for the file. #709 would/could do that, or we could put the service worker URL in the details of the service-worker
audit
I was too slow :)
If the artifacts could be added to HTTPArchive
probably won't happen, though, since the artifacts are large and growing without bound these days (and httparchive kind of already has its own artifacts it collects)
Aha! This sounds really promising.
(Edit: removing the part about the resource URLs, there was a bug in my lookup script :))
Would the artifacts be the only way to get the resource URLs? We already do some pruning of the LH response for size control, so it's feasible for us to only grab what we need.
(Edit: removing the part about the resource URLs, there was a bug in my lookup script :)) Would the artifacts be the only way to get the resource URLs? We already do some pruning of the LH response for size control, so it's feasible for us to only grab what we need.
ha, just saw your edit. So you do see them in there?
The only resource URL that would be missing should be the service worker (at least as far as I know the page would have to be doing something weird for it to show up in its own network requests).
Yes, I fixed the bug and now I'm seeing the manifest URLs. 😁
As expected, I'm not seeing any ServiceWorker
resource types. Oddly though, the manifest URL is missing from ~9% of reports that pass the installable manifest audit:
#standardSQL
CREATE TEMP FUNCTION getUrls(report STRING)
RETURNS STRUCT<manifest STRING, sw STRING> LANGUAGE js AS '''
var $ = JSON.parse(report);
return $.audits['network-requests'].details.items.reduce((urls, item) => {
if (item.resourceType == 'Manifest') {
urls.manifest = item.url;
} else if (item.resourceType == 'ServiceWorker') {
urls.sw = item.url;
}
return urls;
}, {});
''';
SELECT
COUNTIF(urls.manifest IS NULL) AS null_manifests,
COUNTIF(urls.sw IS NULL) AS null_sq,
COUNT(0) AS total,
COUNTIF(urls.manifest IS NULL) / COUNT(0) AS pct_null_manifests,
COUNTIF(urls.sw IS NULL) / COUNT(0) AS pct_null_sw
FROM (
SELECT
getUrls(report) AS urls
FROM
`httparchive.lighthouse.2019_07_01_mobile`
WHERE
JSON_EXTRACT_SCALAR(report, '$.audits.installable-manifest.score') = '1')
null_manifests | null_sq | total | pct_null_manifests | pct_null_sw |
---|---|---|---|---|
7253 | 79345 | 79345 | 9.14% | 100.00% |
Any ideas why that might be?
resourceType
is populated by the type
field of the Network.requestWillBeSent
protocol event. Someone may want to dig into the implementation to see for sure, but it's possible that it's misclassified in type but still treated as a proper manifest.
Maybe try one of the problem URLs in Chrome and check out the network panel in DevTools? The resourceType
in the audit should match what's in the "type" column.
that's actually another one we could manually annotate (this time on the installable-manifest
audit). The Page.getAppManifest
return value includes the URL so we wouldn't have to go guessing from resourceType which file was actually used as the manifest.
So maybe we should fix this issue as originally requested after all :)
@rviscomi just confirming this is still a valuable feature for you / HTTPArchive ?
Yes, this is still useful for us!
OK, this was a terribly confusing conversation to leave for posterity if the goal was to communicate what actually needs to be done :) but I think I've got a handle on it below. Probably best to assess/fix these two separately as they're not really related.
When a page has a ServiceWorker, the browser requests the SW script, not the page, so it isn't in our network-requests
audit list of requests.
Possible fix: the ServiceWorker
artifact has the SW's URL, so the service-worker
audit could include the SW's URL in debugData
details. It looks like getControllingScopeUrl()
would need to be modified to return the LH.Crdp.ServiceWorker.ServiceWorkerVersion
with the most specific scopeUrl (instead of just the scopeUrl) so that its scriptURL
could be included as details in the final score: 1
audit result
When a page has a web app manifest, it should be loaded by the page (caused by lighthouse itself, I believe?) and so it will show up in report.audits['network-requests'].details.items
and have the property resourceType: 'Manifest'
. However, @rviscomi found that 9% of the time pages have a (supposedly) valid manifest but there is no network request with resourceType: 'Manifest'
. I assume this is because it ends up based on mime type and ignores the fact that we just explicitly requested a web app manifest :)
Possible fix: Assuming that 9% is just misclassified resource types, should we find the manifest URL in the network-requests
items and manually set the resourceType
to 'Manifest'
? Or add the URL as details on some audit (installable-manifest
?)?
One issue is that GatherRunner
doesn't keep the manifest URL around from driver.getAppManifest()
, so we may want to add the URL to the manifest artifact? Otherwise we could go with the entry in LinkElements
@connorjclark mentioned in https://github.com/GoogleChrome/lighthouse/issues/9683#issuecomment-531922937, we'll just have to deal with slightly more uncertainty than asking the protocol directly (like, what if the page declares a manifest twice, which one was the actual manifest used by Chrome?).
caused by lighthouse itself, I believe?
Our fetch of the manifest does not happen while we're recording devtools logs though, so I would assume this happens somehow as a result of us loading the page and there being a link[rel=manifest]
fixed by #11329 and #11330
When LH discovers a page with a service worker or a manifest file, it would be hugely helpful for downstream services like HTTP Archive if those file URLs were included in the JSON results somehow. Otherwise we end up doing a lot of questionable regex parsing of the HTML.
plz i can has urls?