I'm happy to reduce it for the robotstxt one. If it takes 5 seconds to return a simple text file, I'd classify that as an error on its own.
Thanks @Tiggerito. Are you able to take all 3 files? Should be the same pattern in each.
I'd have to research how to do it. This looks like a neat solution that could be put in a shared place. I think I saw that we can include js files?
https://www.lowmess.com/blog/fetch-with-timeout/
I could test it with my metric first?
Here's a prototype of the JS I had in mind:
// Simulate the race: a 30-second "fetch" promise vs. a 5-second timeout promise.
const slowFetch = new Promise(resolve => setTimeout(resolve, 30000, 'fetch'));
const timeout = new Promise(resolve => setTimeout(resolve, 5000, 'timeout'));
// Logs 'timeout' after 5 seconds, since that promise settles first.
Promise.race([slowFetch, timeout]).then(value => console.log(value));
Shouldn't need to include external JS to do it.
I tested using this (from the article I referenced) in WebPageTest and it worked well:
const fetchWithTimeout = (uri, options = {}, time = 5000) => {
  const controller = new AbortController()
  const config = { ...options, signal: controller.signal }
  setTimeout(() => {
    controller.abort()
  }, time)
  return fetch(uri, config)
    .then((response) => {
      if (!response.ok) {
        throw new Error(`${response.status}: ${response.statusText}`)
      }
      return response
    })
    .catch((error) => {
      if (error.name === 'AbortError') {
        throw new Error('Response timed out')
      }
      throw new Error(error.message)
    })
}
If I set a small timeout it returns:
{"message":"Response timed out","error":{}}
Which we could easily alter. Do we have a standard thing to return when custom metrics fail?
One advantage of this pattern is that it cancels the request on timeout, so there's no risk of a forgotten request continuing to be processed in the background.
It's also easy to plug in: add the code and change each fetch(url) call to fetchWithTimeout(url), and it works.
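As a rough illustration (a hypothetical snippet, not taken from one of the actual custom metrics), the swap and the timeout handling could look like this:

// Before: fetch('/robots.txt').then((response) => response.text())
// After: the same call through the wrapper, with a fallback value on timeout.
fetchWithTimeout('/robots.txt')
  .then((response) => response.text())
  .catch((error) => {
    // 'Response timed out' (or any other fetch failure) lands here; returning
    // null lets the custom metric still report something for the page.
    return null;
  });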
Well not to play favorites (I'm totally playing favorites 😁) but the Promise approach can also be implemented as a fetchWithTimeout function and is much simpler:
function fetchWithTimeout(url) {
  var network = fetch(url);
  var timeout = new Promise(resolve => setTimeout(resolve, 5000, 'timeout'));
  return Promise.race([network, timeout]).then(r => {
    if (r == 'timeout') return Promise.reject(r);
    return r;
  });
}
Now I understand promises more 😀
I'll raise your simplification:
function fetchWithTimeout(url) {
  var controller = new AbortController();
  setTimeout(() => controller.abort(), 5000);
  return fetch(url, {signal: controller.signal});
}
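A minimal sketch of handling that, assuming the caller wants to distinguish a timeout from other failures: this simplified wrapper rejects with the browser's native AbortError rather than a custom 'Response timed out' message, so the check would look something like:

fetchWithTimeout('/robots.txt')
  .then((response) => response.text())
  .catch((error) => {
    // An aborted fetch rejects with a DOMException named 'AbortError',
    // which here means the 5-second timer fired before the response arrived.
    if (error.name === 'AbortError') return 'timed out';
    throw error;
  });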
Hey @Tiggerito sorry for the delay, your function LGTM. Are you able to apply that to each fetch instance? Hoping to get this in today before the October crawl starts.
Looks like today is an HTTP Archive day. Will get onto it.
Testing the code now.
third-parties.js contains a fetch, but it's auto-generated code built by bin/library-detector.js, using what looks like another repository. It looks like the fetch is used in relation to the serviceWorker. Not a trivial one to alter.
The only thing I can think of is to update the builder to include code that intercepts fetch. Something like:

// Wrap the global fetch so every call made by the generated code gets a 5-second abort.
let originalFetch = fetch;
fetch = function(url, options = {}) {
  var controller = new AbortController();
  setTimeout(() => controller.abort(), 5000);
  options.signal = controller.signal;
  return originalFetch(url, options);
};
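As a rough sketch of the effect (assuming the generated script keeps looking up the global fetch at call time; the URL here is hypothetical), existing calls would pick up the timeout without any other change:

// Unchanged call somewhere in the auto-generated library-detection code:
fetch('/service-worker.js')
  .then((response) => response.text())
  .catch((error) => {
    // Rejects with AbortError if the response hasn't arrived within 5 seconds.
    console.log(error.name);
  });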
These should be the only custom metrics with fetch: https://github.com/search?q=fetch+repo%3AHTTPArchive%2Flegacy.httparchive.org+path%3Acustom_metrics&type=Code&ref=advsearch&l=&l=
The code that generates the third parties script uses fetch, but it's not part of the custom metric code itself.
Cool. Working on the last one now. sass.
Synced the HA server with the changes in #193 so this should take effect in the October crawl starting tomorrow. Thank you again for hopping on this @Tiggerito 🙏
An asynchronous fetch in a custom metric could take ~30 seconds before timing out. Rather than wait for the promise to reject, race the fetch against a shorter timeout of ~5-10 seconds and resolve the promise at the sooner of the two async events. This would help ensure that the custom metrics don't interfere as much with the overall crawl rate, as 30 seconds × 7 million URLs × 2 runs per client (desktop, mobile) definitely adds up.
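For a rough sense of scale (a worst-case back-of-the-envelope figure, assuming every fetch hit the full timeout): 30 s × 7,000,000 URLs × 2 runs = 420,000,000 seconds, i.e. on the order of 13 years of cumulative test-agent time spread across the crawl.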
Here are the instances of fetch in the custom metrics: https://github.com/search?q=fetch+repo%3AHTTPArchive%2Flegacy.httparchive.org+path%3Acustom_metrics&type=Code&ref=advsearch&l=&l=