Closed GJFR closed 3 years ago
Is this ready for review?
@GJFR Could you please add the following, as mentioned in #211:
// privacy
parseResponse('/.well-known/gpc.json', r => {
  return r.text().then(text => {
    let data = {
      'gpc': null
    };
    let gpc_data = JSON.parse(text);
    if (typeof gpc_data.gpc == 'boolean') {
      data.gpc = gpc_data.gpc;
    }
    return data;
  });
}),
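For reference, here is a minimal standalone sketch of the same parsing step, written so that it can be run outside the custom metric. The helper name `parseGpc` is hypothetical and not part of this PR. It assumes we want `gpc` to stay `null` whenever the response body is not valid JSON (for example an HTML 404 page) or does not contain a boolean `gpc` field. Whether `parseResponse` already handles a throwing parser is not shown in this thread, so the `try`/`catch` may be redundant there.

```javascript
// Hypothetical helper (not part of the PR): maps the body of
// /.well-known/gpc.json to the { gpc } shape used in the snippet above.
function parseGpc(text) {
  const data = { gpc: null };
  try {
    const parsed = JSON.parse(text);
    // JSON.parse may return null or a primitive, so guard before reading .gpc.
    if (parsed !== null && typeof parsed === 'object' && typeof parsed.gpc === 'boolean') {
      data.gpc = parsed.gpc;
    }
  } catch (e) {
    // Body was not valid JSON: keep gpc as null.
  }
  return data;
}

console.log(parseGpc('{"gpc": true}'));     // { gpc: true }
console.log(parseGpc('<html>404</html>'));  // { gpc: null }
```

A string value such as `"gpc": "true"` is deliberately not coerced, which matches the `typeof ... == 'boolean'` check in the requested code.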
@max-ostapenko Your code has been added 👍
@rviscomi Thank you for your comments! I've updated the code. I'm gonna do a quick check and tweak the robots.txt data collecting. Will mark as ready for review ASAP.
WPT test runs:
WPT tests look good to me.
I filtered on all keywords discussed in this thread.
A few thoughts:
Regarding the User-agent entries themselves: we could remove reported user-agents that do not have any matched Disallow paths to reduce clutter.
"author" seems to be a popular false positive for "auth". I'm sure we will encounter similar false positives in the crawl data. In my opinion, we should make the filter stricter in the querying stage and not on the custom metric level, for two reasons:
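To make both thoughts concrete, here is a rough sketch, not the code in this PR. The keyword list and the names `matchedDisallows` and `isStrictMatch` are assumptions for illustration only. It keeps the loose substring match on the collection side, reports only user-agents that have at least one matched Disallow path, and shows what a stricter post-filter could look like conceptually (the real one would live in the query).

```javascript
// Assumed keyword list, for illustration only.
const KEYWORDS = ['login', 'auth', 'admin'];

// Collect Disallow paths containing a keyword, grouped by user-agent.
// User-agents without any matched path never get an entry in the result.
function matchedDisallows(robotsTxt, keywords = KEYWORDS) {
  const result = {};
  let agents = [];      // user-agents of the group currently being read
  let inRules = false;  // true once the current group has seen a rule line
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // A User-agent line after rules starts a new group.
      if (inRules) { agents = []; inRules = false; }
      agents.push(value);
    } else if (field === 'disallow' || field === 'allow') {
      inRules = true;
      const path = value.toLowerCase();
      if (field === 'disallow' && keywords.some(k => path.includes(k))) {
        for (const agent of agents) {
          (result[agent] = result[agent] || []).push(value);
        }
      }
    }
  }
  return result;
}

// Stricter check: the keyword must not be surrounded by other letters.
// Assumes plain alphabetic keywords (no regex escaping is done here).
function isStrictMatch(path, keyword) {
  return new RegExp('(^|[^a-z])' + keyword + '($|[^a-z])', 'i').test(path);
}

const sample = [
  'User-agent: *',
  'Disallow: /author/',
  'Disallow: /wp-admin/',
  '',
  'User-agent: Googlebot',
  'Disallow: /tmp/',
].join('\n');

console.log(matchedDisallows(sample));          // { '*': [ '/author/', '/wp-admin/' ] }
console.log(isStrictMatch('/author/', 'auth')); // false
```

In this sample, Googlebot is left out because none of its paths match, while `/author/` is still collected by the loose match and can be discarded later by the stricter check.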
WPT test runs:
LGTM thanks everyone!
Progress on https://github.com/HTTPArchive/almanac.httparchive.org/issues/2150.
Also includes renaming and extending the ecommerce custom metric for well-known URLs as per https://github.com/HTTPArchive/almanac.httparchive.org/issues/2211.