HTTPArchive / legacy.httparchive.org

<<THIS REPOSITORY IS DEPRECATED>> The HTTP Archive provides information about website performance such as # of HTTP requests, use of gzip, and amount of JavaScript. This information is recorded over time revealing trends in how the Internet is performing. Built using Open Source software, the code and data are available to everyone allowing researchers large and small to work from a common base.
https://legacy.httparchive.org
Other
328 stars 84 forks source link

New event-names and pwa metrics did not use JSON.stringify #212

Closed tunetheweb closed 3 years ago

tunetheweb commented 3 years ago

@rviscomi (and FYI @demianrenzulli) the June 2021 desktop crawl has finished and trying to look at the PWA queries but having trouble extracting the data.

It looks to me like you didn't use JSON.stringify, like the other Custom Metrics use, and this is leading to difficulty in selecting the data.

For example:

#standardSQL
    SELECT
      payload,
      JSON_EXTRACT_SCALAR(payload, '$._pwa.serviceWorkers') as ServiceWorkers, -- Returns null
      JSON_EXTRACT_ARRAY(payload, '$._pwa.serviceWorkers') as ServiceWorkers, -- Returns non null but not sure where to go from here
      JSON_EXTRACT_SCALAR(payload, '$._privacy') -- existing Custom Metric using JSON.stringify which works fine.
    FROM
      `httparchive.pages.2021_06_01_desktop`
  WHERE
    url = 'https://www.naranja.com/'
    LIMIT 10

Was this an oversight that needs to be corrected before July? Or is there a better way to query this data? Do you need to use a JS function always?

rviscomi commented 3 years ago

What does the non-null response look like?

tunetheweb commented 3 years ago

I dunno how to select it - that's the problem. So just a greyed out box. But at least doesn't say Null

image

tunetheweb commented 3 years ago

FYI this is what the whole payload looks like:

image

So you can see the other custom metrics are basically strings, but the pwa one is a full JSON object (as are the _event-names and _initiators). You'd think that would make it EASIER to select but struggling to get it working.

rviscomi commented 3 years ago

The type of _pwa.serviceWorkers is a JSON object, so I think JSON_EXTRACT_SCALAR is expected to return null values. JSON_EXTRACT_ARRAY seems to be WAI because there are no integer indices.

Something like this works to extract a few sample objects:

SELECT
  JSON_EXTRACT(payload, "$._pwa.serviceWorkers") AS pwa
FROM
  `httparchive.pages.2021_06_01_desktop`
WHERE
  JSON_EXTRACT(payload, "$._pwa.serviceWorkers") != "[]"
LIMIT
  10

I think a complex custom metric like this would require a UDF to iterate through the response object. For example:

CREATE TEMP FUNCTION GET_SERVICE_WORKERS(pwa STRING) RETURNS ARRAY<STRING> LANGUAGE js AS '''
var $ = JSON.parse(pwa);
return Object.keys($.serviceWorkers);
''';

SELECT
  GET_SERVICE_WORKERS(JSON_EXTRACT(payload, '$._pwa')) AS ServiceWorkers
FROM
  `httparchive.pages.2021_06_01_desktop`
WHERE
  url = 'https://www.naranja.com/'

This returns https://www.naranja.com/sw.js.

tunetheweb commented 3 years ago

Cheers for the pointers. Will play with it some more. Agree will need a UDF for most queries but started on a simple one that was looking at another table, and wanted to restrict to PWAs (i.e has a SW and a Manifest) so thought UDF was overkill. You’re middle might work for that. Or maybe a simple UDF is the way to go.

So we think we should do JSON objects for future customs metrics rather than Strings that need to be parsed? It probably is easier but just need to think about it slightly differently.

rviscomi commented 3 years ago

I think because the object keys are the variables that makes a UDF necessary.

IIRC we used string format for custom metrics in the past because it made debugging easier on WPT. @pmeenan has since improved the way JSON-based custom metrics are rendered in the UI so that gave us a good reason to use more semantic types rather than nesting JSON in JSON.

That said, I do have some regrets about introducing inconsistencies between custom metrics. Ideally we could convert the stringified custom metrics back to their object-based format, but that would require updating old queries.

tunetheweb commented 3 years ago

Ok. As long as it was intentional that’s cool. Will figure it out with your pointers and say your Fugu one again. Will close this.