HTTPArchive / custom-metrics

Custom metrics to use with WebPageTest agents
Apache License 2.0
19 stars 22 forks source link

Adding Privacy Sandbox APIs #129

Closed Yash-Vekaria closed 3 months ago

Yash-Vekaria commented 3 months ago

Implemented capturing the following Privacy Sandbox APIs:

  1. Attribution Reporting API
  2. Federated Credentials Manager API
  3. Fenced Frames API
  4. FLoC API
  5. Protected Audience API
  6. Private Aggregation API
  7. Private State Tokens API
  8. Shared Storage API
  9. Storage Access API
  10. Topics API
  11. User Client Hints API

Test websites:

tunetheweb commented 3 months ago

@Yash-Vekaria @max-ostapenko what's the latest on this PR as need to merge it today if we want it included in the Web Almanac 2024.

Yash-Vekaria commented 3 months ago

@Yash-Vekaria @max-ostapenko what's the latest on this PR as need to merge it today if we want it included in the Web Almanac 2024.

@tunetheweb Topics API is ready to merge. Protected Audience API and Attribution APIs have been correctly implemented but I don't have a good example that simulates all the cases, so partial response was recorded on the current test website. I'll let @yohhaan comment if he plans to incorporate any other APIs. @max-ostapenko informed me that he shall be taking a look at the code later today to finalize these APIs and perhaps remove stale code to make the PR ready to merge.

max-ostapenko commented 3 months ago

@tunetheweb let's merge it. I'll make my PR on top of main then

yohhaan commented 3 months ago

Hi, I am modifying at the moment the Topics API implementation (some errors in the current one), I will push shortly and add more details about the rest.

yohhaan commented 3 months ago

Can we add the following test sites to this PR for unit tests (I am still checking if header detection is working as expected)?

tunetheweb commented 3 months ago

This sort of thing is very verbose and needlessly increases the size of custom metrics for all 15 million sites (most of which likely don't use shared storage):

        "sharedStorage": {
            "shared-storage": true,
            "shared-storage-select-url": true,
            "append": [],
            "clear": [],
            "delete": [],
            "set": [],
            "run": [],
            "selectURL": [],
            "addModule": []
        },

Can we exclude empty values in all the data in this custom metric?

I made a similar comment on https://github.com/HTTPArchive/custom-metrics/pull/125#pullrequestreview-2097390279 because if EVERY custom metric adds this sort of thing then the custom metric (already 8.32TB or $50 for a single query) will become too large for anyone to query it without taking out a mortgage.

yohhaan commented 3 months ago

Sounds good, I am on it

This sort of thing is very verbose and needlessly increases the size of custom metrics for all 15 million sites (most of which likely don't use shared storage):

        "sharedStorage": {
            "shared-storage": true,
            "shared-storage-select-url": true,
            "append": [],
            "clear": [],
            "delete": [],
            "set": [],
            "run": [],
            "selectURL": [],
            "addModule": []
        },

Can we exclude empty values in all the data in this custom metric?

I made a similar comment on #125 (review) because if EVERY custom metric adds this sort of thing then the custom metric (already 8.32TB or $50 for a single query) will become too large for anyone to query it without taking out a mortgage.

yohhaan commented 3 months ago

Per @tunetheweb 's comment: I changed the format of the result object to make it smaller if APIs are not called.

@Yash-Vekaria: can you see if you can port the ProtectedAudience and Attribution Reporting APIs to a similar format?

max-ostapenko commented 3 months ago

@yohhaan I couldn't find any way to verify cross-origin files existence. Fetch doesn't fail because of 404. Do you have any ideas?

Otherwise I would say we remove the attestation checks from custom metrics completely.

tunetheweb commented 3 months ago

Same comment, do we need these all to be set to empty values or can we skip them when nothing of interest in them to save storage space/query costs?

        "fencedFrames": [],
        "floc": [],
        "privateAggregation": [],
        "privateStateTokens": [],
        "protectedAudienceAPI": {
            "interestGroups": {
                "joinAdInterestGroup": [],
                "leaveAdInterestGroup": [],
                "updateAdInterestGroups": [],
                "clearOriginJoinedAdInterestGroups": []
            },
            "runAdAuction": [],
            "generateBid": [],
            "scoreAd": [],
            "reportWin": [],
            "reportResult": []
        },
        "sharedStorage": [],
        "storageAccess": [],
        "topicsAPI": [],
        "userAgentClientHints": []
yohhaan commented 3 months ago

I am done with the edits I wanted to do on this custom metric, the tests have started to return results as expected since I fixed the double escape issue for the regex.

If people have more cycles to commit, they could try to find websites to test on for each API and see that the calls are actually correctly detected. I have done a few tests + the ones from the automated GitHub actions and things look good to me for now. Also, note: some websites listed to test on require user interaction, thus the empty results sometimes.

Thanks to @Yash-Vekaria for all the help along the implementation of this custom metric and to @max-ostapenko as well!

Yash-Vekaria commented 3 months ago

Thanks @yohhaan and @max-ostapenko for bringing this to completion.

@tunetheweb All the changes are locked from my side as well. We are ready to merge this PR.

@max-ostapenko Extraction of attestation has been removed as it was working in WPT without Privacy Sandbox API capturing in WPT but not with it on live checks (some async/promise issue thats hard to debug due to different behavior). We have decided to analyse it in analysis phase and remove it in custom metrics altogether.

tunetheweb commented 3 months ago

I still think that is a lot of extra data to store per URL when in a lot of cases there won’t be any different. Is there no way to only report if the defaults are different?

I tried adding https://www.example.com to see what it would produce and reran the test but it didn’t pick them up. But I’m presuming it would produce the same as the blank ones?

Yash-Vekaria commented 3 months ago

@tunetheweb One way to optimize further is that we remove all permission-policy flags (as shown below) because now that I'm thinking more about it, they should always be true (except for "top-level-storage-access"). Using command line activation fo Privacy Sandbox APIs for the browser being used makes these set to true by default.

"permissionsPolicy": {
            "attribution-reporting": true,
            "browsing-topics": true,
            "identity-credentials-get": true,
            "interest-cohort": true,
            "join-ad-interest-group": true,
            "private-aggregation": true,
            "run-ad-auction": true,
            "shared-storage": true,
            "shared-storage-select-url": true,
            "storage-access": true,
            "top-level-storage-access": false
        },

Next, for other APIs, we can have a domain-level mapping where key is domain and the value is a list of APIs used by that domain on the current website, rather than having separate keys for each API and same domain repeating in each of them.

So, reduced output will look like the following for operafootball.com:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false
        "securepubads.g.doubleclick.net": ["runAdAuction", "getHighEntropyValues"],
        "pagead2.googlesyndication.com": ["runAdAuction", "joinAdInterestGroup", "getHighEntropyValues"],
        "www.googletagmanager.com": ["joinAdInterestGroup", "getHighEntropyValues"],
        "googleads.g.doubleclick.net": ["browsingTopicsHeader|true",],
    }
}
tunetheweb commented 3 months ago

That looks a lot better if you can do that. And happy to include permission policy data if it;'s different from the default (i.e. explicitly blocked) but seems silly to log that same default of all true millions of times.

github-actions[bot] commented 3 months ago
Custom metrics for https://almanac.httparchive.org/en/2022/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_9X_2N
Custom metrics for https://www.operafootball.com/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_2W_2P Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "securepubads.g.doubleclick.net": [ "runAdAuction", "navigator.userAgentData.getHighEntropyValues", "fencedFrameJs", "sec-browsing-topics|false", "attribution-reporting-eligible" ], "pagead2.googlesyndication.com": [ "runAdAuction", "navigator.userAgentData.getHighEntropyValues", "joinAdInterestGroup" ], "googleads.g.doubleclick.net": [ "sec-browsing-topics|true", "sec-browsing-topics|false", "navigator.userAgentData.getHighEntropyValues" ], "www.googletagmanager.com": [ "joinAdInterestGroup", "navigator.userAgentData.getHighEntropyValues" ], "fundingchoicesmessages.google.com": [ "accept-ch|Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-WoW64, Sec-CH-UA-Form-Factor, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version" ], "cdn.ampproject.org": [ "reportWin" ], "tpc.googlesyndication.com": [ "navigator.userAgentData.getHighEntropyValues" ], "www.gstatic.com": [ "navigator.userAgentData.getHighEntropyValues" ], "www.googleadservices.com": [ "attribution-reporting-eligible", "attribution-reporting-register-source|destination=https://homeserve.com|epsilon=undefined", "attribution-reporting-register-source|destination=https://fubo.tv|epsilon=undefined", "privateAggregation.enableDebugMode", "attribution-reporting-register-source|destination=https://t-mobile.com|epsilon=undefined" ], "pixel.tapad.com": [ "accept-ch|Sec-CH-UA\nSec-CH-UA-Arch\nSec-CH-UA-Bitness\nSec-CH-UA-Full-Version-List\nSec-CH-UA-Mobile\nSec-CH-UA-Model\nSec-CH-UA-Platform\nSec-CH-UA-Platform-Version\nSec-CH-UA-WoW64", "accept-ch|Sec-CH-UA\u0000Sec-CH-UA-Arch\u0000Sec-CH-UA-Bitness\u0000Sec-CH-UA-Full-Version-List\u0000Sec-CH-UA-Mobile\u0000Sec-CH-UA-Model\u0000Sec-CH-UA-Platform\u0000Sec-CH-UA-Platform-Version\u0000Sec-CH-UA-WoW64" ] } } } ```
Custom metrics for https://floc.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_KW_2Q Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "floc.glitch.me": [ "document.interestCohort" ] } } } ```
Custom metrics for https://pets-animals-pets-cats.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_G3_2R Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "pets-animals-pets-cats.glitch.me": [ "document.browsingTopics|false" ], "topics-demo.glitch.me": [ "document.browsingTopics|false" ] } } } ```
Custom metrics for https://tennis-tennis.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_V5_2S Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "tennis-tennis.glitch.me": [ "document.browsingTopics|true" ], "topics-server.glitch.me": [ "sec-browsing-topics|false" ] } } } ```
Custom metrics for https://protected-audience-demo.web.app/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_60_2T Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": [] } } ```
Custom metrics for https://protected-audience-demo-advertiser.web.app/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_DD_2V Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "protected-audience-demo-dsp.web.app": [ "joinAdInterestGroup" ] } } } ```
Custom metrics for https://protected-audience-demo-publisher.web.app/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_3E_2W Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "protected-audience-demo-ssp.web.app": [ "runAdAuction" ] } } } ```
Custom metrics for https://fedcm-rp-demo.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_AV_2X Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "fedcm-idp-demo.glitch.me": [ "navigator.credentials.get" ] } } } ```
Custom metrics for https://fedcm-idp-demo.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_8T_2Y Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": { "fedcm-idp-demo.glitch.me": [ "navigator.credentials.get", "IdentityProvider.getUserInfo" ] } } } ```
Custom metrics for https://private-state-token-redeemer.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_6V_2Z Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": [] } } ```
Custom metrics for https://private-state-token-issuer.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_PT_30 Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": [] } } ```
Custom metrics for https://user-agent-client-hints.glitch.me/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_X6_31 Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": [] } } ```
Custom metrics for https://shared-storage-demo.web.app/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_2G_32 Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": [] } } ```
Custom metrics for https://www.example.com/ WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_Q8_33 Changed custom metrics values: ```json { "_privacy-sandbox": { "top-level-storage-access": false, "privacySandBoxAPIUsage": [] } } ```
Yash-Vekaria commented 3 months ago

@tunetheweb I have refactored the code as shared in the previous comment. If it looks good, we can merge.

tunetheweb commented 3 months ago

Thanks for that. Merging!