@Yash-Vekaria please remove my commits from PR #117. After another push the WPT tests should run, as the PR was not initially pointing to the main branch.
@max-ostapenko Google's sellers.json returns 404 in the automated test pipeline workflow. However, when I tested it on WPT Test and locally, it worked and returned all Google sellers. Can we rerun the WPT pipeline here?
@max-ostapenko Commit 522dc2b returns null on WPT Test. Any feedback on what the issue is, or how to debug it using WPT?
@Yash-Vekaria and how many total accounts/lines are we talking about if we enable saving domain lists? We have `domain_count` for all kinds of accounts, so we could sum it and get an idea of the additional data volume per crawl.
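For reference, a rough sketch of that sum in the metric's JavaScript (the `accounts` array and function name are hypothetical stand-ins for whatever the parser actually produces):

```js
// Hypothetical shape: each parsed account record carries a domain_count;
// summing those approximates the extra data volume per crawl.
function estimateDomainVolume(accounts) {
  return accounts.reduce((sum, a) => sum + (a.domain_count || 0), 0);
}

// e.g. estimateDomainVolume([{ domain_count: 200000 }, { domain_count: 800 }])
```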
@Yash-Vekaria in the ads.txt parser, could you please move `redirected: response.redirected,` to the result object initialization? That way the attribute is recorded even if there are issues with the content.
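A minimal sketch of what that could look like, assuming the parser builds a `result` object around `fetch` (the real parser's structure may differ):

```js
async function fetchAndParse(url) {
  const response = await fetch(url);
  // Initialize the result with `redirected` up front, so the attribute is
  // present even if reading or parsing the body fails below.
  const result = {
    status: response.status,
    present: false,
    redirected: response.redirected,
  };
  try {
    const text = await response.text();
    // ...parse the ads.txt content and populate `result` here...
    result.present = true;
  } catch (e) {
    result.error = e.message; // `redirected` is still set on this path
  }
  return result;
}
```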
> @Yash-Vekaria and how many total accounts/lines are we talking about if we enable saving domain lists? We have `domain_count` for all kinds of accounts, so we could sum it and get an idea of the additional data volume per crawl.
Google has more than 200K domains, but otherwise a typical ads.txt/sellers.json is mostly under 1K lines. I added domain extraction so we could plot trends of the top ad exchanges embedded on a page, or plot a graph of sellers and publishers (if we want), etc.
Also, Google or any other ad network might be present on multiple websites we crawl, but shouldn't we crawl sellers.json once rather than storing hundreds of thousands of copies of the same data? This mainly applies to the domains of each seller that we are crawling.
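As a sketch of within-page deduplication (cross-page dedup would still have to happen at analysis time), assuming the metric calls `fetchAndParse` per seller domain; the cache name is hypothetical:

```js
// Hypothetical per-page cache: fetch each domain's sellers.json at most once,
// even when many sellers on the page point at the same exchange.
const sellersJsonCache = new Map();

function getSellersJsonOnce(domain) {
  if (!sellersJsonCache.has(domain)) {
    sellersJsonCache.set(domain, fetchAndParse(domain + "/sellers.json"));
  }
  return sellersJsonCache.get(domain);
}
```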
Yeah, it seems it's gonna be a size increment in MB.
Our crawl is focused on the web rather than server-side adtech, so we may need to take the required steps (deduplicate and work around missing datapoints) during analysis and clarify that in the article.
But the table data looks really amazing; in the top 100K pages:
- 1/3 have ads.txt
- 1/10 have sellers.json

So that's more than enough to visualize relationship trends.
> Yeah, it seems it's gonna be a size increment in MB. [...] So that's more than enough to visualize relationship trends.
I made all the bug fixes as per the discussion in the latest commit. But Google's sellers.json is still not working. It gives: `"sellers": {"status":-1,"present":false,"error":"Failed to fetch"}`. When I call `fetchAndParse("google.com/sellers.json")`, it fetches the full Google JSON within the 10s timeout in the dev environment. But with `fetchAndParse("/sellers.json")` in the WPT Test environment it fails with a "Failed to fetch" error.
I had the impression we agreed not to parse Google's JSON in the crawl, since due to its size we'll have issues processing and storing it, no?
My bad, I misunderstood. I thought that since it works with a 5-10s timeout in the dev console, we could include it, as Google has a presence on most platforms. But sure, I will remove Google and push the code.
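One possible shape for that guard, as a sketch (the set name and wrapper are assumptions, not the actual code):

```js
// Hypothetical guard: skip sellers.json files known to be huge (Google's has
// 200K+ entries) so the metric stays within the WPT timeout and size budget.
const OVERSIZED_SELLERS_DOMAINS = new Set(["google.com"]);

async function maybeFetchSellersJson(domain) {
  if (OVERSIZED_SELLERS_DOMAINS.has(domain)) {
    return { status: -1, present: false, skipped: "file too large to parse in crawl" };
  }
  return fetchAndParse(domain + "/sellers.json");
}
```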
@max-ostapenko The ads.js is working perfectly now. The only remaining issue is the privacy-sandbox.js file returning a null response.
@Yash-Vekaria Even though our PR tests don't run with required flags yet, here is a website to test it on: https://www.operafootball.com/
In addition to the custom metrics, we need to add custom flags (as suggested here) in the Chromium tab on https://webpagetest.httparchive.org.
Also, I currently see empty metrics while there are actually requests with the headers. I think it's due to a case mismatch in the header names.
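For example, a case-insensitive lookup along these lines would sidestep the mismatch (a sketch; `headers` stands for whatever object the agent records):

```js
// Sketch: find a header value regardless of the casing the server used.
function getHeaderIgnoreCase(headers, name) {
  const target = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === target) return value;
  }
  return undefined;
}
```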
I added the features to the HTTP Archive fork of the agent, so they should be available through the actions now.
@max-ostapenko I have added the Protected Audience API and Attribution Reporting API details. Could you please review them? Also, for some independent reason (not because of these additions) the outputs are coming back blank. Could you take a look at that, so we can close/merge this PR?
@Yash-Vekaria The ads custom metrics seem ready to merge. Maybe you could split them into a separate PR to merge ASAP? Next, let's try to get the Topics API working again. And would you still have time to complete the other two APIs this week?
@tunetheweb @pmeenan @max-ostapenko I have closed this PR #118 and isolated ads.js in PR #128 and privacy-sandbox.js in PR #129.
PR #128 is ready to be merged.
PR #129 needs some testing and bug fixes. Currently I have pushed the Topics API, Protected Audience API, and Attribution Reporting API to this PR. @yohhaan is working on testing and adding some other Privacy Sandbox APIs. We expect to finish the changes before June 10th.
Fixes/improves ads.js
The following changes were made:
Changes 1-4 are handled in commit fa17fc5d2c14ccedb5bca637dde45ef468ce52a6 and change 5 is handled in commit 13c2040a7da673141177281a5fe1d4787fd52c3b.
Testing was done on WPT Test for the following websites and it worked fine.
Test websites: