Closed bstandaert-wustl closed 4 months ago
@bstandaert-wustl please move it into the privacy chapter custom metric, in the dist/privacy.js file.
The number of websites that it makes sense to test for CCPA compliance is quite small compared to the dataset, particularly considering that the crawlers run from Virginia, not California.
Could we define the eligible set of sites to optimize it, e.g. using locale, TLD, ..?
FWIW, the crawl runs from most of the US regions of Google Cloud, so there's SOME California testing (~1/7 of the crawl), but it's not deterministic, and it's hard to say how IP geolocation works for any of the regions.
@max-ostapenko From https://petsymposium.org/popets/2022/popets-2022-0030.pdf:
"the CCPA applies to all websites doing business in California, regardless of the domicile of their business or the language of their website."
I don't think we have a reasonable way to filter on "doing business in California". Perhaps we could rank sites by monthly traffic and take only the top fraction (since the CCPA has a minimum user threshold).
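For illustration only, here is a minimal sketch of the "top fraction by traffic" filter suggested above. The `Site` record, its `monthly_visits` field, and the `top_fraction_by_traffic` helper are all hypothetical names, not part of the crawl dataset or this PR; a real analysis would have to substitute whatever popularity signal is actually available.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    url: str
    monthly_visits: int  # hypothetical traffic estimate, not a real dataset field


def top_fraction_by_traffic(sites, fraction):
    """Return the most-visited `fraction` of `sites`, highest traffic first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort by traffic, descending; sorted() is stable, so ties keep input order.
    ranked = sorted(sites, key=lambda s: s.monthly_visits, reverse=True)
    # Round up so a non-empty input always yields at least one site.
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


sites = [
    Site("https://a.example", 500),
    Site("https://b.example", 90_000),
    Site("https://c.example", 12_000),
    Site("https://d.example", 40),
]
eligible = top_fraction_by_traffic(sites, 0.5)
```

Here `eligible` holds the two highest-traffic sites. The cutoff fraction is arbitrary and would need to be chosen with the CCPA's user threshold in mind.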
Per the discussion in Slack, @umariqbal recommends testing this on all sites and framing it as “prevalence of enforcement mechanisms” rather than compliance: https://httparchive.slack.com/archives/C023K97SR8U/p1716560204137519
Are there any other changes you recommend making to this before the deadline?
@max-ostapenko @bstandaert-wustl what's the latest on this? We need to merge this today if we want to include it in the Web Almanac 2024.
@tunetheweb I think the remaining questions are minor - if we aren't able to resolve them before your deadline, can you go ahead and merge this?
Yeah, I see no blockers. 👍
Got merge conflicts now, @max-ostapenko. Can you resolve them?
@tunetheweb please merge
Test websites: