HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Privacy 2020 #913

Closed foxdavidj closed 3 years ago

foxdavidj commented 4 years ago

Part II Chapter 10: Privacy

Content team

Authors: @ydimova
Reviewers: @ldevernay
Analysts: @ydimova @max-ostapenko
Draft: Doc
Queries: *.sql
Results: Sheet

Content team lead: @ydimova

Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.

The content team is made up of the following contributors:

New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.

Note: To ensure that you get notifications when tagged, you must be "watching" this repository.

Milestones

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

tunetheweb commented 4 years ago

Great (and very relevant!) topic. Should it be merged with Cookies, though, as the two are often closely related? Or do we think there's enough for each to be its own chapter?

foxdavidj commented 4 years ago

@bazzadp Good q. I think this is something that will be made clear as we brainstorm what metrics should belong in this chapter. If there ends up being a lot of overlap and not enough unique to this chapter alone... then we can talk about merging it with another chapter like Cookies

That's what we're doing with some other chapters like JAMstack

ldevernay commented 4 years ago

I think I could contribute as a reviewer on this topic.

zcorpan commented 4 years ago

I nominate @johnwilander (see https://github.com/HTTPArchive/almanac.httparchive.org/issues/876 )

ydimova commented 4 years ago

I would like to volunteer as an analyst/author.

rviscomi commented 4 years ago

Thanks @ydimova! I'll put you down as an analyst. Would you also mind sharing some of your qualifications/experience with web privacy? I'm not able to find much info from a cursory search, as your full name is not associated with your GitHub profile. Just want to check before assigning the chapter to you :)

@johnwilander are you interested in coauthoring this chapter?

ydimova commented 4 years ago

@rviscomi Of course! I'm a computer scientist and PhD student in web privacy and security (no publications yet). I have been using the httparchive dataset for some of my research so I think I could contribute as an analyst and coauthor.

rviscomi commented 4 years ago

Thanks @ydimova, you sound like a great fit for this chapter! I've added you as an author. Can I also put you down as the content team lead? You'd be the point person for keeping the chapter on schedule. You're also free to add people as coauthors/reviewers as needed.

A few resources to get you started:

I've also added @ldevernay as a reviewer.

@johnwilander we'd still love to have you contribute as a coauthor/reviewer. Let us know!

ydimova commented 4 years ago

@rviscomi Sure, thanks!

foxdavidj commented 4 years ago

Hey @ydimova, just wanted to check in and see if there's anything you need from me to keep things moving forward.

We're trying to have the outline and metrics settled on by the end of the week so we have time to configure the Web Crawler to track everything you need :)

Also, can you remind your team to properly add and credit themselves in your chapter's Google Doc?

tunetheweb commented 4 years ago

Unfortunately we've had to close the Cookie chapter, but think it's heavily related to Privacy anyway so that's another interesting angle to cover in this chapter if you want!

max-ostapenko commented 4 years ago

Migrating to this one from Cookie chapter as an analyst ;)

foxdavidj commented 4 years ago

@ydimova How is the outline coming along? We want to have that wrapped up by the end of the week so we have time to set up our Web Crawler :)

foxdavidj commented 4 years ago

@ydimova Also don't forget to join the #web-almanac slack if you haven't already so @paulcalvano can invite you to the Analysts channel and help set you up.

rockeynebhwani commented 4 years ago

@ydimova - As the Cookies chapter was closed, this may be of interest to this group - https://github.com/AliasIO/wappalyzer/issues/3219

It would be good to report the % of sites using cookie consent management solutions (obviously within the EU we will see a higher %) and, of those sites, how many use explicit vs. implicit consent. Because of the way HTTP Archive works, any site with explicit consent will have fewer third parties reported, and that can also impact the Third Party chapter of the Web Almanac. I am not sure whether you are thinking in that direction or whether any such analysis has been done before.

@rviscomi / @simonhearne / @patrickhulce
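
As a rough illustration of the two percentages suggested above, here is a minimal sketch. The input shape is an assumption made for this example: each page record has a hypothetical `cmp` field that is `null` when no consent solution was detected, or an object with a `consent` value of "explicit" or "implicit". Neither HTTP Archive nor Wappalyzer is known to expose the data in this form.

```javascript
// Hypothetical input: [{ url, cmp: null | { consent: "explicit" | "implicit" } }, ...]
function summarizeConsent(pages) {
  // Pages on which some consent management solution was detected.
  const withCmp = pages.filter((p) => p.cmp !== null);
  // Of those, pages that ask for explicit consent.
  const explicit = withCmp.filter((p) => p.cmp.consent === "explicit");

  // Percentage helper that avoids dividing by zero.
  const pct = (n, d) => (d === 0 ? 0 : (100 * n) / d);

  return {
    pctWithCmp: pct(withCmp.length, pages.length),
    pctExplicitOfCmp: pct(explicit.length, withCmp.length),
    pctImplicitOfCmp: pct(withCmp.length - explicit.length, withCmp.length),
  };
}
```

The explicit/implicit split is computed only over pages that have a consent solution, matching the "out of sites using cookie consent management solutions" framing in the comment.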

rviscomi commented 4 years ago

The tests are all run from the US so these may be undercounted, but it's an interesting angle to consider. If that issue is able to be resolved and included in a Wappalyzer release before the HTTP Archive crawl begins on August 1, then we'd be able to make use of it. Otherwise I think it will only be a minor inconvenience since IIUC it's just a categorization change of technologies that we're already detecting. @rockeynebhwani are you able to work on a PR? It seems AliasIO is supportive of the change.

ydimova commented 4 years ago

@rockeynebhwani I think it would indeed be interesting to measure the percentage of websites using popular and less popular cookie consent management platforms and IAB Europe's TCF (if feasible).

@max-ostapenko @ldevernay Could you join the outline document :) https://docs.google.com/document/d/1hIllsWd_IqfYuGT_qUFA2ruoQaIvcbuYpNHJLB4AqkU Feel free to change/add anything

rviscomi commented 4 years ago

@ydimova I've sent you an invite to join the 2020 Authors team, which we'll use to communicate to authors about upcoming milestones. Could you visit https://github.com/HTTPArchive to accept the invitation? I want to make sure you're included in our messages :)

rockeynebhwani commented 4 years ago

@ydimova @bazzadp @rviscomi - Created a PR for Wappalyzer - https://github.com/AliasIO/wappalyzer/pull/3227. This team can add more technology vendors to the list on top of this.

foxdavidj commented 4 years ago

@ydimova @max-ostapenko Noticed there are a few metrics you might need custom metrics written for (e.g., finding policy links on a webpage). Can you make a list of what custom metrics you need by EOD tomorrow?
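
For illustration only, a policy-link metric could look something like the sketch below. The function name, the input shape, and the keyword pattern are all assumptions made for this example; this is not the custom metric the team ended up writing.

```javascript
// Hypothetical keyword pattern; a real metric would likely need a broader, multilingual list.
const POLICY_PATTERN = /privacy|cookie[-_ ]?policy|data[-_ ]?protection/i;

// Takes plain { href, text } records so the logic can be checked outside a browser.
function findPolicyLinks(links) {
  return links.filter(
    ({ href = "", text = "" }) =>
      POLICY_PATTERN.test(text) || POLICY_PATTERN.test(href)
  );
}

// In a page context the input could be built from the DOM, for example:
// findPolicyLinks(
//   [...document.querySelectorAll("a")].map((a) => ({ href: a.href, text: a.textContent }))
// );
```

Matching on both the link text and the URL is a deliberate trade-off: it catches icon-only or oddly labelled links, at the cost of some false positives.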

rockeynebhwani commented 4 years ago

@ydimova - I have done the PR, and Wappalyzer now has a new category called 'Cookie Compliance'; so far I have managed to add 17 vendors to it. While working on this, I realized that there are far too many vendors in this space and I don't have time to add them all. I am not familiar with IAB Europe's TCF solution, but if you can give me a detection pattern and a sample site, I think we still have time to add it to my PR.

ydimova commented 4 years ago

@rockeynebhwani Great, thank you! Maybe we could just stick to the biggest vendors? The presence of the TCF framework can easily be evaluated by detecting a __cmp function on the window object. For instance, 'typeof window.__cmp !== "undefined"' would work.

rockeynebhwani commented 4 years ago

Thanks @ydimova . Can you please give me an example site to test TCF framework?

max-ostapenko commented 4 years ago

@rockeynebhwani FYI here are official framework docs from IAB: https://iabeurope.eu/tcf-2-0/ The test website will help indeed. I was looking into vendor domain lists (e.g. https://vendorlist.consensu.org/vendorinfo.json), but no valuable data as of now.

ydimova commented 4 years ago

@rockeynebhwani @max-ostapenko I found it on https://www.letudiant.fr/ by calling "window.__cmp" (without the brackets). https://www.senscritique.com/ is another one
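
The check described here can be sketched as a small function. The name `detectTcf` is hypothetical, and it takes a window-like object as a parameter so it can be exercised outside a browser. The `__tcfapi` check is an addition not mentioned in this thread: it is the entry point defined by TCF v2.0, whereas `__cmp` belongs to TCF v1.1.

```javascript
// Reports which TCF entry points a window-like object exposes.
// The property is inspected with typeof rather than called, so a page
// without a CMP does not throw.
function detectTcf(win) {
  return {
    v1: typeof win.__cmp === "function",    // TCF v1.1 API
    v2: typeof win.__tcfapi === "function", // TCF v2.0 API (not from this thread)
  };
}

// In a page context: detectTcf(window)
```

This sketch only tests for the presence of the functions. It says nothing about whether consent was requested or granted.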

foxdavidj commented 4 years ago

@ydimova @max-ostapenko for the two milestones overdue on July 27 could you check the boxes if:

Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!

foxdavidj commented 4 years ago

I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:

  1. Enable authors/reviewers to analyze the results for each metric without running the queries themselves
  2. Generate data visualizations to be embedded in the chapter
  3. Serve as a public audit trail of this chapter's data collection/analysis, linked from the chapter footer

foxdavidj commented 4 years ago

@ydimova in case you missed it, we've adjusted the milestones to push the launch date back from November 9 to December 9. This gives all chapters exactly 7 weeks from now to wrap up the analysis, write a draft, get it reviewed, and submit it for publication. So the next milestone will be to complete the first draft by November 12.

However if you're still on schedule to be done by the original November 9 launch date we want you to know that this change doesn't mean your hard work was wasted, and that you'll get the privilege of being part of our "Early Access" launch.

Please see the link above for more info and reach out to @rviscomi or me if you have any questions or concerns about the timeline. We hope this change gives you a bit more breathing room to finish the chapter comfortably and we're excited to see it go live!

max-ostapenko commented 3 years ago

@ydimova FYI, there is cookie parameter data in the Security chapter, in case you want to share some insights.