HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Security 2021 #2150

Closed rviscomi closed 2 years ago

rviscomi commented 3 years ago

Part II Chapter 12: Security


If you're interested in contributing to the Security chapter of the 2021 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.

Content team

| Lead | Authors | Reviewers | Analysts | Editors | Coordinator |
| --- | --- | --- | --- | --- | --- |
| @SaptakS | @SaptakS @tomvangoethem @nrllh | @cqueern @edmondwwchan @awareseven | @GJFR | @tunetheweb | @obto |
Expand for more information about each role

- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._

For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.

Milestone checklist

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

Chapter resources

Refer to these 2021 Security resources throughout the content creation process:

- 📄 Google Docs for outlining and drafting content
- 🔍 SQL files for committing the queries used during analysis
- 📊 Google Sheets for saving the results of queries
- 📝 Markdown file for publishing content and managing public metadata

tomvangoethem commented 3 years ago

I'd like to join as an author!

SaptakS commented 3 years ago

I would love to help as a co-author or reviewer

cqueern commented 3 years ago

I'd like to support as a Peer Reviewer.

nrllh commented 3 years ago

I'd like to contribute as an author.

rviscomi commented 3 years ago

Thanks @tomvangoethem @SaptakS @cqueern @nrllh! It's great to see so much interest. I'm going to tentatively put you all down as peer reviewers for now until we're ready to start selecting authors. FYI since @tomvangoethem and @nrllh were coauthors of the 2020 Security chapter, we're going to lean towards selecting new people. More context in #2165.

edmondwwchan commented 3 years ago

Hi team, I am interested and would like to support this chapter as a peer reviewer.

rviscomi commented 3 years ago

Welcome back @edmondwwchan!

@tomvangoethem @SaptakS @cqueern @nrllh @edmondwwchan I think there's critical mass for all of the interested contributors of this chapter to start brainstorming content in the doc. It'd be great to get an early start on sketching the outline and thinking about metrics. Here's the 2020 doc for reference if you wanted to bootstrap this chapter with 2020 topics as a starting point. Add your notes to the 2021 doc if you can think of anything especially interesting that is new or outdated since the last chapter.

rviscomi commented 3 years ago

@SaptakS thanks for your interest in authoring this chapter! As the content team lead, you'll be responsible for the scope and direction of the chapter and keeping it on schedule. We automatically monitor the staffing and progress of each chapter based on the state of the initial comment so please keep that updated as you add new contributors and meet each milestone.

Depending on the length/scope of the chapter, you may want to add additional coauthors to share the load. @tomvangoethem and @nrllh are both interested to coauthor the chapter again this year and would be great additions. It's up to you!

We've created a Google Doc for this chapter, which you're encouraged to use to collaborate with the content team on the initial outline, metrics, and ultimately the final draft.

Next steps for this chapter are:

@obto will be the section coordinator for this chapter, so they'll be periodically checking in with you directly to make sure the chapter is staying on schedule. Reach out to them here in this issue if you have any questions about the process.

More information about the content team lead and author roles and responsibilities is available for reference in the wiki if needed.

To anyone else interested in contributing to this chapter, please comment below to join the team!

SaptakS commented 3 years ago

I would more than love to have @tomvangoethem and @nrllh as co-authors and get their valuable ideas from their experiences. Also, @tomvangoethem, you did analysis as well last year, would you be interested to do so this year as well?

rviscomi commented 3 years ago

📟 paging 2019/2020 contributors: @arturjanc @ScottHelme @paulcalvano @tunetheweb @ghedo @ndrnmnn @dotjs @jrharalson @AAgar

Would any of you be interested to contribute to the 2021 chapter? This chapter could use your help with reviewing and/or analyzing. It'd be great to have your support!

rviscomi commented 3 years ago

@awareseven were you interested in reviewing this chapter?

ScottHelme commented 3 years ago

I'd be happy to review 👍

SaptakS commented 3 years ago

@tomvangoethem @cqueern @nrllh @edmondwwchan @ScottHelme I have added the outline from last year in the docs, along with 2 more suggestions I think might make sense to add. More ideas are welcome that might make sense!!

awareseven commented 3 years ago

I can review the chapter @rviscomi and I am also happy to draft a few paragraphs as an author if you like

rviscomi commented 3 years ago

@awareseven ok great! I'll defer to the content team lead @SaptakS to loop you in.

SaptakS commented 3 years ago

@awareseven sure! You can take a look at the 2021 docs. I have taken the basic outline from last year and added a few other things that I feel might be interesting to see. We are currently brainstorming ideas for the chapter so suggestions are welcome!

foxdavidj commented 3 years ago

Hey @SaptakS, excited to work with you and the rest of the group this year on the Almanac. I'm your go-to guy if you've got any questions or need help, so don't hesitate to reach out to me on GitHub, Slack (@obto), or email (david@davidjfox.com).

A few first steps:

  1. @awareseven @ScottHelme should add themselves to the Google Doc in the format: Name (email@example.com).
  2. I've added links within the doc to the previous years' Google Docs in case you'd like to mine them for ideas.
  3. Would love to set up a 30 minute Zoom call in the next couple weeks to kick-start the chapter planning and brainstorming process, and put some faces to the names of the people we'll all be working with this year. I'll reach out again later this week to find a time that works.

Excited to work with you all this year.

foxdavidj commented 3 years ago

How does Monday (May 24) at 12p ET / 9a PT / 5p BST (timezones here) work for the 30m chat?

@SaptakS @tomvangoethem @nrllh @cqueern @edmondwwchan @awareseven @ScottHelme

edmondwwchan commented 3 years ago

@obto I appreciate you setting up the call. The meeting will be a bit late in my local time (May 25 12am GMT+8). Anyway, I will try my best to attend.

foxdavidj commented 3 years ago

@edmondwwchan So sorry about that. We'll try to find a more reasonable time moving forward.

Just sent the invite. If you didn't get it, please send me an email to david@davidjfox.com and I'll add you.

SaptakS commented 3 years ago

Got it! Thanks!

awareseven commented 3 years ago

@obto this works fine with me. My mail is in the docs so you can add me to the invite. Thanks!

GJFR commented 3 years ago

Hi! I'd like to join as an analyst!

SaptakS commented 3 years ago

@GJFR awesome!!! Adding you to the analyst role. Can you add your email id to the planning docs?

foxdavidj commented 3 years ago

@SaptakS Can you tick the first milestone checkbox above, "0. Form the content team"? Keeping these milestones up to date helps to give me an overview of how the Almanac as a whole is coming together.

foxdavidj commented 3 years ago

@SaptakS @tomvangoethem @nrllh @cqueern @edmondwwchan @awareseven @ScottHelme @GJFR

Hey everyone, wanted to give a quick reminder that we need to have the chapter outline complete by June 15 so we have enough time to update our crawler with any additional metrics you need this year. Seeing some good ideas in the doc so far which is great.

Also, the team has a channel on Slack (#web-almanac-security), so feel free to join everyone there as well: https://join.slack.com/t/httparchive/shared_invite/zt-45sgwmnb-eDEatOhqssqNAKxxOSLAaA

If you have any other questions don't hesitate to reach out :)

foxdavidj commented 3 years ago

@SaptakS reminder that we've got just over 1 week (June 15th) until the chapter outline is due. Please work together with your analyst (@GJFR) to make sure all metrics are feasible as well.

If there are any new custom metrics you require this year, have them decided by EOD June 23rd so your analyst has time to add them to our crawler.

tunetheweb commented 3 years ago

FYI Lighthouse v8 has a new Content-Security-Policy audit, which we will have access to. Worth considering for this year: https://github.com/GoogleChrome/lighthouse/releases/tag/v8.0.0

cqueern commented 3 years ago

> FYI Lighthouse v8 has a new Content-Security-Policy audit, which we will have access to. Worth considering for this year: https://github.com/GoogleChrome/lighthouse/releases/tag/v8.0.0

This is great, thanks @tunetheweb .

@ScottHelme may have some tips on sound ways of analyzing the adoption of CSP capabilities at web-scale.

foxdavidj commented 3 years ago

@SaptakS @GJFR

Hey everyone, wanted to give you a heads-up and reminder that the July website crawl has completed and chapters now need to:

  1. Analysts: Please write, test and publish the results of all the queries in the draft PR you should have created last month. We've got 3 chapters (PWA, Mobile Web, Accessibility) filled with every type of query you can imagine that you can refer to if you've ever got a question for how to grab the data you need.

  2. Chapter leads: Take a look at your analyst's draft PR, where they have listed all the queries/data they'll be analyzing. You'll want to make sure all the ideas you discussed are listed and that nothing was lost in communication.

  3. Analysts: Once your queries are completed and data has been put into the spreadsheets (along with comments), set up a time to run through the data with the chapter lead so they know exactly how to interpret the data.

That's it! Really looking forward to seeing the chapter start to take form. And if you've ever got any questions just ping me

- PWA: Queries, Results (has all their visualizations done as well)
- Mobile Web: Queries, Results
- A11Y: Queries, Results

GJFR commented 3 years ago

Hi @obto. There have been some issues regarding the well-known custom metric. I have fixed the issue in this PR, but still, the current crawl data will represent an undercount due to JS errors.

Would it be possible to push the fix for the September crawl such that we could still use this metric for the chapter?

foxdavidj commented 3 years ago

> Hi @obto. There have been some issues regarding the well-known custom metric. I have fixed the issue in this PR, but still, the current crawl data will represent an undercount due to JS errors.
>
> Would it be possible to push the fix for the September crawl such that we could still use this metric for the chapter?

How major is that metric to the chapter? We'd really prefer having the entire almanac using the July data.

If it's a more minor metric it'd be best to shelve it and save it for next year

GJFR commented 3 years ago

> > Hi @obto. There have been some issues regarding the well-known custom metric. I have fixed the issue in this PR, but still, the current crawl data will represent an undercount due to JS errors. Would it be possible to push the fix for the September crawl such that we could still use this metric for the chapter?
>
> How major is that metric to the chapter? We'd really prefer having the entire almanac using the July data.
>
> If it's a more minor metric it'd be best to shelve it and save it for next year

I'd say it's not that major, but I think @SaptakS has a better idea about that.

Because the presence of a page's /robots.txt endpoint was already being captured by another metric, I could check the number of positives that were missed in the July crawl due to the errors:

| Hosts with /robots.txt | Actual | Measured | Error |
| --- | --- | --- | --- |
| desktop_10k | 7244 / 9990 | 7110 / 9990 | 134 / 9990 (1.3%) |
| mobile_10k | 7439 / 9994 | 6916 / 9994 | 523 / 9994 (5.2%) |

It seems the amount of /robots.txt endpoints that were missed in the July crawl is rather limited (at least for the sample data), also considering that not all errors can be prevented. Can we afford this error rate or should we disregard this data?
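For anyone who wants to double-check the error column in the table above, here is a minimal Python sketch of the arithmetic. It assumes the error is simply the actual count minus the measured count, divided by the number of hosts in the sample. The names `SAMPLES`, `missed_hosts`, `error_rate`, and `summarize` are made up for illustration and are not part of the Almanac tooling or queries.

```python
# Counts copied from the sample-data table in this comment.
# All names below are illustrative assumptions, not Almanac code.
SAMPLES = {
    "desktop_10k": {"actual": 7244, "measured": 7110, "total": 9990},
    "mobile_10k": {"actual": 7439, "measured": 6916, "total": 9994},
}


def missed_hosts(actual: int, measured: int) -> int:
    """Hosts that have /robots.txt but were not detected by the custom metric."""
    return actual - measured


def error_rate(actual: int, measured: int, total: int) -> float:
    """Missed hosts as a fraction of all hosts in the sample."""
    return missed_hosts(actual, measured) / total


def summarize(samples: dict) -> dict:
    """Map each sample name to (missed hosts, error in percent, one decimal)."""
    summary = {}
    for name, counts in samples.items():
        missed = missed_hosts(counts["actual"], counts["measured"])
        rate = error_rate(counts["actual"], counts["measured"], counts["total"])
        summary[name] = (missed, round(100 * rate, 1))
    return summary


if __name__ == "__main__":
    for name, (missed, percent) in summarize(SAMPLES).items():
        print(f"{name}: {missed} missed ({percent}%)")
```

The same helpers could be pointed at the full crawl counts once they are available; only the numbers in `SAMPLES` would need to change.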

SaptakS commented 3 years ago

I don't think it's necessarily a major metric. I think we were just trying to get some new metrics this time from security.txt and robots.txt, but not critical for the chapter. @tomvangoethem @nrllh what do you think?

> It seems the amount of /robots.txt endpoints that were missed in the July crawl is rather limited (at least for the sample data), also considering that not all errors can be prevented. Can we afford this error rate or should we disregard this data?

I am also curious about this. Do we keep this if the error is not much? @obto thoughts?

tomvangoethem commented 3 years ago

I think it's best to keep it for next year; we already have plenty of content & having to explain that there is an (additional) error rate in these measurements might complicate things for the reader.

foxdavidj commented 3 years ago

@SaptakS We'd be ok with you including the metric if your team does the following:

  1. Mention the possible error when presenting the data from this metric
  2. Re-run that one metric with the September data to see if there is a large difference in the results
  3. If there is a large difference, we'll have to exclude it from the chapter and wait till next year

How does that sound?

SaptakS commented 3 years ago

I think rerunning on September data to check how large of a difference it is makes sense to me, if it's not too much of an effort @GJFR . I agree it makes sense otherwise to exclude it from the chapter this year.

GJFR commented 3 years ago

@SaptakS Sure, no problem!

rviscomi commented 2 years ago

@SaptakS @tomvangoethem @nrllh @cqueern @edmondwwchan @awareseven @GJFR

🎉 This chapter is fully written, reviewed, edited, and ready to be launched on Wednesday! Thank you to all of the contributors who put in the time and effort to make this a great chapter.

When you get 5 minutes, I'd really appreciate it if you could fill out our contributor survey to tell us (the project leads) about your experience. It's super helpful to hear what went well or what could be improved for next time. 🙏

Congratulations and thank you all again. I'm excited for this to launch soon!