HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

PWA 2020 #909

Closed foxdavidj closed 3 years ago

foxdavidj commented 4 years ago

Part II Chapter 14: PWA

Content team

Authors Reviewers Analysts Draft Queries Results
@hemanth @thepassle @jadjoubran @pearlbea @gokulkrishh @jaisanth @logicalphase @bazzadp Doc *.sql Sheet

Content team lead: @hemanth

Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.

The content team is made up of the following contributors:

New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.

Note: To ensure that you get notifications when tagged, you must be "watching" this repository.

Milestones

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

logicalphase commented 4 years ago

I'm still really hoping to author or co-author this section. For the last three years I've been actively involved in introducing PWAs to the wider web communities. So along with last year's references, I've developed a lot of reference material on adoption.

jadjoubran commented 4 years ago

Also interested in peer reviewing this chapter 👍

thepassle commented 4 years ago

I'd also be interested in doing review work here 🙂

foxdavidj commented 4 years ago

@hemanth thank you for agreeing to be the lead author for the PWA chapter! As the lead, you'll be responsible for driving the content planning and writing phases in collaboration with your content team, which will consist of yourself as lead, any coauthors you choose as needed, peer reviewers, and data analysts.

The immediate next steps for this chapter are:

  1. Establish the rest of your content team. Several other people were interested or nominated (see below), so that's a great place to start. The larger the scope of the chapter, the more people you'll want to have on board.
  2. Start sketching out ideas in your draft doc.
  3. Catch up on last year's chapter and the project methodology to get a sense for what's possible.

There's a ton of info in the top comment, so check that out and feel free to ping myself or @rviscomi with any questions!

@thepassle @jadjoubran @logicalphase @pearlbea I've put you down as reviewers for now, and will leave it to @hemanth to reassign at their discretion

@abraham @tpiros we'd still love to have you contribute as a peer reviewer or coauthor as needed. Let us know if you're still interested!

hemanth commented 4 years ago

Awesome @rviscomi! Looking forward to working with all the co-authors and reviewers. PWA FTW!

logicalphase commented 4 years ago

@hemanth I'd be happy to co-author this. I've got the time and significant experience with PWAs. Just let me know. Cheers.

hemanth commented 4 years ago

Sure @logicalphase, let us discuss; there was one more person who was interested too.

hemanth commented 4 years ago

For the July 13th checklist completion we have:

Author + Analysts: @hemanth Reviewers: @thepassle @jadjoubran @logicalphase @pearlbea

Reviewers: please confirm with a thumbs up to this comment, if you are still interested in reviewing, thank you!

hemanth commented 4 years ago

Meanwhile, for the July 20th item on the checklist, I have added a potential chapter outline. Please have a look and leave a comment on the document or in the issue here. Thank you!

//cc Reviewers: @thepassle @jadjoubran @logicalphase @pearlbea

logicalphase commented 4 years ago

Outstanding @hemanth I'm looking forward to working with you and the team. Will review your materials, and reply. Please let me know if you need anything else right now.

foxdavidj commented 4 years ago

Hey @hemanth, looks like things are moving along pretty smoothly. Is there anything you need from me to keep things moving forward, and have the chapter outline and metrics settled on by the end of the week?

Also, can you remind your team to properly add and credit themselves in your chapter's Google Doc?

thepassle commented 4 years ago

Also, can you remind your team to properly add and credit themselves in your chapter's Google Doc?

Looks like I need some permissions — I requested them in the google doc. I imagine this'll be true for the other reviewers, too. 🙂 Fyi

foxdavidj commented 4 years ago

You should get access shortly :)

tungpatrick commented 4 years ago

Hey @obto @hemanth! I would love to help as an analyst, however, I am actually a fairly new 'analyst' and am very new to HTTP Archive. Will this be a problem? I definitely think this is quite grand, so I wanted to ask before I partake in such a role.

tpiros commented 4 years ago

Since I'm involved in the Media and Jamstack chapters, I will withdraw from this one if that's OK :) (or at least I'll unsubscribe if I'm not needed anyway :) )

tungpatrick commented 4 years ago

Hey @obto @rviscomi! I noticed that I've been added as an analyst for this chapter. Thank you for allowing me to help. However, as I have mentioned, I am very new and am not really familiar with the HTTP Archive dataset. I find this to be a great experience for me, but I'm not sure if I'm qualified on my own as the analyst for this chapter. With that being said, I am very grateful for having this opportunity, so I was wondering if there would be any way for me to learn what's in the dataset or allow me to explore it without incurring too much cost (so that I can help to my best effort).

hemanth commented 4 years ago

Hey @tungpatrick, you can give it a shot, and you aren't alone: we can get assistance from folks who have played this role before. If you are willing, I can add you to the list. The Analysts' Guide covers this in detail; also have a look at how it was done in the previous year.

Given that @logicalphase has experience in illustration and content writing, I would vouch for him to be the co-author, as we look forward to some awesome SVG animations for our chapters! 😉

logicalphase commented 4 years ago

I'm glad there's no pressure, @hemanth 😁

gokulkrishh commented 4 years ago

@hemanth @rviscomi I can help in reviewing the content.

jaisanth commented 4 years ago

@hemanth I'm interested in being a reviewer, not sure if this is already filled up :-)

tungpatrick commented 4 years ago

Hey @hemanth! I think I have actually already been added as an analyst for this chapter. I am definitely willing and would love to help. I just don't want to be a burden to the team haha. I have taken a quick look at the 2019 Almanac, which is a main reason why I have wanted to volunteer. Hopefully after getting a chance to explore the dataset, I'll be more helpful. Again, thanks for letting me participate!

hemanth commented 4 years ago

@gokulkrishh and @jaisanth Added your names to the reviewers' list, thanks for volunteering!

@tungpatrick Thank you! Also, please have a look at the milestones and feel free to ask your queries on this thread.

foxdavidj commented 4 years ago

@tungpatrick we're excited to have you! @bazzadp just made a great post about the best way to get started that I think you'd find very helpful: https://github.com/HTTPArchive/almanac.httparchive.org/issues/914#issuecomment-659205330

Do make sure to join the #web-almanac slack so Paul can invite you to the Analysts channel. It's a great place to ask any questions you may have :)

logicalphase commented 4 years ago

@hemanth Just in time: Chrome 84 added a slew of animation support. Example: https://developers.google.com/web/updates/2020/07/nic84#web-animations

cc: @obto

tungpatrick commented 4 years ago

Hey @hemanth! I sent a quick DM on slack, but I feel like that might not have been the best place to reach out to you, so I'll type here. I'm not really sure how to proceed at the moment. So I'm not sure about the procedure for the web almanac as I'm fairly new to this whole thing. I can see that you have drafted up an outline for the chapter, so yay! But my question now is... am I supposed to generate a whole bunch of queries that you can use (given the outline)? Or would there be a list of ‘Metrics’ that I should look into to see if I can query it?

Oh, and another quick question! How much of last year's queries do you think can be reused this year? Do you think we can just 'copy & paste' (with some modifications) from last year?

Thank you so much in advance!

rviscomi commented 4 years ago

@tungpatrick once the outline has been finalized, the next step is for you to work with the authors to understand what metrics they need to substantiate the content. Ask questions in the doc if any content in the outline is unclear. Use the "Metrics" section of the doc to compile the list of metrics that would need to be queried. Most importantly, identify which metrics are candidates for custom metrics, which collect the data at runtime using JS APIs as opposed to statically analyzing the HTML responses in BigQuery (which is much more expensive). Any custom metrics needed for this chapter must be implemented by the 27th so that they're in place before the August 1 crawl begins.

For more info about this phase of the chapter, see the Chapter Lifecycle doc.

tungpatrick commented 4 years ago

@rviscomi Thank you for the clarification. Unfortunately, I think it would be a little difficult for me to implement any custom metrics, but I'll reach out for help if needed.

@hemanth I'd love to schedule a time with you to understand what metrics you'd like to use for the content of this chapter.

hemanth commented 4 years ago

Sorry @tungpatrick missed your ping on slack, sure let us catch up and discuss further.

rviscomi commented 4 years ago

@tungpatrick that's ok, let's see if any custom metrics are needed (maybe none) and I'm sure we could find another analyst to help if needed. Please ping me or @paulcalvano if that's the case.

foxdavidj commented 4 years ago

@tungpatrick @hemanth If there are any custom metrics you need, let me know by EOD tomorrow. I'm working on implementing a large amount of them right now (PR here).

hemanth commented 4 years ago

Sure, thanks @obto!

rviscomi commented 4 years ago

@logicalphase @pearlbea @gokulkrishh @jaisanth have you all had a chance to contribute to and review the planning doc? Please request edit access to make sure you can comment and others can @ you.

logicalphase commented 4 years ago

@hemanth @rviscomi @obto I've reviewed the planning doc. Looks good. For metrics, I like usage (I've used BuiltWith). I'm wondering how PWAs get tracked: via the manifest? I've got a list of a few sources I've been researching, and I'm comfortable with any of the background subchapters listed in the planning outline. Should we split them out, or assign me whatever works best for you?

gokulkrishh commented 4 years ago

@rviscomi @hemanth Sent a request for edit access. I will be reviewing the outline soon.

foxdavidj commented 4 years ago

@hemanth @tungpatrick for the two milestones overdue on July 27 could you check the boxes if:

Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!

tungpatrick commented 4 years ago

Hey team! Due to some personal reasons, I have removed myself from the web almanac and from the team. Sorry in advance if this causes any problems to the team. Thanks for letting me have the chance to volunteer!

rviscomi commented 4 years ago

Sorry to see you go @tungpatrick but completely understand.

@hemanth are you still interested in being an analyst for this chapter? I also see @thepassle listed in the doc as an analyst but I don't see any discussion of that happening in this issue, so not sure if that's intentional. Could you update https://github.com/HTTPArchive/almanac.httparchive.org/issues/909#issue-646592503 and the doc with the correct analyst assignments?

Also as @obto mentioned, this chapter is overdue on a couple of milestones, so it'd be great to get these sorted out ASAP to stay on schedule. Thanks!

thepassle commented 4 years ago

Not sure who put me there as analyst, I signed up as a reviewer 🙂

hemanth commented 4 years ago

AFAIR @thepassle wasn't on the analyst list.

@tungpatrick had signed up earlier, but it looks like we got lost in translation.

@obto Looks like we have slipped a bit on the deadlines? Also, for custom metrics, should we be able to get insights into the APIs that are being used in the service workers that we parse?

@rviscomi looks like it. Is it too late already?

It would be great if @rviscomi and @obto and the authors could get on a call.

hemanth commented 4 years ago

We need to decide on this soon. Maybe I should add a label requesting an analyst?

@rviscomi @obto We had a slack channel for PWA discussions, right?

rviscomi commented 4 years ago

We need to decide on this soon. Maybe I should add a label requesting an analyst?

Yes, good idea. You can also reach out in the #web-almanac-analysts channel to see if anyone is available. Or if you're interested in taking on the role, some of us can help with the onboarding.

@rviscomi @obto We had a slack channel for PWA discussions, right?

Not a channel but I did start a group chat to discuss this: https://httparchive.slack.com/archives/G0181NNKEJH/p1596215461000600

foxdavidj commented 3 years ago

I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:

  1. Enable authors/reviewers to analyze the results for each metric without running the queries themselves
  2. Generate data visualizations to be embedded in the chapter
  3. Serve as a public audit trail of this chapter's data collection/analysis, linked from the chapter footer

hemanth commented 3 years ago

@obto @rviscomi I would vote to tick the checkboxes with the data we have.

@logicalphase and I should probably get started with the content; we have a month and a couple of days to go and about 5-6 chapters to cover.

tunetheweb commented 3 years ago

@hemanth / @logicalphase I can probably help out and run some of the queries for you.

However the Metrics section of [your document] is looking very bare!

What would be great is if you could review the spreadsheet from last year and let me know:

  1. Which tabs you think would be useful to have the stats rerun for this year. Maybe all of them?
  2. Which stats are missing and some detail of what exactly you’re looking for and I can let you know if they are feasible.

@tungpatrick I don’t know if your situation has changed, or if you’d like to be involved again as a co-analyst now you have someone to help guide you through this? If so let us know as happy to help you!

hemanth commented 3 years ago

Thanks for pitching in @bazzadp!

Which tabs you think would be useful to have the stats rerun for this year. Maybe all of them?

Yes.

Which stats are missing and some detail of what exactly you’re looking for and I can let you know if they are feasible.

We were looking into ways to figure out whether we can pull stats about certain feature sets and their usage, like BackgroundSync, PeriodicSync, offline analytics, and the like.

I noticed @jaisanth is an analyst for JavaScript and maybe he can help us too?

tunetheweb commented 3 years ago

We were looking into ways to figure out whether we can pull stats about certain feature sets and their usage, like BackgroundSync, PeriodicSync, offline analytics, and the like.

Some of these may be tracked in the blink_usage table. This is a list of 4,163 features that Chrome records websites using as they are crawled including PeriodicBackgroundSyncRegister and PeriodicBackgroundSync. Full list of tracked features here if you can spot any others that might be useful.

Alternatively, one of the stats they ran last year was 11.06, which scanned the first HTML page (e.g. index.html, in case of inline <script> tags) and any script files for beforeinstallprompt to see who was using it. We could do something similar to search for sync.register and the like. It's less accurate and prone to false positives, and it can miss usage if you don't have a specific phrase to search for, but it does allow us to hunt for things that aren't tracked. The other problem is that it's incredibly expensive to query: it's 24TB of data, as it's basically scanning all the response bodies for all files. @rviscomi would it be possible to create an almanac.response_bodies_scripts table of just the initial HTML (in case of inline <script> tags) and script resources to cut this down as much as possible? Or maybe we should have almanac.response_bodies_firsthtml and almanac.response_bodies_scripts? If not, then we should try to query everything we need in one go, rather than in multiple queries, to reduce usage.
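To make the phrase-scanning idea above concrete, here is a minimal sketch in Python rather than the BigQuery SQL the Almanac actually uses. Only beforeinstallprompt and sync.register come from this thread; the periodicSync.register pattern, the function names, and the input shape (a mapping of site to response bodies) are assumptions for illustration, not the project's real queries.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Search phrases. Only "beforeinstallprompt" and "sync.register" are named
# in the thread; the periodic sync pattern is an assumed addition.
FEATURE_PATTERNS = {
    "beforeinstallprompt": re.compile(r"beforeinstallprompt"),
    "background_sync": re.compile(r"\bsync\.register\s*\("),
    "periodic_sync": re.compile(r"\bperiodicSync\.register\s*\("),
}


def detect_features(body: str) -> Set[str]:
    """Return the names of all patterns that appear in one response body."""
    return {name for name, pattern in FEATURE_PATTERNS.items() if pattern.search(body)}


def count_sites(bodies_by_site: Dict[str, List[str]]) -> Counter:
    """Count how many sites match each pattern in at least one body.

    Scanning every body once for all patterns mirrors the "query everything
    in one go" suggestion, since the bodies are the expensive part.
    """
    counts: Counter = Counter()
    for bodies in bodies_by_site.values():
        found: Set[str] = set()
        for body in bodies:
            found |= detect_features(body)
        counts.update(found)  # each site counts at most once per feature
    return counts
```

As noted above, plain text matching like this is prone to false positives (for example, a phrase inside a comment or dead code) and misses anything that has no predictable search phrase.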

As for offline analytics, we couldn't test whether they are fired (as the crawler doesn't run offline), but we could search for them using either of the above methods. I can't see anything in the blink_features table for that myself, so I think it would mean searching for things like workbox-google-analytics, and I would need help with what search phrases to look for.

tunetheweb commented 3 years ago

BTW HTTPArchive recently launched a capabilities dashboard including stats you might be interested in like:

Just as long as we don't repeat too much of the Capabilities chapter.

hemanth commented 3 years ago

@bazzadp the legend and graph seem a bit out of sync? [Or the legend isn't considering the second decimal point]

image

This means 0.00003%?

If offline analytics stats are tough, we can skip that for now.

Mainly, according to the draft, if we have stats for the items below we should be fine:

  1. If we can parse the manifest.json and get the stats on all the possible attributes.
  2. Notification permission prompt response
  3. Background Sync
  4. Periodic Sync
  5. Background Fetch

Thanks for pitching in!
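For the first item in the list above (stats on manifest attributes), a rough Python sketch of the counting step might look like this. The function name and the input (an iterable of raw manifest strings) are hypothetical; how manifests are actually collected and queried for the Almanac is not shown in this thread.

```python
import json
from collections import Counter
from typing import Iterable


def count_manifest_keys(manifests: Iterable[str]) -> Counter:
    """Count how many manifests declare each top-level attribute.

    Manifests that fail to parse, or that are not JSON objects, are tallied
    under placeholder labels instead of being silently dropped.
    """
    counts: Counter = Counter()
    for raw in manifests:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            counts["(invalid JSON)"] += 1
            continue
        if not isinstance(data, dict):
            counts["(not an object)"] += 1
            continue
        counts.update(data.keys())  # one count per attribute per manifest
    return counts
```

Dividing each count by the number of manifests scanned would give the share of manifests using that attribute.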

tunetheweb commented 3 years ago

@bazzadp the legend and graph seem a bit out of sync? [Or the legend isn't considering the second decimal point]

Not sure what you mean? Do you mean because the legend is only going to 1 decimal place whereas the numbers (as shown by the axis) are so small that means it shows 0.0?

When we looked last year only 0.44% of pages installed a service worker - though because some big names did, that meant that 15% of page views used a service worker. Would be great to have some examples of big names to explain that discrepancy more this year!

Looking at PeriodicBackgroundSync and PeriodicBackgroundSyncRegister, only 1 site (https://uhcitp.in/) uses it according to the blink_usage table, which is why it's just 0.00003%. Not sure that's right, to be honest! I expected it to be small, but just 1 site seems too small to me. Will have another look via regexing the JavaScript once we have the stats tables for that (Rick's working on these for us).

We're basically using the methodology described here: https://medium.com/dev-channel/progressive-web-apps-in-the-http-archive-614d4bcf81fe. Thomas was one of the co-authors of last year's chapter. There are some interesting ideas for further research in there too, so you should have a read of that.

hemanth commented 3 years ago

Not sure what you mean? Do you mean because the legend is only going to 1 decimal place whereas the numbers (as shown by the axis) are so small that means it shows 0.0?

Yes.
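The legend rounding effect discussed here is easy to reproduce. A tiny Python illustration, using the 0.00003% figure from this thread and a hypothetical format_percent helper (the dashboard's own formatting code is not shown here):

```python
def format_percent(value: float, decimals: int) -> str:
    """Format a percentage figure with a fixed number of decimal places."""
    return f"{value:.{decimals}f}%"


share = 0.00003  # percentage figure quoted in the thread

one_decimal = format_percent(share, 1)    # what a 1-decimal legend shows
five_decimals = format_percent(share, 5)  # enough precision to see the value
```

With one decimal place the value collapses to "0.0%", while five decimal places are needed before "0.00003%" becomes visible.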

Expected it to be small but just 1 site seems too small to me.

Me too, I have personally come across a few more sites which use them!

Nice article there. I also noticed the progressive_web_apps.web_app_manifests query.