HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

PWA 2021 #2153

Closed rviscomi closed 2 years ago

rviscomi commented 3 years ago

Part II Chapter 15: PWA


If you're interested in contributing to the PWA chapter of the 2021 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.

Content team

Lead Authors Reviewers Analysts Editors Coordinator
@demianrenzulli @demianrenzulli @webmaxru @Schweinepriester @thepassle @hemanth @tropicadri @andreban @jeffposnick @tunetheweb @demianrenzulli @rviscomi @obto
More information about each role:

- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.
_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._ For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.

Milestone checklist

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

Chapter resources

Refer to these 2021 PWA resources throughout the content creation process:

- 📄 Google Docs for outlining and drafting content
- 🔍 SQL files for committing the queries used during analysis
- 📊 Google Sheets for saving the results of queries
- 📝 Markdown file for publishing content and managing public metadata

webmaxru commented 3 years ago

It's my pleasure to support the PWA chapter as a reviewer!

rviscomi commented 3 years ago

Welcome @webmaxru!

Schweinepriester commented 3 years ago

If I may, I'd support PWA as a reviewer as well :)

demianrenzulli commented 3 years ago

I can work as an author in this one!

cc // @andreban @petele @b1tr0t

rviscomi commented 3 years ago

@demianrenzulli thanks for your interest in authoring this chapter! As the content team lead, you'll be responsible for the scope and direction of the chapter and keeping it on schedule. We automatically monitor the staffing and progress of each chapter based on the state of the initial comment so please keep that updated as you add new contributors and meet each milestone.

We've created a Google Doc for this chapter, which you're encouraged to use to collaborate with the content team on the initial outline, metrics, and ultimately the final draft.

Next steps for this chapter are:

@obto will be the section coordinator for this chapter, so they'll be periodically checking in with you directly to make sure the chapter is staying on schedule. Reach out to them here in this issue if you have any questions about the process.

More information about the content team lead and author roles and responsibilities is available for reference in the wiki if needed.

To anyone else interested in contributing to this chapter, please comment below to join the team!

rviscomi commented 3 years ago

📟 paging 2019/2020 contributors: @tomayac @jeffposnick @logicalphase @ahmadawais @jrharalson @hemanth @thepassle @jadjoubran @pearlbea @gokulkrishh @jaisanth @tunetheweb

Would any of you be interested in contributing to the 2021 chapter? This chapter could use your help with reviewing and/or analyzing. It'd be great to have your support!

thepassle commented 3 years ago

Yeah I'd be happy to help out as reviewer where possible :)

hemanth commented 3 years ago

Count me in for reviews, thank you!

demianrenzulli commented 3 years ago

Thanks for your interest folks!

It seems like @tunetheweb acted as analyst for the 2020 section.

Hi Barry! Nice to meet you. You're more than welcome to collaborate on that role once again, but if you can't make it this year, and we don't find another contributor willing to work as analyst, I would be happy to take that role as well. In that case, maybe we can touch base separately, so you can share some insights on your experience with the 2020 PWA section?

We might be able to reuse several of the 2020 queries, plus other ideas we have in mind.

demianrenzulli commented 3 years ago

For the rest of the reviewers: this is my first time contributing, so I'm not sure how you interacted in the past, but I'll touch base with Rick and other contributors to see how we can get organized for this year's PWA section 👍👍👍

tunetheweb commented 3 years ago

Hey @demianrenzulli, I'm one of the leads here for the Web Almanac project. I stepped in as the analyst last year as we were short, but I wouldn't say PWAs are my area of expertise, so it would be great if someone else could take up the mantle here. But I'm definitely more than happy to help guide them and help out where I can.

A lot of the queries were based on @tomayac's work from 2019, and in particular this blog post of his that he used as the basis of that first year's chapter. See also https://github.com/HTTPArchive/almanac.httparchive.org/issues/1258#issuecomment-689857902 for more info on regenerating that data for last year (we'll help get that data for you again, assuming you want it). Not sure if you wanna chip in here, @tomayac? Though I know you are involved in the Capabilities chapter too.

I'll touch base with Rick and other contributors to see how we can get organized for this year's PWA section

The best thing is for all authors to open the draft Google Doc, request edit permission, and then start listing ideas of which metrics or sections you'd like to keep from 2019 and 2020, and what you'd like to introduce. Start to get a chapter outline based on that, and comment and suggest things. The more ideas the better at the beginning, and then we can see which ones are realistic and whittle it down. The earlier you start this, the less pressure you'll feel later :-)

And btw you can get links to the results sheets (and the queries) from 2019 and 2020 at the bottom of the chapters to see what data those authors had to play with, to help with that:

tomayac commented 3 years ago

A lot of the queries were based on @tomayac's work from 2019, and in particular this blog post of his that he used as the basis of that first year's chapter. See also #1258 (comment) for more info on regenerating that data for last year (we'll help get that data for you again, assuming you want it). Not sure if you wanna chip in here, @tomayac? Though I know you are involved in the Capabilities chapter too.

I've signed up for capabilities and am reluctant to overcommit. More than happy to answer ad-hoc questions you may have, though.

demianrenzulli commented 3 years ago

Thanks a lot @tunetheweb!! If, based on previous experience, you think it's "doable" for someone to be both the author and analyst, I would be happy to collaborate on both sides.

As mentioned, this is my first time contributing, so guidance from you, @tomayac, and other folks would be highly appreciated.

There are a bunch of resources here that I need to process before moving to the next step.

I'll keep this issue updated. If anyone has any thoughts to share, please, let me know!

rviscomi commented 3 years ago

It's definitely doable to be an author and analyst, you wouldn't be the first. Each role peaks in activity at different times during the project, so it should be manageable for a single person assuming you can set aside a few hours each week.

foxdavidj commented 3 years ago

Hey @demianrenzulli, excited to work with you and the rest of the group this year on the Almanac. I'm your go-to guy if you've got any questions or need help, so don't hesitate to reach out to me on GitHub, the Slack (@obto) or email (david@davidjfox.com).

Few first steps:

  1. All authors, reviewers and analysts should add themselves to the Google Doc in the format: Name (email@example.com). But looks like you've already got a good handle on that.
  2. I've added links within the doc to the previous year's Google Doc in case you'd like to mine it for ideas.
  3. Would love to set up a 30 minute Zoom call in the next couple weeks to kick-start the chapter planning and brainstorming process, and put some faces to the names of the people we'll all be working with this year. I'll reach out again later this week to find a time that works.

Excited to work with you all this year.

demianrenzulli commented 3 years ago

This is great @obto! Nice to meet you too.

My apologies for the delayed response, but we had a couple of things for I/O week, and some articles I've been working on for web.dev.

I'm planning to start working on this by tomorrow. I'll make sure to also touch base with the rest of the collaborators.

We had some internal conversations already, and it would be great to hear thoughts from the rest of the folks in this group.

I hope we can create a great PWA chapter this year!

thepassle commented 3 years ago

All authors, reviewers and analysts should add themselves to the Google Doc in the format: Name (email@example.com). But looks like you've already got a good handle on that.

Looks like I need some rights to be able to add myself there, I've requested them in the doc.

foxdavidj commented 3 years ago

How does Tuesday (May 25) 11a ET / 8a PT / 4p BST work for that 30m chat? Timezones here: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=5&day=25&hour=15&min=0&sec=0&p1=64&p2=179&p3=224&p4=136&p5=37&iv=1800

@demianrenzulli @webmaxru @Schweinepriester @thepassle @hemanth

logicalphase commented 3 years ago

Good for me.

demianrenzulli commented 3 years ago

That time, next Tuesday at 11 ET, works for me! Thanks for organizing this @obto!

webmaxru commented 3 years ago

Works fine for me!

foxdavidj commented 3 years ago

@demianrenzulli @logicalphase can you update the Google doc with your email? I'll send the calendar invite to that email

demianrenzulli commented 3 years ago

@obto I just requested edit access with my Google account. At this point, since we don't have any other volunteers for the analyst role, I'll be signing up for that role as well, but if anyone is interested in taking on that role, that would be welcome.

demianrenzulli commented 3 years ago

Also, @andreban: would you like to sign up as reviewer by adding your details to the Google doc? We have a kick-off call on Tuesday, so, let us know if you want to participate.

foxdavidj commented 3 years ago

@demianrenzulli Added you to the analyst role, but we'll keep the help wanted: analysts tag and see if we can't find you one

demianrenzulli commented 3 years ago

That's great! I'd be happy to collaborate with an analyst if you know of any who might be interested. This is my first time contributing to the Web Almanac, so playing both the author and analyst roles might be a bit too ambitious. In any case, from Rick's comments it is totally doable, so I might be able to secure help from previous analysts to move it forward on my own 👍

Schweinepriester commented 3 years ago

That time works for me as well! Thanks for the link, as I'm in CEST! :)

demianrenzulli commented 3 years ago

@andreban just confirmed he'll be acting as a reviewer. I'll add his details to the doc. @obto, please try to add him to the call as well.

logicalphase commented 3 years ago

I thought I had access to the Google doc, but just put a request for access. I'll update soon as I can. But it's john@logicalphase.com.

foxdavidj commented 3 years ago

Can do. Just updated the calendar invite to include everyone who's requested access to the google doc so far

demianrenzulli commented 3 years ago

I've just added @tropicadri to the doc and I see she has been already added to Tuesday's sync.

I'm not sure what the main topic of discussion for Tuesday will be, but just in case, I found it useful to read the 2019 and 2020 editions of the Web Almanac to get an idea of what was covered in the past. If you have the time, it might be a good idea to do the same before the call to have all the context.

Here are also the 2019 and 2020 queries if you want to take a look.

I added a Team's Notes section in the doc, where you can find some ideas that others and I have been discussing recently.

Please feel free to add any ideas or comments you might have as well! It would really help to have different perspectives on these topics.

See you on Tuesday!

foxdavidj commented 3 years ago

@demianrenzulli Just sent out an agenda for the call on Tuesday. Looking forward to meeting you all 😄

foxdavidj commented 3 years ago

@demianrenzulli Can you tick the first milestone checkbox above (0. Form the content team)? Keeping these milestones up to date helps give me an overview of how the Almanac as a whole is coming together

demianrenzulli commented 3 years ago

Just did. Thanks for the reminder @obto!

foxdavidj commented 3 years ago

@demianrenzulli Here's last year's GitHub issue, where you can find all the data they used: https://github.com/HTTPArchive/almanac.httparchive.org/issues/909

tunetheweb commented 3 years ago

Hey @demianrenzulli, for the last two years, for a lot of the analysis of this chapter we basically pulled all the service worker JS into a separate httparchive.almanac.service_workers table and then ran regexes on that (e.g. events query). We also have a lot of queries that join to this table to limit the queries to "pwa" sites.

This is a bit of an oddity compared to how the rest of the Web Almanac works, to be honest. I'd like to see if we could move away from it this year and drop the need for the service_workers and manifests tables completely.
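As a rough illustration of the regex-over-service-worker-source approach described above, here is a minimal JavaScript sketch. The function name and the event list are assumptions for illustration only; the actual 2019/2020 queries ran as regexes against the service_workers table, not as this code.

```javascript
// Illustrative list of service worker events to look for (an assumption; extend as needed).
const SW_EVENTS = [
  'install', 'activate', 'fetch', 'message', 'push',
  'notificationclick', 'notificationclose', 'sync', 'periodicsync',
];

// Hypothetical helper: returns the sorted, de-duplicated list of known events
// that a service worker script appears to handle, based on a plain-text scan.
function detectServiceWorkerEvents(source) {
  const found = new Set();
  // addEventListener('fetch', ...) with single, double or backtick quotes.
  const listenerRe = /addEventListener\(\s*['"`]([a-z]+)['"`]/g;
  // Handler properties such as `self.onfetch = ...` (but not `onfetch == x`).
  const propertyRe = /\bon([a-z]+)\s*=(?!=)/g;
  for (const re of [listenerRe, propertyRe]) {
    for (const match of source.matchAll(re)) {
      if (SW_EVENTS.includes(match[1])) {
        found.add(match[1]);
      }
    }
  }
  return [...found].sort();
}
```

Because this is a plain-text scan, it can report handlers that sit in commented-out code, and it misses events handled in scripts pulled in through importScripts().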

My thoughts would be to have a set of PWA custom metrics, including the following information:

Plus anything else you can think of!

@rviscomi has already made a great start on this with a new pwa.js custom metric so we'd just need to expand on that. That finds the service worker URL and searches it for Workbox methods (to replace the workbox query we used last year) so that's a great guide for that last query. Not sure if Rick didn't realise we used this table more than just for Workbox queries, or just wanted that first use case addressed as an example to expand it to this.

Would you (or @rviscomi or @obto ?) be able to work on this? My PWA knowledge is pretty limited (hence all the questions above) and my JS ain't the greatest either, but happy to answer any questions I can on this based on the analysis I did last year (which to be honest was mostly just rerunning @tomayac 's queries from 2019 and then adding the workbox one).

If we can get the pwa.js custom metric updated and merged before the 1st June crawl, then we'd have a practice run before the main July run we're planning on using for the 2021 Web Almanac, but I realise that's probably a bit of a tight turnaround!
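The Workbox search mentioned above for the pwa.js custom metric could be approximated along these lines. This is a hypothetical sketch (the helper name and the regex are assumptions, not the actual pwa.js code) that only recognizes the global workbox.module.method() style used with workbox-sw; bundled or minified Workbox builds would need a different signal.

```javascript
// Hypothetical helper: list the distinct `workbox.<module>.<method>(` calls
// found in a service worker's source text, e.g. "routing.registerRoute".
function findWorkboxCalls(source) {
  const calls = new Set();
  // Matches the global-namespace style, e.g. workbox.precaching.precacheAndRoute([...]).
  const re = /\bworkbox\.([A-Za-z]+)\.([A-Za-z]+)\s*\(/g;
  for (const [, moduleName, methodName] of source.matchAll(re)) {
    calls.add(`${moduleName}.${methodName}`);
  }
  return [...calls].sort();
}
```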

tomayac commented 3 years ago

Parse the Service Worker URL for any events to replace the events query. Is this sufficient or can a service worker URL load other dependencies? The way we've done it in the past has always struck me as a bit brittle because of that.

Service workers can importScripts(), so strictly speaking you'd have to consider those.
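A minimal sketch of following importScripts() dependencies, assuming the service worker's source text and its URL are already available (the function name is hypothetical). Only static string arguments are handled; a call such as importScripts(someVariable) is skipped.

```javascript
// Hypothetical helper: collect the URLs passed to importScripts() in a
// service worker, resolved against the service worker's own URL.
function findImportedScripts(source, serviceWorkerUrl) {
  const urls = [];
  // importScripts(...) calls; one call may list several URLs.
  const callRe = /\bimportScripts\s*\(([^)]*)\)/g;
  // Individual string literals inside that argument list.
  const stringRe = /['"`]([^'"`]+)['"`]/g;
  for (const [, args] of source.matchAll(callRe)) {
    for (const [, path] of args.matchAll(stringRe)) {
      urls.push(new URL(path, serviceWorkerUrl).href);
    }
  }
  return urls;
}
```

Each returned URL could then be fetched and scanned with the same checks as the top-level script.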

demianrenzulli commented 3 years ago

Adding @jeffposnick, who might be able to validate my answers around service workers. Other folks in this issue, feel free to chime in as well:

  1. Identify whether this is a SW page (is checking for the presence of a navigator.serviceWorker.register line either in HTML or any of the JS resources the best way of doing this or is there something cleverer?) - true/false.

I'm not aware of any other API to register service workers, so it seems like checking for that one would cover most scenarios. I can see some cases where this might not work, though (for example, when using Workbox Window and calling const wb = new Workbox('/sw.js'); wb.register();). I assume that the library calls navigator.serviceWorker.register internally, so if we can inspect calls made inside libraries, that would work as well?
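A heuristic version of this check, written as a hypothetical JavaScript helper that scans HTML or JS text. Treating new Workbox(...) plus a .register() call as a registration is an assumption based on the Workbox Window example above, and aliased calls (e.g. const sw = navigator.serviceWorker; sw.register(...)) would be missed.

```javascript
// Heuristic: does this HTML or JS text appear to register a service worker?
function registersServiceWorker(source) {
  // Direct calls, allowing whitespace around the dots.
  const direct = /navigator\s*\.\s*serviceWorker\s*\.\s*register\s*\(/;
  // Workbox Window usage: `new Workbox(...)` together with a `.register(` call.
  const workboxWindow =
    /new\s+Workbox\s*\(/.test(source) && /\.register\s*\(/.test(source);
  return direct.test(source) || workboxWindow;
}
```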

  1. Can you register SWs for other domains (e.g. iframes or other third-party loads)? If so would be good to track 1st party/3rd party too.

This article explains how to use that technique to register service workers for 3rd-party origins using iframes. I believe that's also what the amp-install-serviceworker component used to do internally, in order to allow AMP Pages served from the Google AMP Cache to register the service worker that belongs to the publisher's origin.

In any case, I don't think this works across browsers (I'm pretty sure that it doesn't work in Safari). With that said, I don't know how frequent this technique might be.

  1. The Service Worker URL from the navigator.serviceWorker.register line, where true (could there be multiple service workers per page?).

There can be, at most, one service worker controlling a given page at a time, but I believe there are no restrictions on a page making multiple calls to navigator.serviceWorker.register, as long as each call indicates which scope that service worker should be registered for.
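To track several registrations per page, each register() call can be recorded together with the scope it would get. This hypothetical sketch only handles string-literal arguments and assumes the spec behavior that an explicit scope option is resolved against the page URL, while the default scope is the directory containing the script ("./" relative to the script URL).

```javascript
// Hypothetical helper: list every navigator.serviceWorker.register() call
// with a literal script URL, plus the scope each registration would get.
function findRegistrations(source, pageUrl) {
  const re = /navigator\.serviceWorker\.register\(\s*['"`]([^'"`]+)['"`]\s*(?:,\s*\{[^}]*?\bscope\s*:\s*['"`]([^'"`]+)['"`])?/g;
  const registrations = [];
  for (const [, script, scope] of source.matchAll(re)) {
    const scriptUrl = new URL(script, pageUrl);
    // Explicit scope: resolved against the page URL.
    // No scope: default to the script's own directory.
    const scopeUrl = scope ? new URL(scope, pageUrl) : new URL('./', scriptUrl);
    registrations.push({ script: scriptUrl.href, scope: scopeUrl.href });
  }
  return registrations;
}
```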

  1. Identify whether this has a manifest (presence of a <link rel="manifest"...> in the main HTML document I presume?) - true/false.

Detecting the presence of the Manifest sounds like a good way to check if sites could potentially be "installable". I see that in 2020 there was also a Lighthouse query to detect the % of installable manifests.
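A minimal sketch of the manifest check on raw HTML (helper names are hypothetical). Inside a browser-based custom metric, document.querySelector('link[rel~="manifest"]') would be a simpler way to do the same thing.

```javascript
// Reads one attribute value (double-quoted, single-quoted or unquoted) from a tag's text.
function getAttribute(tag, name) {
  const re = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`, 'i');
  const match = tag.match(re);
  return match ? [match[1], match[2], match[3]].find((v) => v !== undefined) : null;
}

// Returns the href of the first <link> whose rel includes "manifest", or null.
function findManifestHref(html) {
  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    const rel = getAttribute(tag, 'rel');
    // rel can hold several space-separated tokens.
    if (rel && rel.toLowerCase().split(/\s+/).includes('manifest')) {
      return getAttribute(tag, 'href');
    }
  }
  return null;
}
```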

Besides that, if you are thinking of using this as a way to detect if a site is a PWA, I think it really depends on which definition of a PWA we end up adopting for this year's Almanac. In 2020, the definition seems to have been strongly based on the presence of the service worker:

"The crux of a progressive web app is the service worker, which can be thought of as a proxy sitting between the browser and user. A service worker gives the developer total control over the network, rather than the network controlling the application."

  1. Parse the Service Worker URL for any events to replace the events query. Is this sufficient or can a service worker URL load other dependencies? The way we've done it in the past has always struck me as a bit brittle because of that. Any thoughts @tomayac ?

I think Thomas already answered this one. I don't have more to add to that question, but I'm leaving it open for @jeffposnick to chime in.

@tunetheweb I hope I have understood the questions correctly and the motivations behind them. I would be happy to have a call with you and other folks in this issue if you think that's a good idea.

tunetheweb commented 3 years ago

Besides that, if you are thinking on using this as a way to detect if a site is a PWA, I think it really depends on which definition of what is a PWA we end up adopting for this year's Almanac. In 2020, the definition seem to have been strongly based on the presence of the service worker:

Yes, I think that's key! We did switch definitions last year:

What are your thoughts for this year?

demianrenzulli commented 3 years ago

Thanks for confirming that, @tunetheweb! It's great to see that this topic came up in the past. Coming up with a new definition of what a PWA is in 2021 is one of the main action items (AIs) that came out of our initial sync.

The problem I see with the 2020 definition, "Presence of Manifest and Service Worker", is that we have seen websites that have both things and don't do much with them. For example, they never prompt the user to install the site, and/or they have an empty or very simple service worker that doesn't do much for the UX. On the other hand, there are sites that might have neither a manifest nor a service worker, but still provide an "App-like" UX, and would probably be better candidates to be called a "PWA" than sites that have both but don't use them.

One idea I had to solve this problem would be to list the properties that sites usually need to have to be considered a PWA (similar to what was done in the original definition), and then present adoption stats for the different features (service workers, manifests, etc.). If a site combines many of them, the end experience might be much more powerful, but some sites only use some of these features.

This granular way of presenting features seems to be aligned with what was done in 2020 (please, correct me if I'm wrong).

Now, if we really need to come up with a way of detecting PWAs out there for query purposes, I agree the 2020 definition is the best we have.
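The granular, per-feature approach and the 2020 definition can both be computed from the same per-page flags. This sketch assumes hypothetical boolean fields per page (e.g. serviceWorker, manifest); allFeatures is the share of pages that have every listed feature, which with ['serviceWorker', 'manifest'] corresponds to the 2020 definition.

```javascript
// Hypothetical per-page records with one boolean flag per feature.
// Returns the share of pages using each feature, plus the share using all of them.
function featureAdoption(pages, features) {
  if (pages.length === 0) {
    throw new Error('no pages to analyze');
  }
  const shares = {};
  for (const feature of features) {
    shares[feature] = pages.filter((page) => page[feature]).length / pages.length;
  }
  shares.allFeatures =
    pages.filter((page) => features.every((f) => page[f])).length / pages.length;
  return shares;
}
```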

As said, we have just started discussing this, so any thoughts are highly welcome. Cheers.

tunetheweb commented 3 years ago

We also have access to all the Lighthouse PWA audits so could define the list based on some of those? Or the PWA score calculated by Lighthouse?
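If Lighthouse is used, a page's PWA category score and individual audit results can be read from its Lighthouse result (LHR) JSON. This sketch assumes the standard LHR shape (categories.pwa.score and audits[id].score); the audit IDs are passed in because the PWA audit set differs between Lighthouse versions, so IDs such as installable-manifest and service-worker should be checked against the version used by the crawl.

```javascript
// Summarizes the PWA category of a parsed Lighthouse result object.
function summarizePwaAudits(lhr, auditIds) {
  const category = lhr.categories && lhr.categories.pwa;
  const passed = [];
  const failed = [];
  for (const id of auditIds) {
    const audit = lhr.audits[id];
    // Binary audits score 1 (pass) or 0 (fail); a null score (e.g. not
    // applicable or errored) or a missing audit is counted as not passing.
    if (audit && audit.score === 1) {
      passed.push(id);
    } else {
      failed.push(id);
    }
  }
  return { categoryScore: category ? category.score : null, passed, failed };
}
```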

The only word of caution is that if we change the definition (again!) this year, then we should make it clear in the chapter so people don't try to compare against previous chapters. We called that out in last year's chapter:

Having a web app manifest does not necessarily indicate the site is a progressive web app, as they can exist independently of service worker usage. However, as we are interested in PWAs in this chapter, we have investigated only those manifests for sites where a service worker also exists. This is different than the approach taken in last year’s PWA chapter which looked at overall manifest usage, so you may notice some differences in results this year.
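The difference between the two denominators can be made concrete with a small sketch, again assuming hypothetical per-page boolean flags: the 2019 approach divides by all pages, while the 2020 approach only considers pages that also have a service worker.

```javascript
// Compares overall manifest usage with manifest usage restricted to pages
// that also have a service worker.
function manifestUsage(pages) {
  const share = (subset, total) => (total.length === 0 ? 0 : subset.length / total.length);
  const withManifest = pages.filter((p) => p.manifest);
  const withServiceWorker = pages.filter((p) => p.serviceWorker);
  const withBoth = withServiceWorker.filter((p) => p.manifest);
  return {
    overall: share(withManifest, pages),
    amongServiceWorkerPages: share(withBoth, withServiceWorker),
  };
}
```

The same set of pages can therefore produce noticeably different percentages depending on which denominator is used, which is why the change needs to be called out.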

demianrenzulli commented 3 years ago

Taking Lighthouse into account would be a good idea, since it's the only tool I'm aware of that gives a badge to sites it considers a PWA based on a set of attributes. Now, since Lighthouse changes frequently, let me check internally how up to date this list might be. I believe that if we update this section with the new checks that Lighthouse makes, we should be fine, and that would be aligned with the new definition we want to provide in the Introduction section.

Regarding the stats about manifests, I like the approach you've mentioned that was taken in last year's edition. I wouldn't change that part.

@tunetheweb I hope I have provided some clarity, but we are still in the early stages. As I said, if you are open to jumping on a call next week, we could definitely go over all these points.

Just let me know.

demianrenzulli commented 3 years ago

@jeffposnick just confirmed that the responses provided before are all correct. He added this:

The one addition would be regarding