HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Ecommerce 2021 #2155

Closed: rviscomi closed this issue 2 years ago

rviscomi commented 3 years ago

Part III Chapter 17: Ecommerce


If you're interested in contributing to the Ecommerce chapter of the 2021 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.

Content team

| Lead | Authors | Reviewers | Analysts | Editors | Coordinator |
| --- | --- | --- | --- | --- | --- |
| @bobbyshaw | @bobbyshaw | @rockeynebhwani @fili @samdutton @alankent @soulcorrosion | @rrajiv | @shantsis | @logicalphase |
More information about each role:

- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and is responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.
_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._ For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.

Milestone checklist

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

Chapter resources

Refer to these 2021 Ecommerce resources throughout the content creation process:

📄 Google Docs for outlining and drafting content
🔍 SQL files for committing the queries used during analysis
📊 Google Sheets for saving the results of queries
📝 Markdown file for publishing content and managing public metadata

rrajiv commented 3 years ago

@bobbyshaw - I got all of them except for these 3. Can you let me know where you saw them? I did a dump of all the categories and I don't see them:

Top “reviews” technology category
Top “Translation” technology category
Top “Buy now / pay later” technology category

For hreflang, I am able to identify sites that use it with the query below (sample set only). What are you looking to collect from here?

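-- Ecommerce pages (sample data only) with at least one response body mentioning "hreflang".
-- Note: this matches any response body (HTML, JS, CSS, etc.), not just the HTML document.
SELECT
  page
FROM
  `httparchive.sample_data.response_bodies_*` rb
WHERE
  REGEXP_CONTAINS(body, r"hreflang")
  AND EXISTS (
  SELECT
    1
  FROM
    `httparchive.sample_data.technologies_*` ht
  WHERE
    rb.page = ht.url
    AND ht.category = "Ecommerce" )
GROUP BY
  page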

Same with CSP: I haven't figured out the query yet. Are you trying to report the % of Ecommerce sites that use the CSP header?

There were a couple of other minor potential discussion points in the outline. Is either of these feasible?

Use of link hreflang tags (to indicate international ecommerce)
Is the Content-Security-Policy header set? (report-only/enforce)

I might be doing something wrong, but when I ran that query without the category filter, it took a long time (5+ min). I wonder if it's because of the number of categories. Even with just Ecommerce as a filter, it takes ~1 min.

I think this would be enough to be a point in the article. Basically this query without the category filter would tell us whether ecommerce sites are on average over or under performing the rest of the web.

bobbyshaw commented 3 years ago

@rrajiv Hey, no problem. I was looking here:

Perhaps they're only available in a newer version of Wappalyzer?

For hreflang,[...] What are you looking to collect from here?

In the first instance, the % of sites that contain an hreflang tag. A nice-to-have would be a count per number of hreflang tags, e.g. X have 0 hreflang tags, Y have 1, Z have 2, and so on.
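As a rough illustration of the two numbers asked for here (the share of pages with any hreflang tag, and how many pages have 0, 1, 2, and so on), the counting logic can be sketched in Python over page HTML that has already been fetched. This is a hypothetical sketch, not the query the chapter used: it assumes a dict mapping page URL to HTML, it only counts `<link>` elements with an `hreflang` attribute (hreflang sent in HTTP `Link` headers, which the 2020 SEO queries also looked at, is not covered), and the function names are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class HreflangCounter(HTMLParser):
    """Counts <link> elements that carry an hreflang attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <LINK HREFLANG=...>
        # matches too. Self-closing <link ... /> tags are routed here by the
        # default handle_startendtag.
        if tag == "link" and any(name == "hreflang" for name, _ in attrs):
            self.count += 1


def count_hreflang(html: str) -> int:
    """Number of hreflang <link> tags in one page's HTML."""
    parser = HreflangCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def hreflang_stats(pages: dict[str, str]) -> tuple[float, Counter]:
    """Return (% of pages with at least one hreflang tag, pages per tag count).

    `pages` maps a page URL to its HTML (hypothetical input shape).
    """
    per_count = Counter(count_hreflang(html) for html in pages.values())
    total = sum(per_count.values())
    pct_with_hreflang = 100.0 * (total - per_count[0]) / total if total else 0.0
    return pct_with_hreflang, per_count


if __name__ == "__main__":
    sample = {
        "https://a.example/": "<html><head><title>A</title></head></html>",
        "https://b.example/": '<link rel="alternate" hreflang="en" href="/en/">',
        "https://c.example/": (
            '<link rel="alternate" hreflang="en" href="/en/">'
            '<link rel="alternate" hreflang="de" href="/de/" />'
        ),
    }
    pct, per_count = hreflang_stats(sample)
    print(f"{pct:.1f}% of pages have hreflang")
    for n in sorted(per_count):
        print(f"{per_count[n]} page(s) with {n} hreflang tag(s)")
```

The histogram is a plain `Counter` keyed by tag count, so it maps directly onto the "X have 0, Y have 1, Z have 2" breakdown; in the real analysis the same grouping would more likely be done in BigQuery.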

I found a couple of older queries related to hreflang, if it helps at all:

Are you trying to report the % of Ecommerce sites that use the CSP header?

Yes, a statistic on the % that have the "Content-Security-Policy" or "Content-Security-Policy-Report-Only" header would be great.
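A minimal sketch of how that statistic could be bucketed, assuming each page's main-document response headers are available as a name-to-value mapping (a hypothetical input shape, not the HTTP Archive schema). Header names are compared case-insensitively, and pages sending both headers are counted separately from enforce-only and report-only pages. A CSP delivered through a `<meta http-equiv>` tag is not covered.

```python
from collections import Counter
from typing import Iterable, Mapping


def csp_mode(headers: Mapping[str, str]) -> str:
    """Classify one page's main-document response headers by CSP mode.

    HTTP header names are case-insensitive, so they are lowercased first.
    """
    names = {name.lower() for name in headers}
    enforce = "content-security-policy" in names
    report_only = "content-security-policy-report-only" in names
    if enforce and report_only:
        return "both"
    if enforce:
        return "enforce"
    if report_only:
        return "report-only"
    return "none"


def csp_share(pages: Iterable[Mapping[str, str]]) -> dict[str, float]:
    """Percentage of pages falling into each CSP mode."""
    modes = Counter(csp_mode(headers) for headers in pages)
    total = sum(modes.values())
    if total == 0:
        return {}
    return {mode: 100.0 * n / total for mode, n in modes.items()}
```

For example, four pages where one sends `Content-Security-Policy`, one sends `Content-Security-Policy-Report-Only`, and two send neither come out as 25% enforce, 25% report-only and 50% none.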

I might be doing something wrong, but when I ran that query without the category filter, it took so long (5 min+)

Ok, don't worry about that query, that's not worth it :)

rrajiv commented 3 years ago

@bobbyshaw - I see Reviews and Translation appear in `httparchive.technologies.2021_08_01_*`. I don't see Buy now pay later, perhaps because that category hasn't been seen in the wild yet? `httparchive.technologies.2021_07_01_*` does not have these 3 new categories.

I'll work on the other 2 queries sometime this week.

rockeynebhwani commented 3 years ago

@rrajiv / @bobbyshaw - I added the 'Reviews', 'Translation', 'Buy now pay later', and 'Loyalty & Rewards' categories to Wappalyzer very recently. Even if you see some data, it will be very limited. I think we should look at these next year.

rockeynebhwani commented 3 years ago

@rrajiv - The topic of hreflang was covered last year in the SEO chapter: https://almanac.httparchive.org/en/2020/seo#hreflang. You should be able to find the relevant queries from last year and filter just for Ecommerce.

I believe these are the queries you need:

https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2020/seo/pages_wpt_bodies_hreflang_by_device_and_http_header_value.sql
https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2020/seo/pages_wpt_bodies_hreflang_by_device_and_link_tag_value.sql

rrajiv commented 3 years ago

@rockeynebhwani and @bobbyshaw - thank you for the pointers.

I have added the following to the results spreadsheet:

bobbyshaw commented 3 years ago

Great, thanks @rrajiv. I'll review the results spreadsheet and get started!

bobbyshaw commented 3 years ago

Hey, a quick update. I’m a bit behind on digesting the analysis and writing the draft, but I started in earnest yesterday. I hope to have something to review in a week’s time.

I’ve got a couple of questions so far. Would you be able to help @rrajiv? I appreciate you said you would be travelling so I don’t expect a quick response.

There’s a sharp rise in ecommerce platforms (215 vs 145 last year). I'd expect some rise as more platforms are added to Wappalyzer, but there are a number of technologies in there that I wouldn’t consider to be ecommerce. I’ve checked their Wappalyzer signatures, but they didn’t seem to be in the ecommerce category or imply cart functionality. Do you know why that might be?

Examples of anomalies in the top vendors tab are:

One other platform I’m not sure about is 1C-Bitrix. It’s a Russian software suite that has an ecommerce product within it, but not as a core component. We included it last year, so I’d be interested in your thoughts @rockeynebhwani. Is it fair to include it in the top 10 list when it’s likely that only a much smaller proportion of all 1C-Bitrix sites are actually ecommerce? I guess we can't discount or adjust its position, as any adjustment would be based on an assumption. I think you had a similar problem in the past with Wix, though that seems to have specific ecommerce signatures now.

I’ve also started to read through the figure guide on how to create charts but I may need some help. In the first instance, I’ve added within the draft at this stage. I’ll focus on the draft itself for now and come back to the figures afterwards.

rockeynebhwani commented 3 years ago

@bobbyshaw - I wouldn't worry about the sharp increase in the number of ecommerce platforms. I have probably contributed 30 different platforms to Wappalyzer myself since last year. As of today, Wappalyzer is tracking 264 different ecommerce platforms. You can see the latest count on this page: https://www.wappalyzer.com/technologies/ecommerce

I have personally observed that technology detection for CMS/ecommerce is skewed towards North America. In the last 12 months, I added many different platforms from Korea, Latin America, India, and various European countries. That may be one of the reasons.

Regarding technologies like Loox, Omnisend, etc., it's a problem due to the open-source nature of Wappalyzer. Anybody can add a technology and assign it to the ecommerce category when they can't find a more appropriate one. For example, Loox is an app for reviews, but there was no category for reviews until very recently, so contributors defaulted to ecommerce for it. However, in many cases this resolves itself as new categories are introduced over time (for example, Loox is now categorised under the 'Reviews' category). I checked all the examples in your comment, and none of them are categorised under ecommerce any more. You can search for them on the link I shared above. You are looking at the latest Wappalyzer signatures on GitHub, whereas the query output is from July 2021, and these signatures were updated after July 2021. For the purpose of the top 10 platforms, I suggest you ignore these.

rockeynebhwani commented 3 years ago

@bobbyshaw - Regarding 1C-Bitrix, I am not very familiar with this platform and I didn't realise this last year. Yes, it's the same issue as 'Wix'. This year, I was able to get in touch with the 'Wix' team and make changes to Wappalyzer to split Wix detection into 'Wix' (CMS) and 'Wix commerce'. We should do the same with 1C-Bitrix if there is a way to identify its ecommerce sites. For now, I suggest you add this as a caveat, as I did for 'Wix' last year.

@bobbyshaw - This is the most recent discussion I could find on 1C-Bitrix. As of now, we don't know how to differentiate between CMS and commerce sites: https://github.com/AliasIO/wappalyzer/pull/4157

bobbyshaw commented 3 years ago

Thanks @rockeynebhwani. That's really helpful.

rrajiv commented 3 years ago

@bobbyshaw - I am still on the road, but if you let me know the questions, I can answer when possible.

If you need charts, let me know the tabs and I can try from the iPad.

rockeynebhwani commented 3 years ago

@bobbyshaw - I will also be on the move for the next 4 weeks, but I can try to help with the charts. @rviscomi - I don't have edit access on the results sheet. Can you please grant me 'edit access'?

rviscomi commented 3 years ago

@rockeynebhwani can you hit "Request edit access"?

rockeynebhwani commented 3 years ago

@rviscomi - I already did a couple of days ago, but have done it again now. Let me know if you don't receive my request.

rockeynebhwani commented 3 years ago

@bobbyshaw - I have created all charts in results sheet. Please have a look and let me know if I missed anything or if anything is not clear.

bobbyshaw commented 3 years ago

That's incredible, thanks @rockeynebhwani 🤩

soulcorrosion commented 2 years ago

@bobbyshaw do you think I can start reading for the review?

bobbyshaw commented 2 years ago

Thanks for your patience, team. I can now offer my very rough [first draft](https://docs.google.com/document/d/1LQjpsaWx-5ZtHQGRnHlPnekkxuap50KzJZJTIaSX4B4/edit#heading=h.l58oy8wsputh) for review. Given that the days are passing quickly, feel free to review at your earliest convenience, and I will respond to each comment as and when I can.

@rockeynebhwani @fili @samdutton @alankent @soulcorrosion (@shantsis, I'm not sure when the appropriate time is for an editor to get involved, but I'm tagging you as a heads-up anyway).

Overall, I think we’ve found the ecommerce landscape to be very similar to last year. However, we do have a couple of new discussion opportunities, particularly with the ranking data. There was some rapid growth around Q2-3 last year when COVID hit, but the growth rate appears to have returned to pre-pandemic levels.

In terms of what we’ve covered, we came up with so many topics during the outline, which is great. It’s fair to say that we didn’t get through them all! There were some that we just didn’t get around to covering in the depth that was suggested, e.g. SEO, and others that weren’t practical because of a lack of data, e.g. there were very few personalisation technologies.

In terms of limitations, I think headless sites are going to cause us the most trouble going forward. Even in this year’s edition, it would have been nice to have more to say on this trend. While I’m sure far fewer people are going headless than the buzz would suggest, the easiest, and sometimes only, way for us to detect a platform is through its frontend markup choices, so a headless frontend can hide the platform behind it.

Over the next week, my plan is to:

  • Compare with last year's chapter for any further commentary that could be made.
  • Respond to all of your feedback and corrections and update as appropriate.
  • Re-read the author guide and style guide and re-draft with them in mind.
  • Read through the next steps for converting to markdown and get started.

For any other questions or longer discussions not suited to here or the Google Doc, you're welcome to find me in the #web-almanac-ecommerce Slack channel.

fili commented 2 years ago

Thanks for the update. I will have a look at it in the coming week and get back to you.

alankent commented 2 years ago

Just wanted to say I think the start is coming together well @bobbyshaw (and others)! The end still needs work (not finished). I finished a complete pass through. Feel free to mention me on this thread again later if you want me to make another pass.

shantsis commented 2 years ago

I did a first pass through the doc. The main things to be careful of are the use of past tense (for our analytics) vs. present tense (for the current state of the web), and the use of British vs. US spelling. See https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide

shantsis commented 2 years ago

The other thing to note is that there are a lot of charts using only green bars (not desktop-related), which leads to poor contrast. @tunetheweb suggests we either use the dark gray color instead for both the bar and the label, or just for the label.

tunetheweb commented 2 years ago

The other option is to use black labels, but I find they look better as inside labels (something just seems "off" when they are outside labels for green bars):

Example chart with green bars and black inside labels
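To put a number on the contrast concern, the WCAG 2.x contrast ratio can be computed from the relative luminance of the label and bar colors. The sketch below is a hypothetical helper, not part of the Almanac's chart tooling, and the green used in the usage note (`#4caf50`) is an arbitrary example rather than the Almanac's actual palette color.

```python
def _channel(c8: int) -> float:
    """Linearize one sRGB channel (0-255) per the WCAG 2.x definition."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as "#4caf50"."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1:1 (identical colors) to 21:1 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

For that example green, `contrast_ratio("#000000", "#4caf50")` clears the 4.5:1 threshold WCAG AA sets for normal-size text, while white text on the same green does not, which is consistent with black labels reading better on green bars.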

bobbyshaw commented 2 years ago

Thanks, everyone. I've incorporated all feedback, including the chart suggestions and Americani~s~zation 🙂

I'm going to take a break and come back in a few days to start the process of converting to markdown. I'll do my best to incorporate any final comments made during that period.

Thanks, again.

rviscomi commented 2 years ago

@bobbyshaw @rockeynebhwani @fili @samdutton @alankent @soulcorrosion @rrajiv @shantsis

🎉 This chapter is fully written, reviewed, edited, and ready to be launched on Wednesday! Thank you to all of the contributors who put in the time and effort to make this a great chapter.

When you get 5 minutes, I'd really appreciate if you could fill out our contributor survey to tell us (the project leads) about your experience. It's super helpful to hear what went well or what could be improved for next time. 🙏

Congratulations and thank you all again. I'm excited for this to launch soon!