HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Ecommerce 2020 #914

Closed foxdavidj closed 3 years ago

foxdavidj commented 4 years ago

Part III Chapter 16: Ecommerce

Content team

Authors: @rockeynebhwani, @jrharalson
Reviewers: @drewzboto, @alankent
Analysts: @jrharalson
Draft: Doc
Queries: *.sql
Results: Sheet

Content team lead: @rockeynebhwani

Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.

The content team is made up of the following contributors:

New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.

Note: To ensure that you get notifications when tagged, you must be "watching" this repository.

Milestones

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

rviscomi commented 4 years ago

From @jrharalson:

I'd really like to help (co-authoring, reviewing, and/or extracting data) with the Ecommerce chapter this year to highlight several critical platforms that were not "discovered" by Wappalyzer in 2019. IBM/HCL, SAP/Hybris, and Salesforce Commerce Cloud are significant players in the ecommerce platform space, and I'd like to figure out how to uncover them via Wappalyzer, maybe combined with CrUX/HTTP Archive data or some other approach.

rviscomi commented 4 years ago

@alankent @samdutton is there anyone you'd like to nominate who would be a good author for this year's chapter?

alankent commented 4 years ago

Sorry for the delay. I would like to nominate @philwinkle.

rviscomi commented 4 years ago

Thanks @alankent!

@philwinkle, for context on the project, the role, and the timeline, see the resources in the top comment of this issue. Let us know if you'd be interested in contributing!

rockeynebhwani commented 4 years ago

@rviscomi - I am happy to be a co-author or reviewer for this chapter.

rviscomi commented 4 years ago

Great, thanks @rockeynebhwani! Would you be interested in the content team lead role? You'd be responsible for keeping this chapter on schedule and coordinating with the other contributors. As of now nobody else has signed up, so to make this work we'll need at least one more reviewer and an analyst (unless you can also do the analysis). Is there anyone who you think would be a good fit to review?

You can also get started on planning this chapter's content in the doc.

rockeynebhwani commented 4 years ago

@rviscomi - I am happy to take on the role of content team lead but will need help with the review and analysis parts. I have messaged some folks in my network to see if anybody is interested.

rviscomi commented 4 years ago

Perfect, thanks!

jrharalson commented 4 years ago

Hi @rviscomi and @rockeynebhwani

Sorry for the delay. I’d like to help with authoring. I can help with analysis again (here or if needed in other chapters).

I’m in the process of figuring out some updates to Wappalyzer which will hopefully uncover more platforms that the tool does not currently detect, e.g. IBM/HCL WebSphere Commerce, SAP/Hybris, and BigCommerce. The 2019 stats from Wappalyzer were pretty low in comparison.

Regards,

Jason Haralson

drewzboto commented 4 years ago

Hey folks. I'm interested in either being a reviewer or author, and I'll see if I can recruit a few more folks!

rviscomi commented 4 years ago

Thanks @jrharalson @drewzboto! Great to see this chapter filling out. I've added you both as reviewers. I'll leave it up to @rockeynebhwani to reassign as coauthors as needed.

@jrharalson I've also added you as an analyst. Several chapters are still in need of analysts so please go through that list and sign up for any that interest you. Your help is much appreciated!

rockeynebhwani commented 4 years ago

@jrharalson - Do you have a GitHub issue open for updates to Wappalyzer? Let me know if you need help identifying any eCommerce platforms. What I have realized is that with headless commerce and microservices-based platforms, it becomes more and more difficult to identify eCommerce platforms. I would like to see CommerceTools added to your list if you haven't added it already.

loewengart commented 4 years ago

@drewzboto posted on our Slack channel. Happy to help as author or reviewer, if you're still looking for help. I have experience with the eComm, CMS, and imagery pieces.

g3john commented 4 years ago

@rockeynebhwani I would like to volunteer as a Dev to help with simple PRs!

rockeynebhwani commented 4 years ago

Thanks @g3john for answering my call for help with simple PRs. I will message you directly via Slack.

rviscomi commented 4 years ago

@rockeynebhwani are you referring to Wappalyzer ecommerce detection PRs?

rockeynebhwani commented 4 years ago

Yes @rviscomi, I have filed a few today.

adityapandey1998 commented 4 years ago

@rockeynebhwani posted on our slack channel. I am a new dev but would like to help with the analysis. Let me know how to proceed with the same.

tunetheweb commented 4 years ago

Hey @adityapandey1998 welcome!

Start with the Analysts Guide and set up BigQuery (Good guide on that by our very own @paulcalvano who's leading the Analyst team here on the Web Almanac). Also be aware this can be expensive but there's a generous free tier and Paul will provide credits beyond that for Almanac work. There are also sample tables which are much cheaper to query and it should be difficult to go beyond the free budget with those. Then join the #web-almanac slack and Paul will invite you to the Analysts channel on that.

For this chapter, you can read last year's chapter, look at last year's SQL for this chapter (and the actual results it produced) - both of these are linked at the bottom of the chapter btw. Familiarise yourself with all this, then work with @rockeynebhwani and the reviewers to figure out what metrics you want to use this year and then convert them into queries. Would suggest reusing a lot of last year's queries but also adding some to give a fresh take. Liaise with the other Analysts and @paulcalvano if you have any questions on the data set and what's available.

We're planning to run the crawl for the 2020 dataset throughout August, so the critical point is to quickly figure out and implement any custom metrics required for that crawl before it starts. Would hope there won't be too many (if any), as there is quite a lot of detail in the current dataset and we didn't need any for the Ecommerce chapter last year.

Hope that helps and gives you something to get started on!

rockeynebhwani commented 4 years ago

@adityapandey1998 / @g3john - Do you have time to work on these PRs?

I managed to complete a few PRs for Wappalyzer. You can refer to this one (https://github.com/AliasIO/wappalyzer/pull/3227); you will have to make similar changes for the above two issues. I have provided full instructions in both issues. I would also like to use this information for the eCommerce chapter if possible.

tunetheweb commented 4 years ago

Interesting thread here that might be worth exploring further in this chapter: https://twitter.com/igrigorik/status/1284539413003821057?s=21

rockeynebhwani commented 4 years ago

Thanks @bazzadp. @igrigorik - Are you able to share your queries for the Shopify analysis? The analysts on this chapter are new, so your queries will be handy.

igrigorik commented 4 years ago

Yep: https://gist.github.com/igrigorik/9345b70ad92f3e010162048e755377d1

The new renderer is, I believe, rolling out this month so it'll be interesting to see if and how the distribution changes.

rockeynebhwani commented 4 years ago

@jrharalson @drewzboto @loewengart

As discussed on Friday, it will be good to know how many eCommerce sites have an app presence on the App Store or Play Store. We can try to find this out with the help of META tags and the presence of app links, typically found in the website footer. This analysis can also highlight deep linking opportunities missed by sites. So, this is my proposal.

So, the analysis we will do is:

1) Analyse the following META tags. Not sure why there are so many different ones; it will need some work to identify the exact ones. Some, used by Facebook, can be found at this link - https://developers.facebook.com/docs/applinks/metadata-reference/

For Play Store

  <meta name="google-play-app" content="app-id=com.myntra.android" />
  <meta property="al:android:url" content="https://www.myntra.com/"  />
  <meta property="al:android:package" content="com.myntra.android" />
  <meta property="al:android:app_name" content="Myntra Fashion Shopping App" />

For App Store

  <meta property="al:ios:url" content="https://www.myntra.com/" />
  <meta property="al:ios:app_store_id" content="907394059" />
  <meta property="al:ios:app_name" content="Myntra Fashion Shopping App" />
  <meta name='apple-itunes-app' content="app-id=907394059, app-argument=https://www.myntra.com/" />

These tags are already captured by an existing WebPageTest (WPT) custom metric.
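
As a rough illustration of what can be read out of the banner-style tags above, here is a minimal JavaScript sketch that splits the content of an apple-itunes-app or google-play-app tag into key/value pairs. The helper name parseAppBannerContent is made up for this example, and the naive comma split is an assumption: an app-argument URL that itself contains a comma would be cut short.

```javascript
// Hypothetical helper (not part of any Almanac tooling): split the
// `content` value of a banner-style meta tag into key/value pairs.
function parseAppBannerContent(content) {
  const result = {};
  for (const part of String(content).split(',')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue; // skip fragments without a key=value shape
    const key = part.slice(0, idx).trim().toLowerCase();
    const value = part.slice(idx + 1).trim();
    if (key) result[key] = value;
  }
  return result;
}

// Content string taken from the apple-itunes-app example above.
const sampleBanner = parseAppBannerContent(
  'app-id=907394059, app-argument=https://www.myntra.com/'
);
console.log(sampleBanner['app-id'], sampleBanner['app-argument']);
```

Only the first "=" in each fragment is treated as the separator, so an app-argument URL with its own query string stays intact.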

2) In addition to the above, also look for the presence of links with the following formats -

a) App Store link - 'https://itunes.apple.com/*/app/' OR 'https://apps.apple.com/*/app/' (Example - https://www.kurtgeiger.com/sale/women)
b) Play Store link - https://play.google.com/store/apps/details?id=*
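
To make the link check in 2) a bit more concrete, here is a minimal JavaScript sketch that classifies an href against the two formats listed. It only follows the patterns written in this comment, so store links in any other shape would be missed. The function names classifyStoreLink and summarizeStoreLinks and the Apple sample URL are assumptions for illustration, not existing custom metric code.

```javascript
// Patterns transcribed from the two link formats listed in this comment.
const APP_STORE_LINK = /^https?:\/\/(?:itunes|apps)\.apple\.com\/[^\/]+\/app\//i;
const PLAY_STORE_LINK =
  /^https?:\/\/play\.google\.com\/store\/apps\/details\?(?:.*&)?id=[^&#]+/i;

// Hypothetical helper: returns 'app-store', 'play-store', or null.
function classifyStoreLink(href) {
  const url = String(href).trim();
  if (APP_STORE_LINK.test(url)) return 'app-store';
  if (PLAY_STORE_LINK.test(url)) return 'play-store';
  return null;
}

// Hypothetical helper: summarise all hrefs found on one page.
function summarizeStoreLinks(hrefs) {
  const kinds = new Set(hrefs.map(classifyStoreLink));
  return {
    hasAppStoreLink: kinds.has('app-store'),
    hasPlayStoreLink: kinds.has('play-store'),
  };
}

// The Apple URL is a made-up sample; the Play Store id comes from the META example.
console.log(summarizeStoreLinks([
  'https://apps.apple.com/gb/app/example/id123',
  'https://play.google.com/store/apps/details?id=com.myntra.android',
  'https://www.kurtgeiger.com/sale/women',
]));
```

Plain regular expressions are used rather than a URL parser so that the same check could, in principle, be pasted into a custom metric or a query UDF.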

Issues/Edge Cases - We are going to miss sites like https://www.marksandspencer.com/. M&S doesn't have the META tags and also doesn't link to its apps directly on the home page. Instead, it has a separate link in the footer called 'Download our apps'.

Any other considerations OR feedback on this?

@g3john / @adityapandey1998 - Queries for these should be relatively straightforward if you guys want to give it a try.

rockeynebhwani commented 4 years ago

Here is the query. (It is still very generic: it is based on desktop data and not yet limited to eCommerce, but it's working.)

CREATE TEMP FUNCTION hasAndroidAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:android:package');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiOSAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ios:app_store_id');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiOSiPhoneMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:iphone:app_store_id');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiOSiPadMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ipad:app_store_id');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiTunesAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'name' in meta && meta.name.toLowerCase() == 'apple-itunes-app');
} catch (e) {
  return false;
}
''';

SELECT
  COUNTIF(hasAndroidAppMeta(payload)) AS hasAndroidAppMeta,
  COUNTIF(hasiOSAppMeta(payload)) AS hasiOSAppMeta,
  COUNTIF(hasiOSiPhoneMeta(payload)) AS hasiPhoneAppMeta,
  COUNTIF(hasiOSiPadMeta(payload)) AS hasiPadAppMeta,
  COUNTIF(hasiTunesAppMeta(payload)) AS hasiTunesAppMeta
FROM
  `httparchive.pages.2020_06_01_desktop`

And this query is now specific to eCommerce sites. 1536 sites have at least one of these METAs. The number should increase after the next release of Wappalyzer, when we start to recognize eCommerce sites with 'Google Analytics Enhanced eCommerce'.

CREATE TEMP FUNCTION hasAndroidAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:android:package');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiOSAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ios:app_store_id');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiOSiPhoneMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:iphone:app_store_id');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiOSiPadMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ipad:app_store_id');
} catch (e) {
  return false;
}
''';

CREATE TEMP FUNCTION hasiTunesAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var almanac = JSON.parse($._almanac);
  return !!almanac['meta-nodes'].find(meta => 'name' in meta && meta.name.toLowerCase() == 'apple-itunes-app');
} catch (e) {
  return false;
}
''';

SELECT
  p.url,
  COUNTIF(hasAndroidAppMeta(payload)) AS hasAndroidAppMeta,
  COUNTIF(hasiOSAppMeta(payload)) AS hasiOSAppMeta,
  COUNTIF(hasiOSiPhoneMeta(payload)) AS hasiPhoneAppMeta,
  COUNTIF(hasiOSiPadMeta(payload)) AS hasiPadAppMeta,
  COUNTIF(hasiTunesAppMeta(payload)) AS hasiTunesAppMeta
FROM
  `httparchive.pages.2020_06_01_desktop` AS p
INNER JOIN
  `httparchive.technologies.2020_06_01_desktop` AS t
ON
  t.url = p.url AND
  t.category = 'Ecommerce'
GROUP BY
  p.url
-- A page detected with several Ecommerce technologies joins to several rows,
-- so a count can exceed 1. Test for > 0 rather than = 1 (results may differ
-- slightly from a run that used = 1).
HAVING
  COUNTIF(hasAndroidAppMeta(payload)) > 0 OR
  COUNTIF(hasiOSAppMeta(payload)) > 0 OR
  COUNTIF(hasiOSiPhoneMeta(payload)) > 0 OR
  COUNTIF(hasiOSiPadMeta(payload)) > 0 OR
  COUNTIF(hasiTunesAppMeta(payload)) > 0
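
The five functions above share one shape, so for anyone wanting to try the logic outside BigQuery, here is a minimal JavaScript sketch that does all five checks in one pass and adds an "any of them" flag like the filter at the end of the query. The names APP_META_CHECKS, hasMeta, and appMetaFlags and the sample payload are made up for illustration. The payload shape (a JSON string whose _almanac field is itself JSON containing 'meta-nodes') is assumed from the functions above.

```javascript
// Attribute/value pairs checked by the five functions above; the keys
// mirror the column aliases used in the SELECT list.
const APP_META_CHECKS = {
  hasAndroidAppMeta: ['property', 'al:android:package'],
  hasiOSAppMeta: ['property', 'al:ios:app_store_id'],
  hasiPhoneAppMeta: ['property', 'al:iphone:app_store_id'],
  hasiPadAppMeta: ['property', 'al:ipad:app_store_id'],
  hasiTunesAppMeta: ['name', 'apple-itunes-app'],
};

// True if any meta node has the given attribute equal to value (case-insensitive).
function hasMeta(metaNodes, attr, value) {
  return metaNodes.some(
    (meta) => meta && typeof meta[attr] === 'string' && meta[attr].toLowerCase() === value
  );
}

function appMetaFlags(payload) {
  let metaNodes = [];
  try {
    const page = JSON.parse(payload);
    const almanac = JSON.parse(page._almanac);
    if (Array.isArray(almanac['meta-nodes'])) metaNodes = almanac['meta-nodes'];
  } catch (e) {
    // Unparseable payloads count as "no app metadata", as in the functions above.
  }
  const flags = {};
  for (const [name, [attr, value]] of Object.entries(APP_META_CHECKS)) {
    flags[name] = hasMeta(metaNodes, attr, value);
  }
  flags.hasAnyAppMeta = Object.values(flags).some(Boolean);
  return flags;
}

// Hypothetical payload for a page with two of the tags.
const samplePayload = JSON.stringify({
  _almanac: JSON.stringify({
    'meta-nodes': [
      { property: 'al:android:package', content: 'com.myntra.android' },
      { name: 'Apple-iTunes-App', content: 'app-id=907394059' },
    ],
  }),
});
console.log(appMetaFlags(samplePayload));
```

Keeping the checks in one table means a new tag can be added in one line, instead of copying a whole function.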

rockeynebhwani commented 4 years ago

@drewzboto - Looking at Mobify, can we imply in Wappalyzer if a site is eCommerce? Are all clients of Mobify eCommerce clients? If yes, it can improve detection of eCommerce category for sites like - https://www.debenhams.com/

alankent commented 4 years ago

For other sites we have talked to the platform and asked if they would like to create the Wappalyzer rule.

rockeynebhwani commented 4 years ago

@alankent - @drewzboto works for Mobify.

Also, as Wappalyzer is open source, any technology can be added by anybody, can't it? I am not aware of any opt-out mechanism, but as a courtesy it is good to ask. Is this what you are implying?

alankent commented 4 years ago

It is both courteous and potentially more reliable (and maintained when the platform changes). It also avoids situations where "brand X looks bad because person Y did a bad job of the sensing rule".

rockeynebhwani commented 4 years ago

Makes sense. Will take this approach if we can get hold of somebody in the organisation concerned.

drewzboto commented 4 years ago

@drewzboto - Looking at Mobify, can we imply in Wappalyzer if a site is eCommerce? Are all clients of Mobify eCommerce clients? If yes, it can improve detection of eCommerce category for sites like - https://www.debenhams.com/

Yes, all our clients are ecommerce (in the broader sense: transactions happen online even for travel, telco and other sites). I can add a PR to detect both our x-powered-by header and a script/js method, covering our two different ways of implementation.

foxdavidj commented 4 years ago

@jrharalson Took a look through the chapter and it looks like the crawler should be set up to get most if not all of the data you need. Can you verify and let me know if you find any additional data you need tracked?

@rockeynebhwani if you decide to track which links point to an App store of some kind, we'd need to create a custom metric for that really soon. I'm working on putting new custom metrics (PR here) together right now so keep me posted

rockeynebhwani commented 4 years ago

@obto - Yes, for app store links I was thinking of a custom metric, but @rviscomi or @paulcalvano (can't remember who and I can't find the thread) said that it's not too difficult to query this without a custom metric. If you think we should add a custom metric, let's do that. It becomes easier to query with a custom metric.

foxdavidj commented 4 years ago

@rockeynebhwani @jrharalson for the two milestones overdue on July 27 could you check the boxes if:

Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!

remotesynth commented 4 years ago

@obto Sorry for any confusion on my end. I was only contributing to the Jamstack chapter afaik but this thread and links are all on ecommerce.

rviscomi commented 4 years ago

@remotesynth sorry about that, I've edited @obto's comment to clarify the correct analyst for this chapter.

foxdavidj commented 4 years ago

@jrharalson @drewzboto Can you request edit access to your chapter doc (if you haven't already), and add your name and email to the document?

foxdavidj commented 4 years ago

I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:

  1. Enable authors/reviewers to analyze the results for each metric without running the queries themselves
  2. Generate data visualizations to be embedded in the chapter
  3. Serve as a public audit trail of this chapter's data collection/analysis, linked from the chapter footer

foxdavidj commented 3 years ago

@rockeynebhwani in case you missed it, we've adjusted the milestones to push the launch date back from November 9 to December 9. This gives all chapters exactly 7 weeks from now to wrap up the analysis, write a draft, get it reviewed, and submit it for publication. So the next milestone will be to complete the first draft by November 12.

However if you're still on schedule to be done by the original November 9 launch date we want you to know that this change doesn't mean your hard work was wasted, and that you'll get the privilege of being part of our "Early Access" launch.

Please see the link above for more info and reach out to @rviscomi or me if you have any questions or concerns about the timeline. We hope this change gives you a bit more breathing room to finish the chapter comfortably and we're excited to see it go live!

rviscomi commented 3 years ago

@rockeynebhwani any update on the status of the first draft?

rviscomi commented 3 years ago

This chapter is far behind and in danger of not being ready by launch. :(

alankent commented 3 years ago

Hi guys! I just wanted to jump on this thread and offer any assistance I can, e.g. for reviewing content. Feel free to reach out if I can help get this chapter moving along!

rviscomi commented 3 years ago

Thank you @alankent! I've updated the metadata in the top comment to add you to the list of reviewers along with @drewzboto. I've also updated the coauthors to include @rockeynebhwani and @jrharalson to reflect the doc. Please request edit access to ensure that you can leave comments and to add your name and email in the doc.

@rockeynebhwani @jrharalson let's chat about getting this chapter back on track to be released later this month.

rockeynebhwani commented 3 years ago

Thanks @alankent for offering to review. I have set aside time for the next 2 weeks and I will be in touch soon.

alankent commented 3 years ago

Happy New Year! I am back on deck if I can help in any way - even if just nagging! Politely of course!! ;-)

rockeynebhwani commented 3 years ago

Hi @alankent ,

Apologies for the delay here. @barrypollard will be publishing the chapter later today but you can see the final draft here - https://20210117t105608-dot-webalmanac.uk.r.appspot.com/en/2020/ecommerce

Feel free to message me if you have any additional thoughts and comments.

Cheers, Rockey