Closed foxdavidj closed 3 years ago
From @jrharalson:
I'd really like to help (co-authoring, reviewing, and/or extracting data) with the Ecommerce chapter this year to highlight several critical platforms that were not "discovered" by the Wappalyzer tool in 2019. IBM/HCL, SAP/Hybris, and Salesforce Commerce Cloud are significant players in the ecommerce platform space that I'd like to figure out how to uncover via Wappalyzer, plus maybe CrUX/HTTP Archive or some other combination.
@alankent @samdutton is there anyone you'd like to nominate who would be a good author for this year's chapter?
Sorry for delay. I would like to nominate @philwinkle.
Thanks @alankent!
@philwinkle, for context on the project, the role, and the timeline, see the resources in the top comment of this issue. Let us know if you'd be interested in contributing!
@rviscomi - I am happy to be a co-author or reviewer for this chapter.
Great, thanks @rockeynebhwani! Would you be interested in the content team lead role? You'd be responsible for keeping this chapter on schedule and coordinating with the other contributors. As of now nobody else has signed up, so to make this work we'll need at least another reviewer and analyst (unless you can also do the analysis). Is there anyone who you think would be a good fit to review?
You can also get started on planning this chapter's content in the doc.
@rviscomi - I am happy to take the role of content team lead but will need help with the review and analysis parts. I have messaged some folks in my network to see if anybody is interested.
Perfect, thanks!
Hi @rviscomi and @rockeynebhwani
Sorry for the delay. I’d like to help with authoring. I can help with analysis again (here or if needed in other chapters).
I'm in the process of figuring out some updates to Wappalyzer which will hopefully uncover more platforms that the tool doesn't currently detect, e.g. IBM/HCL WebSphere Commerce, SAP/Hybris, and BigCommerce. The 2019 stats from Wappalyzer were pretty low in comparison.
Regards,
Jason Haralson
Hey folks. I'm interested in either being a reviewer or author, and I'll see if I can recruit a few more folks!
Thanks @jrharalson @drewzboto! Great to see this chapter filling out. I've added you both as reviewers. I'll leave it up to @rockeynebhwani to reassign as coauthors as needed.
@jrharalson I've also added you as an analyst. Several chapters are still in need of analysts so please go through that list and sign up for any that interest you. Your help is much appreciated!
@jrharalson - Do you have a GitHub issue open for updates to Wappalyzer? Let me know if you need help identifying any ecommerce platforms. What I have realized is that with headless commerce and microservices-based platforms, it becomes more and more difficult to identify ecommerce platforms. I would like to see CommerceTools added to your list if you haven't added it already.
@drewzboto posted on our Slack channel: Happy to help as author or reviewer, if you're still looking for help in either role. I have experience with the ecommerce, CMS, and imagery pieces.
@rockeynebhwani I would like to volunteer as a Dev to help with simple PRs!
Thanks @g3john for answering my call for help with simple PRs. I will message you directly via Slack.
@rockeynebhwani are you referring to Wappalyzer ecommerce detection PRs?
Yes @rviscomi, I have filed a few today.
@rockeynebhwani posted on our slack channel. I am a new dev but would like to help with the analysis. Let me know how to proceed with the same.
Hey @adityapandey1998 welcome!
Start with the Analysts Guide and set up BigQuery (there's a good guide on that by our very own @paulcalvano, who's leading the analyst team here on the Web Almanac). Be aware this can be expensive, but there's a generous free tier and Paul will provide credits beyond that for Almanac work. There are also sample tables which are much cheaper to query, and it should be difficult to go beyond the free budget with those. Then join the #web-almanac Slack channel and Paul will invite you to the Analysts channel there.
For this chapter, you can read last year's chapter and look at last year's SQL (and the actual results it produced); both of these are linked at the bottom of the chapter, by the way. Familiarise yourself with all this, then work with @rockeynebhwani and the reviewers to figure out what metrics you want to use this year, and then convert them into queries. I would suggest reusing a lot of last year's queries but also adding some to give a fresh take. Liaise with the other analysts and @paulcalvano if you have any questions on the dataset and what's available.
We're planning to run the crawl for the 2020 dataset throughout August, so the critical point is to quickly figure out and implement any custom metrics required for that crawl before it starts. I would hope there shouldn't be too many (if any), as there is quite a lot of detail in the current dataset and we didn't need any for the Ecommerce chapter last year.
Hope that helps and gives you something to get started on!
@adityapandey1998 / @g3john - Do you have time to work on these PRs?
I managed to complete a few PRs for Wappalyzer. For this one, you can refer to https://github.com/AliasIO/wappalyzer/pull/3227; you will have to make similar changes for the above two issues. I have provided full instructions in both issues. I would like to use this information for the Ecommerce chapter too, if possible.
Interesting thread here that might be worth exploring further in this chapter: https://twitter.com/igrigorik/status/1284539413003821057?s=21
Thanks @bazzadp. @igrigorik - are you able to share your queries for the Shopify analysis? The analysts on this chapter are new, so your queries would be handy.
Yep: https://gist.github.com/igrigorik/9345b70ad92f3e010162048e755377d1
The new renderer is, I believe, rolling out this month so it'll be interesting to see if and how the distribution changes.
@jrharalson @drewzboto @loewengart
As discussed on Friday, it would be good to know how many ecommerce sites have an app presence on the App Store or Play Store. We can try to find this out with the help of META tags and the app links typically found in a website's footer. This analysis can also highlight deep-linking opportunities missed by sites. So, this is my proposal.
The questions we will answer are:
1) Analyse the following META tags. I'm not sure why there are so many different ones; it will need some work to identify the exact ones. Some are used by Facebook and can be found at this link: https://developers.facebook.com/docs/applinks/metadata-reference/
For Play Store
<meta name="google-play-app" content="app-id=com.myntra.android" />
<meta property="al:android:url" content="https://www.myntra.com/" />
<meta property="al:android:package" content="com.myntra.android" />
<meta property="al:android:app_name" content="Myntra Fashion Shopping App" />
For App Store
<meta property="al:ios:url" content="https://www.myntra.com/" />
<meta property="al:ios:app_store_id" content="907394059" />
<meta property="al:ios:app_name" content="Myntra Fashion Shopping App" />
<meta name='apple-itunes-app' content="app-id=907394059, app-argument=https://www.myntra.com/" />
We already have this captured via a WPT custom metric.
2) In addition to the above, also look for the presence of links in the following formats:
a) App Store link: 'https://itunes.apple.com/*/app/' OR 'https://apps.apple.com/*/app/' (example: https://www.kurtgeiger.com/sale/women)
b) Play Store link: https://play.google.com/store/apps/details?id=*
Issues/edge cases: we are going to miss sites like https://www.marksandspencer.com/. M&S doesn't have META tags and also doesn't link to its apps directly on the home page; it has a separate link in the footer called 'Download our apps'.
Any other considerations or feedback on this?
@g3john / @adityapandey1998 - Queries for these should be relatively straightforward if you guys want to give it a try.
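To make the link check in 2) concrete, here is a minimal sketch of how the two URL formats could be matched against a page's outbound hrefs. This is purely illustrative: the function name and regexes are my own, not existing analysis code, and a real custom metric would also need to collect the hrefs from the DOM first.

```javascript
// Hypothetical helper: classify outbound hrefs as App Store / Play Store links.
// The two patterns mirror the link formats a) and b) proposed above.
const APP_STORE_RE = /^https:\/\/(itunes|apps)\.apple\.com\/[^/]+\/app\//;
const PLAY_STORE_RE = /^https:\/\/play\.google\.com\/store\/apps\/details\?id=/;

function classifyAppLinks(hrefs) {
  return {
    appStore: hrefs.filter((href) => APP_STORE_RE.test(href)),
    playStore: hrefs.filter((href) => PLAY_STORE_RE.test(href)),
  };
}

// Example inputs: links as they might appear in a site footer.
const result = classifyAppLinks([
  'https://apps.apple.com/gb/app/example-shop/id907394059',
  'https://play.google.com/store/apps/details?id=com.myntra.android',
  'https://www.example.com/about',
]);
console.log(result.appStore.length, result.playStore.length); // 1 1
```

Note this anchors the match at the start of the href, so it would miss app links wrapped in redirects or tracking URLs, which fits the edge cases discussed above.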
Here is the query. (It's still very generic, based on desktop data and not limited to ecommerce for now, but it's working.)
CREATE TEMP FUNCTION hasAndroidAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:android:package');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiOSAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ios:app_store_id');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiOSiPhoneMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:iphone:app_store_id');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiOSiPadMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ipad:app_store_id');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiTunesAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'name' in meta && meta.name.toLowerCase() == 'apple-itunes-app');
} catch (e) {
return false;
}
''';
SELECT
COUNTIF(hasAndroidAppMeta(payload)) AS hasAndroidAppMeta,
COUNTIF(hasiOSAppMeta(payload)) AS hasiOSAppMeta,
COUNTIF(hasiOSiPhoneMeta(payload)) AS hasiPhoneAppMeta,
COUNTIF(hasiOSiPadMeta(payload)) AS hasiPadAppMeta,
COUNTIF(hasiTunesAppMeta(payload)) AS hasiTunesAppMeta
FROM
`httparchive.pages.2020_06_01_desktop`
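Since the UDF bodies above are plain JavaScript, their matching logic can be sanity-checked locally in Node before spending BigQuery budget. A sketch, using a made-up minimal payload in the same shape as the ones the UDFs parse (`_almanac` as a JSON string containing a `meta-nodes` array); the sample values are my own illustration:

```javascript
// Same matching logic as the hasAndroidAppMeta UDF body above,
// lifted out so it can be run locally against a sample payload.
function hasAndroidAppMeta(payload) {
  try {
    const $ = JSON.parse(payload);
    const almanac = JSON.parse($._almanac);
    return !!almanac['meta-nodes'].find(
      (meta) => 'property' in meta && meta.property.toLowerCase() === 'al:android:package'
    );
  } catch (e) {
    return false; // unparseable payloads count as "no app meta", as in the UDF
  }
}

// Hypothetical sample payload; real HTTP Archive payloads carry far more data.
const payload = JSON.stringify({
  _almanac: JSON.stringify({
    'meta-nodes': [
      { property: 'al:android:package', content: 'com.myntra.android' },
      { name: 'viewport', content: 'width=device-width' },
    ],
  }),
});

console.log(hasAndroidAppMeta(payload)); // true
console.log(hasAndroidAppMeta('not json')); // false
```

The other four UDFs differ only in which attribute (`property` vs `name`) and which value they look for, so the same harness covers them with a one-line change.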
And this query is now specific to ecommerce sites: 1,536 sites with at least one of these METAs. The number should increase after the next release of Wappalyzer, when we start to recognize ecommerce sites via 'Google Analytics Enhanced eCommerce'.
CREATE TEMP FUNCTION hasAndroidAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:android:package');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiOSAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ios:app_store_id');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiOSiPhoneMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:iphone:app_store_id');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiOSiPadMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'property' in meta && meta.property.toLowerCase() == 'al:ipad:app_store_id');
} catch (e) {
return false;
}
''';
CREATE TEMP FUNCTION hasiTunesAppMeta(payload STRING)
RETURNS BOOLEAN LANGUAGE js AS '''
try {
var $ = JSON.parse(payload);
var almanac = JSON.parse($._almanac);
return !!almanac['meta-nodes'].find(meta => 'name' in meta && meta.name.toLowerCase() == 'apple-itunes-app');
} catch (e) {
return false;
}
''';
SELECT
p.url,
COUNTIF(hasAndroidAppMeta(payload)) AS hasAndroidAppMeta,
COUNTIF(hasiOSAppMeta(payload)) AS hasiOSAppMeta,
COUNTIF(hasiOSiPhoneMeta(payload)) AS hasiPhoneAppMeta,
COUNTIF(hasiOSiPadMeta(payload)) AS hasiPadAppMeta,
COUNTIF(hasiTunesAppMeta(payload)) AS hasiTunesAppMeta
FROM
`httparchive.pages.2020_06_01_desktop` AS p
INNER JOIN
`httparchive.technologies.2020_06_01_desktop` AS t
ON
t.url = p.url
AND t.category = 'Ecommerce'
GROUP BY
p.url
HAVING
(COUNTIF(hasAndroidAppMeta(payload)) >= 1 OR COUNTIF(hasiOSAppMeta(payload)) >= 1 OR COUNTIF(hasiOSiPhoneMeta(payload)) >= 1 OR COUNTIF(hasiOSiPadMeta(payload)) >= 1 OR COUNTIF(hasiTunesAppMeta(payload)) >= 1)
@drewzboto - Looking at Mobify, can we infer in Wappalyzer that a site is ecommerce? Are all Mobify clients ecommerce clients? If yes, it can improve detection of the ecommerce category for sites like https://www.debenhams.com/
For other sites we have talked to the platform and asked if they would like to create the Wappalyzer rule.
@alankent - @drewzboto works for Mobify.
Also, since Wappalyzer is open source, any technology can be added by anybody, isn't it? I'm not aware of any opt-out mechanism, but as a courtesy it's good to ask. Is this what you are implying?
It is both courteous and potentially more reliable (and maintained when the platform changes). It also avoids situations where "brand X looks bad because person Y did a bad job of the sensing rule".
Makes sense. Will take this approach if we can get hold of somebody in the concerned org.
> @drewzboto - Looking at Mobify, can we infer in Wappalyzer that a site is ecommerce? Are all Mobify clients ecommerce clients? If yes, it can improve detection of the ecommerce category for sites like https://www.debenhams.com/
Yes, all our clients are ecommerce (in the broader sense: transactions happen online even for travel, telco, and other sites). I can add a PR to detect both our x-powered-by header and a script/JS method, covering our two different ways of implementation.
@jrharalson Took a look through the chapter, and it looks like the crawler should be set up to get most if not all of the data you need. Can you verify and let me know if you find any additional data you need tracked?
@rockeynebhwani if you decide to track which links point to an App store of some kind, we'd need to create a custom metric for that really soon. I'm working on putting new custom metrics (PR here) together right now so keep me posted
@obto - Yes, for app store links I was thinking of a custom metric, but @rviscomi or @paulcalvano (I can't remember who, and I can't find the thread) mentioned that it's not too difficult to query this without a custom metric either. If you think we should use a custom metric, let's do that; it becomes easier to query with one.
@rockeynebhwani @jrharalson for the two milestones overdue on July 27 could you check the boxes if:
Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!
@obto Sorry for any confusion on my end. I was only contributing to the Jamstack chapter afaik but this thread and links are all on ecommerce.
@remotesynth sorry about that, I've edited @obto's comment to clarify the correct analyst for this chapter.
@jrharalson @drewzboto Can you request edit access to your chapter doc (if you haven't already), and add your name and email to the document?
I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:
@rockeynebhwani in case you missed it, we've adjusted the milestones to push the launch date back from November 9 to December 9. This gives all chapters exactly 7 weeks from now to wrap up the analysis, write a draft, get it reviewed, and submit it for publication. So the next milestone will be to complete the first draft by November 12.
However if you're still on schedule to be done by the original November 9 launch date we want you to know that this change doesn't mean your hard work was wasted, and that you'll get the privilege of being part of our "Early Access" launch.
Please see the link above for more info and reach out to @rviscomi or me if you have any questions or concerns about the timeline. We hope this change gives you a bit more breathing room to finish the chapter comfortably and we're excited to see it go live!
@rockeynebhwani any update on the status of the first draft?
This chapter is far behind and in danger of not being ready by launch. :(
Hi guys! I just wanted to jump on this thread and offer any assistance I can, e.g. for reviewing content. Feel free to reach out if I can help get this chapter moving along!
Thank you @alankent! I've updated the metadata in the top comment to add you to the list of reviewers along with @drewzboto. I've also updated the coauthors to include @rockeynebhwani and @jrharalson to reflect the doc. Please request edit access to ensure that you can leave comments and to add your name and email in the doc.
@rockeynebhwani @jrharalson let's chat about getting this chapter back on track to be released later this month.
Thanks @alankent for offering to review. I have set aside time for the next 2 weeks and I will be in touch soon.
Happy New Year! I am back on deck if I can help in any way - even if just nagging! Politely of course!! ;-)
Hi @alankent ,
Apologies for the delay here. @barrypollard will be publishing the chapter later today but you can see the final draft here - https://20210117t105608-dot-webalmanac.uk.r.appspot.com/en/2020/ecommerce
Feel free to message me if you have any additional thoughts and comments.
Cheers, Rockey
Part III Chapter 16: Ecommerce
Content team
Content team lead: @rockeynebhwani
Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.
The content team is made up of the following contributors:
New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.
Note: To ensure that you get notifications when tagged, you must be "watching" this repository.
Milestones
0. Form the content team
1. Plan content
2. Gather data
3. Validate results
4. Draft content
5. Publication