HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

SEO 2020 #908

Closed foxdavidj closed 3 years ago

foxdavidj commented 4 years ago

Part II Chapter 7: SEO

Content team

Authors: @aleyda @ipullrank @fellowhuman1101
Reviewers: @clarkeclark @natedame @catalinred @aysunakarsu @ashleyish @dsottimano @dwsmart @en3r0 @Gathea @rachellcostello @ibnesayeed
Analysts: @max-ostapenko @Tiggerito @antoineeripret
Draft: Doc
Queries: *.sql
Results: Sheet

Content team lead: @aleyda

Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.

The content team is made up of the following contributors:

New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.

Note: To ensure that you get notifications when tagged, you must be "watching" this repository.

Milestones

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

aleyda commented 4 years ago

Sounds great :) Count me in!

foxdavidj commented 4 years ago

I'd like to nominate Nate Dame as well.

Really looking forward to how this chapter turns out with how much enthusiasm there is for it 😄

max-ostapenko commented 4 years ago

I'd like to participate as an analyst in this chapter.

clarkeclark commented 4 years ago

I'm happy to be a reviewer again this year for the SEO chapter.

natedame commented 4 years ago

I'd love to join! Since this would be my first year I'd like to be a reviewer if possible.

catalinred commented 4 years ago

I'd love to help in any way.

ashleyish commented 4 years ago

I'd still like to help - but, there are quite a few participants.

I'd like to gently recommend that folks make room for anyone that is underrepresented or new to putting themselves out there.

Though it'd be good to find at least a balance in gender, I'm good to give my spot up to anyone that is new to the scene. Or act as mentor. <3

dwsmart commented 4 years ago

Still happy to help, as I was when @AVGP kindly nominated me over on the other thread, but I'm hyper-conscious that the team should be a diverse one, so I'm happy to bow out to help that happen.

rviscomi commented 4 years ago

Thank you @ashleyish and @dwsmart! I love to see it.

@obto and I will be reaching out to our picks for the content team lead for each chapter, and once that person is confirmed for SEO they can choose their coauthor(s) as needed. SEO is an especially complex topic so I would expect this to have ~3 coauthors like last year. And for anyone else still interested in contributing, there is no limit on the number of technical reviewers so we'd welcome your help!

@catalinred were you interested in contributing as a reviewer or analyst?

catalinred commented 4 years ago

> @catalinred were you interested in contributing as a reviewer or analyst?

@rviscomi I don't have the right skills for an analyst, so I was thinking about contributing as a reviewer, maybe. But I am happy to take a step back, and I'm looking forward to seeing the content team lead pick.

rviscomi commented 4 years ago

Ok I've added you as a reviewer for now and you can reevaluate as needed.

aysunakarsu commented 4 years ago

I will be happy to contribute.

Tiggerito commented 4 years ago

I'd be happy to help. I work in technical SEO and I've made a few BigQuery/archive queries to extract data from pages. Then ran out of money! Example.

[Image: covid]

dsottimano commented 4 years ago

Happy to contribute as per the nomination on the main post but equally happy to act as a mentor as @ashleyish suggested and find a few new faces.

foxdavidj commented 4 years ago

@aleyda thank you for agreeing to be the lead author for the SEO chapter! As the lead, you'll be responsible for driving the content planning and writing phases in collaboration with your content team, which will consist of yourself as lead, any coauthors you choose as needed, peer reviewers, and data analysts.

The immediate next steps for this chapter are:

  1. Establish the rest of your content team. Several other people were interested or nominated (see below), so that's a great place to start. The larger the scope of the chapter, the more people you'll want to have on board.
  2. Start sketching out ideas in your draft doc.
  3. Catch up on last year's chapter and the project methodology to get a sense for what's possible.

There's a ton of info in the top comment, so check that out and feel free to ping myself or @rviscomi with any questions!

@en3r0 @ipullrank we'd still love to have you contribute as a peer reviewer or coauthor as needed. Let us know if you're still interested!

@aysunakarsu @ashleyish @dsottimano @dwsmart @natedame I've put you down as reviewers for now, and will leave it to @aleyda to reassign at their discretion

foxdavidj commented 4 years ago

@Tiggerito would you like to contribute as an analyst for the chapter?

Tiggerito commented 4 years ago

> @Tiggerito would you like to contribute as an analyst for the chapter?

Happy to do that.

aysunakarsu commented 4 years ago

I can contribute as an analyst too if there is still a need on that side. Thanks.

antoineeripret commented 4 years ago

I'd be happy to help as an analyst. If there are enough analysts for this chapter, I'm happy to help with another one as well :)

aleyda commented 4 years ago

Thank you @obto! Excited to be able to contribute as the lead author :) I'll catch up by going through last year's edition again and the methodology used.

I definitely think that, given the scope of the chapter, it would be great to have a couple more authors and a couple more analysts, so that we have 3 authors (as last year), 3-4 reviewers, and 3-4 analysts.

@aysunakarsu @ashleyish @dsottimano @dwsmart @natedame Hello everybody! Really looking forward to contributing together :) I see some of you had expressed interest in contributing as analysts instead of reviewers... please let me know below if you would still like to contribute as analysts or, if you prefer the writing side of things, there is the possibility to contribute as authors instead too.

@ipullrank Would love to have you as a co-author!

ipullrank commented 4 years ago

@aleyda I'm in!

en3r0 commented 4 years ago

@aleyda happy to contribute as a peer reviewer or coauthor as needed!

Gathea commented 4 years ago

I would be happy to be a reviewer if possible! It’s the first year for me.

rviscomi commented 4 years ago

@aleyda I've sent you an invite to join the 2020 Authors team on GitHub. Can you visit https://github.com/HTTPArchive/ to accept the invite? This will ensure you get notifications about important chapter milestones.

aleyda commented 4 years ago

@rviscomi thank you! I just accepted :)

foxdavidj commented 4 years ago

Hey @aleyda, just checking in:

  1. How is the chapter coming along? We're trying to have the outline and metrics settled on by the end of the week so we have time to configure the Web Crawler to track everything you need.
  2. Can you remind your team to properly add and credit themselves in your chapter's Google Doc?
  3. Anything you need from me to keep things moving forward?

aleyda commented 4 years ago

Thanks for the follow-up @obto! I'll make sure the outline is completed by the end of the week :) Are the metrics also expected this week, or by July 27? Can you please confirm so I can coordinate and prioritize accordingly?

Also, I just requested access to the document. Could you please give access to the rest of the team, or do you prefer that I give it to them? Just let me know the best way to proceed.

@ipullrank Let's start with the outline :) I'll try to leave a first draft ready today, tomorrow at the latest - I'll be following up in the next few hours!

@aysunakarsu @ashleyish @dsottimano @dwsmart @natedame Hi everybody, this is a reminder that I'm waiting for your answer to my question above: if you want to be an analyst instead of a reviewer, let me know so you can be added as such in the Google Doc too... and we can start with the initial tasks :) Looking forward to collaborating together!

At this moment:

@clarkeclark @natedame @catalinred @aysunakarsu @ashleyish @dsottimano @dwsmart @en3r0 @Gathea - are reviewers. @max-ostapenko @Tiggerito are analysts.

As next steps:

Thanks again :)

AVGP commented 4 years ago

I think @fellowhuman1101 would also be up to help here! 🙌

aleyda commented 4 years ago

Amazing! Thanks @AVGP :) @fellowhuman1101 I look forward to your confirmation! Do you want to be a co-author, reviewer, analyst? Just let me know! Would love to have you too! ❤️

Tiggerito commented 4 years ago

Hello @AVGP

I was actually thinking about your angle of things. You know we can run JavaScript to get data, so do you have any insights on things to test?

Canonicals/robots meta changed via JS?

One idea I had was to see how much Structured Data was being added/changed/removed via JavaScript.

This is currently based on a theory that I can create a query to compare raw html with rendered. Joining two 20TB datasets may break the bank!

Downside to all this is, I found out we can only analyse home pages :-(
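
A rough sketch of the raw-versus-rendered comparison idea, in plain JavaScript rather than as a BigQuery join. It assumes both HTML versions are already available as strings; the helper names (parseAttributes, extractSeoSignals, diffSeoSignals) are hypothetical, and the regex-based tag parsing is a simplification that only handles quoted attribute values.

```javascript
// Hypothetical helper: read quoted attributes from a single tag string.
function parseAttributes(tag) {
  const attrs = {};
  const attrPattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let match;
  while ((match = attrPattern.exec(tag)) !== null) {
    attrs[match[1].toLowerCase()] = match[2] !== undefined ? match[2] : match[3];
  }
  return attrs;
}

// Hypothetical helper: extract the canonical URL and robots meta value.
// If a signal appears more than once, the last occurrence wins.
function extractSeoSignals(html) {
  const signals = { canonical: null, robots: null };
  for (const tag of html.match(/<(?:link|meta)\b[^>]*>/gi) || []) {
    const attrs = parseAttributes(tag);
    if (/^<link/i.test(tag) && (attrs.rel || '').toLowerCase() === 'canonical') {
      signals.canonical = attrs.href || null;
    }
    if (/^<meta/i.test(tag) && (attrs.name || '').toLowerCase() === 'robots') {
      signals.robots = attrs.content || null;
    }
  }
  return signals;
}

// Compare the signals found in the raw response with those in the rendered DOM.
function diffSeoSignals(rawHtml, renderedHtml) {
  const raw = extractSeoSignals(rawHtml);
  const rendered = extractSeoSignals(renderedHtml);
  return {
    canonicalChanged: raw.canonical !== rendered.canonical,
    robotsChanged: raw.robots !== rendered.robots,
    raw,
    rendered,
  };
}
```

The same shape of comparison could in principle be applied to structured data, by diffing the set of types found before and after rendering.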

dwsmart commented 4 years ago

Hey @aleyda, I'm happy with the reviewer role, I'm not massively familiar with the dataset, but more than happy to jump over to that side if there's a shortage and I can be helpful.

foxdavidj commented 4 years ago

@aleyda

> The metrics are also expected this week?

Yes we need the list of metrics you'd like us to research by the end of this week. This is because we need time to configure the Web Crawler which starts on August 1st, and give your analysts time to look over the metrics and make any necessary adjustments together with you

> Could you please give access to the rest of the team or do you prefer that I give it to them?

If you have the emails of everyone on your team, please do invite them! Otherwise, please do have them request access themselves :)

aleyda commented 4 years ago

Thanks for your confirmation @dwsmart , no problem at all :) it will be awesome to have you as a reviewer!

@obto thanks for the clarification, got it :) I'll make sure to have it end of week.

fellowhuman1101 commented 4 years ago

@aleyda reporting for duty!

natedame commented 4 years ago

@aleyda also reporting (again?) for duty! :) In response to your question, I don't think my skillset would make a good analyst. Happy to review when the time comes!

rachellcostello commented 4 years ago

So I'm incredibly late to the 2020 Web Almanac party and am just catching up on everything (don't link GitHub to your work email and then change jobs...) but it looks like you've pulled together an excellent team for this year's SEO chapter!!

I've offered to be an editor this time round, but as one of last year's SEO chapter co-authors, if there's anything I can do to help then let me know @aleyda :)

Looks like you've got a bunch of people on board already though, so if I'm not needed then I can't wait to read it when it's finished!

aleyda commented 4 years ago

Amazing @fellowhuman1101 - added you as a co-author!

@natedame - perfect :) You stay then as a reviewer!

@rachellcostello - Amazing, it's great to have you Rachel! Adding you as a reviewer then :)

Tiggerito commented 4 years ago

I've been trying to get my head around the data available to us. This is my summary:

The data source is based on the mobile and desktop home pages for over 5 million domains sourced from Crux. The data is acquired monthly and includes:

Historical monthly data is also available.

Edit: Chrome-UX data is also available including all the Web Vitals

aleyda commented 4 years ago

Thanks for starting to verify the available data and to validate which metrics are viable @Tiggerito, this is great!

I have a question regarding the rendered content info: would it be possible to verify, for example, which content relies on client-side JS to be rendered? I have been reviewing what was included in last year's chapter to define this year's outline (in progress here), and I don't see that information included, but it would be great to have!

Tiggerito commented 4 years ago

Hi @aleyda,

I'm not sure when the almanac.js script is run. If it's late in the rendering process we should be able to pick up client side rendered info. I'll ask around.

aleyda commented 4 years ago

@Tiggerito Thank you! :)

foxdavidj commented 4 years ago

@Tiggerito Sounds like you've got a pretty good grasp on things! @bazzadp just made a post with some extra information you might find helpful as well https://github.com/HTTPArchive/almanac.httparchive.org/issues/914#issuecomment-659205330

And don't forget to join the #web-almanac slack so @paulcalvano can invite you to the Analysts channel where you can ask any questions you may have.

Edit: Looks like you've already joined! 🎉

Tiggerito commented 4 years ago

I've done some reverse engineering of all of last year's queries and made some basic notes.

SQL

10_01 - pulls almanac.js for structured data, counts if contains a type from the Google Gallery. Reports percent containing split by device. list probably needs updating
10_02 - reports on lang values used from body. only pulls first 2 characters
10_03 - pulls almanac.js link tags for amphtml and reports percent. e.g. percent of home pages using amp
10_04a - hreflang use percent by device
10_04b - popular hreflang values by device
10_05 - pulls almanac.js to report schema types used by device
10_06 - lighthouse mobile data for is crawlable and is canonical. ** is crawlable also seems to check robots meta tag noindex!
10_07a - lighthouse mobile data on title and meta description presence
10_07b - title lengths based on percentiles by device 10, 25, 50, 75, 90 * good example of how to do quantiles/percentiles
10_07c - meta description lengths based on percentiles by device. data acquired from almanac.js
10_08 - status codes by percent
10_09a - words and heading word counts by percentile from almanac.js
10_09b - lighthouse mobile image alt percent score of 1 (all are set)
10_10 - percentiles for external, internal and anchor links from almanac.js
10_11 - SPAs ('React', 'Angular', 'Vue.js') using navigational hash links from almanac.js *** example of a JOIN with the technologies table
10_12 - lighthouse mobile robots.txt with no validation errors
10_13 - % of desktop pages that include a stylesheet with a breakpoint under 600px. uses parsed_css table
10_14 - lighthouse mobile data percent link-text score is 1. It means no failing links which are links using block words like "more info"
10_15a - % of websites classified as fast/avg/slow from the chrome-ux-report - has our Web Vital Scores :-)
10_15b - % of websites classified as fast/avg/slow by device (form factor)
10_16 - h1 length by percentile and device
10_17 - percent https by device
10_18 - percent without headers or even words, by device
10_19 - no external, external, hash links, by device
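
To illustrate what the percentile-based queries (10_07b and similar) report, here is a small sketch in JavaScript. It is not the SQL from last year's queries: the nearest-rank method and the function names percentile and summarize are assumptions made for illustration, and the real queries may compute approximate quantiles differently.

```javascript
// Nearest-rank percentile: the smallest value that has at least p% of the
// data at or below it. Returns undefined for an empty input.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Summarise a metric (e.g. title lengths for one device) at the same
// percentiles the notes mention: 10, 25, 50, 75 and 90.
function summarize(values, percentiles = [10, 25, 50, 75, 90]) {
  return Object.fromEntries(
    percentiles.map((p) => [`p${p}`, percentile(values, p)])
  );
}

// Hypothetical title lengths, one per home page.
const titleLengths = [50, 10, 90, 30, 70, 20, 100, 40, 60, 80];
console.log(summarize(titleLengths));
```

Running this once per device would give the "by percentile and device" shape used in the notes.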

almanac.js

looks for json-ld and microdata. Returns an object listing all types found. Goes 5 levels in for json-ld
count of links: external, internal (same hostname), hash/navigateHash/earlyHash (same page with #, navigate if no jump to anchor, early = first 2)
h1 to h4 count and total word count
word count from all text nodes in body
meta tags
link tags
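
The link counting described above could look roughly like the sketch below. This is not the actual almanac.js code: classifyLinks is a hypothetical name, it takes the hrefs as an array instead of reading the DOM, and it assumes that hash links are counted separately from other internal links.

```javascript
// Classify hrefs relative to the page they were found on.
function classifyLinks(pageUrl, hrefs) {
  const page = new URL(pageUrl);
  const counts = { internal: 0, external: 0, hash: 0, other: 0 };
  for (const href of hrefs) {
    let target;
    try {
      target = new URL(href, page); // resolves relative hrefs against the page
    } catch (e) {
      counts.other += 1; // unparseable href
      continue;
    }
    if (target.protocol !== 'http:' && target.protocol !== 'https:') {
      counts.other += 1; // mailto:, tel:, javascript:, ...
    } else if (target.hostname !== page.hostname) {
      counts.external += 1;
    } else if (
      target.hash &&
      target.pathname === page.pathname &&
      target.search === page.search
    ) {
      counts.hash += 1; // same page, only the fragment differs
    } else {
      counts.internal += 1; // same hostname
    }
  }
  return counts;
}
```

In a page context the hrefs would come from the anchor elements in the rendered DOM; the distinction between navigateHash and earlyHash is left out of this sketch.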

Tiggerito commented 4 years ago

Hi @aleyda, I've just done some testing. The JavaScript we can run is processed via webpagetest.org, and they state the script is run after the normal test has finished. The tool does a long test, which makes me think we have quite a reliably rendered DOM to work with.

On my own test it picked up Structured Data added via JavaScript as well as iframes added by Disqus.

I think we need to get things added to the script by the end of the month so the extra data is extracted when they crawl.

rockeynebhwani commented 4 years ago

Hi @aleyda @Tiggerito

For the eCommerce chapter 2020, I was considering pulling a custom metric to find out how many social channels sites are using on average, based on the information they publish via Schema.org. I was planning to do this using a custom metric (thanks to @savsav):

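[numberOfSocialChannels]
// Count the sameAs entries in the first JSON-LD block that declares them.
const seoScripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
    .map(script => {
        // Skip blocks containing invalid JSON instead of failing the whole metric.
        try { return JSON.parse(script.textContent); } catch (e) { return null; }
    })
    .filter(obj => obj && typeof obj === 'object' && 'sameAs' in obj);
if (seoScripts.length === 0) {
    return;
}
// sameAs may be a single URL string or an array of URLs.
return [].concat(seoScripts[0].sameAs).length;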

Example usage from https://direct.asda.com/george/clothing/10,default,sc.html

<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Organization",

"name" : "GeorgeAtAsda",
"url": "https://direct.asda.com",
"logo": "https://direct.asda.com/on/demandware.static/-/Library-Sites-ASDAShared/default/dw4998f70c/logos/grg_header_logo_comp.jpg",
"contactPoint" : [{
"@type" : "ContactPoint",
"telephone" : "+44-800-952-0101",
"contactType" : "customer service"
}],
"sameAs" : [
"https://www.facebook.com/GeorgeatAsdaOfficial",
"https://uk.pinterest.com/georgeatasda/",
"https://twitter.com/Georgeatasda",
"https://instagram.com/georgeatasda",
"https://www.youtube.com/user/georgedressforless"
]
}
</script>

Here is a test - https://www.webpagetest.org/result/200717_3J_d1226f7d51175b862b866aa12f2b85ca/1/details/#waterfall_view_step1

But I think it would be better to add this as generic information in the SEO chapter as well, and maybe I can reference eCommerce-specific stats in the eCommerce chapter. What do you think?

We may need to tweak the code to cover a few more use cases.
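
As one possible direction for those extra use cases, the sketch below also handles top-level arrays, @graph containers, and a sameAs given as a single string. It is only an illustration: collectSameAs is a hypothetical name, and it works on the JSON-LD text of each script block instead of querying the DOM directly.

```javascript
// Collect the unique sameAs URLs declared across a page's JSON-LD blocks.
function collectSameAs(jsonLdBlocks) {
  const urls = new Set();
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
      return;
    }
    if (node === null || typeof node !== 'object') return;
    if ('sameAs' in node) {
      // sameAs may be a single string or an array of strings.
      for (const url of [].concat(node.sameAs)) {
        if (typeof url === 'string') urls.add(url);
      }
    }
    if ('@graph' in node) visit(node['@graph']);
  };
  for (const text of jsonLdBlocks) {
    try {
      visit(JSON.parse(text));
    } catch (e) {
      // Ignore blocks that are not valid JSON.
    }
  }
  return [...urls];
}

// In a custom metric, the input could be built from the page, for example:
// Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
//   .map(script => script.textContent)
```

Returning the URLs rather than just a count would also allow grouping by social network later on, if that turns out to be useful.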

@rviscomi @jrharalson @drewzboto @loewengart

aleyda commented 4 years ago

Hi @max-ostapenko :) I wanted to get in touch since we're already getting the outline ready and starting to specify the expected metrics in the chapter's Google Doc here. @Tiggerito is already in there checking what we're adding, and we're assessing with them the viability of the metrics for the sections we're adding. It would be great if you could also participate in the process as an analyst. What's the best email to add you as a Google Doc editor? Just let me know so I can add you.

Thanks!

aleyda commented 4 years ago

Hi @rockeynebhwani, Thanks for getting in touch!

We're adding a Structured Data sub-section in the chapter under the content area, the goal so far is to specify:

To confirm: you're particularly interested in providing the social presence information published via the Organization Schema, right? If so, I don't think there should be an issue digging further to obtain the usage of specific properties... it would be great if @Tiggerito or @max-ostapenko could confirm.

On the other hand, from a scope perspective, we would need to see with @fellowhuman1101 and @ipullrank how far we want to go in providing info on specific properties used across the different Schema types... as there are so many of them. Last year, the Structured Data section of the SEO chapter (here) covered the most used structured data types, highlighting those that have a more prominent impact in search results. So in that regard the "social presence" specification might not necessarily be one of the top ones... but let's see what they think too :)

rockeynebhwani commented 4 years ago

@aleyda - Thanks for your thoughts. I believe Google uses this section to build its knowledge graph, as you can see in the screenshot below. Hence I thought it would be relevant for the SEO chapter, and it can easily be gathered from the home page.

[Image: screenshot]

rockeynebhwani commented 4 years ago

@aleyda - Another interesting piece of structured data that we could surface is the one Google uses to show a search box next to a result. (For this, we will have to look for potentialAction with type 'SearchAction'. See the example below.)

[Image]

[Image]

Not sure about the adoption of this on certain platforms like Shopify etc. If platforms have this as an out-of-the-box feature, we will possibly see higher adoption.
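
A minimal sketch of how that check might work on already-parsed JSON-LD. The function name hasSearchAction is hypothetical, and the sketch only looks at potentialAction on top-level nodes, array items, and @graph entries.

```javascript
// Return true if any node declares a potentialAction of type SearchAction.
function hasSearchAction(node) {
  if (Array.isArray(node)) return node.some(hasSearchAction);
  if (node === null || typeof node !== 'object') return false;
  // potentialAction and @type may each be a single value or an array.
  const actions = [].concat(node.potentialAction || []);
  const found = actions.some(
    (action) =>
      action !== null &&
      typeof action === 'object' &&
      [].concat(action['@type'] || []).includes('SearchAction')
  );
  return found || hasSearchAction(node['@graph'] || []);
}

// Hypothetical WebSite markup of the kind that enables a search box.
const website = {
  '@context': 'https://schema.org',
  '@type': 'WebSite',
  url: 'https://example.com/',
  potentialAction: {
    '@type': 'SearchAction',
    target: 'https://example.com/search?q={search_term_string}',
    'query-input': 'required name=search_term_string',
  },
};
console.log(hasSearchAction(website));
```

Joining a flag like this with the detected platform would show whether adoption is driven by out-of-the-box platform features.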

aleyda commented 4 years ago

Hi @rockeynebhwani, in the past the structured data about social presence was taken into consideration for the knowledge panel of a business, but not anymore (at least not directly), and Google updated its guidelines/specification accordingly. It now takes the logo into consideration, but for social presence the process is for the business to claim its Knowledge Panel and update the information there, as specified here: https://developers.google.com/search/docs/guides/enhance-site

As for the "search action": yes, that is something that was added last year and that I expect we'll keep including, due to its visibility/impact.

So on one hand, we will definitely make sure to add general structured data information, and on the other, highlight further those types that trigger prominent search features and that in our experience have a higher impact: FAQ, How-Tos, Reviews, etc. We will need to see how far we want to go in specifying every single one, and I think we won't know for sure until we write that section and agree/validate between authors.

I think the best way to move forward is for us to collect as much structured data information as is available. Then, once we write the SD section for SEO, you can see what we're highlighting based on search importance, and if you would like to expand on something that could be more important from an ecommerce perspective, and that we're not covering as much due to our different scope, you can include it in the ecommerce chapter. I'm trying to find the best way to coordinate: keeping things as open as possible at this still very early stage, while making sure we don't lack the data in case you need it (and we don't).

Thanks again :)