flarum / framework

Simple forum software for building great communities.
http://flarum.org/

SEO and Accessibility Deal Breakers #1820

Closed: meezaan closed 3 years ago

meezaan commented 5 years ago

Bug Report

Current Behavior There are several issues to mention here, all related. The SEO impact of these is massive, particularly if you are going to be switching a well indexed site to Flarum.

It's interesting that all the content comes back without JavaScript when loading a single discussion thread. When you load with JavaScript and scroll down to load more results, the URL also changes to reflect paging. There appears to be some inconsistency in the way the single discussion, tag/tags and All Discussions pages are built.

Steps to Reproduce Just load the pages as mentioned above.

Expected Behaviour One would expect all the content (and by this I mean semantic markup with content from the database) to be the same, with or without JavaScript - the display, of course, may not be. Every time we click on a Load More button, the URL should paginate (otherwise the search engine bot would consider the button / link to be useless).

Screenshots: All Discussions page without JS; Tags page without JS.

You can see the output of the above 2 pages without JavaScript here: https://1drv.ms/u/s!AmrVmjAdc7P2wJQvsRFe8wCUI13zRw?e=zd37Am

Additional Context We cannot go live with Flarum until the 2 pages render semantically correct HTML with the content.

matteocontrini commented 5 years ago

Some random notes:

For what it's worth, my view is that there's no real reason to worry about the nojs experience, it doesn't seem to matter much in the end.

meezaan commented 5 years ago

Thanks for the feedback @matteocontrini - but as you rightly call it random, it doesn't really apply to the specific problems pointed out here. JSON-LD does not apply here and the extensions don't actually fix the semantic shortcomings.

For SEO, fixing the identified issues with missing content and markup is a must.

Accessibility issues caused by JavaScript - fixing these is a legal requirement for some, even if it is not for all of us.

mikejones3 commented 5 years ago

@meezaan What legal requirement are you referring to?

meezaan commented 5 years ago

@mikejones3 https://www.jisc.ac.uk/accessibility, for instance.

dsevillamartin commented 5 years ago

@meezaan Could you clarify what you mean with point 1?

As the first post's content is only shown for Sticky discussions when JavaScript is enabled, I'm not really sure what you are asking here.

meezaan commented 5 years ago

@datitisev Thank you for the work on this thus far.

That's precisely it - it doesn't bring those back at all, whether they show up because of sticky discussions or the summary extension. They should at least appear in the DOM whether JS is enabled or not.

dsevillamartin commented 5 years ago

@meezaan I see. That's an interesting idea. Not sure how we want to tackle that, if we do.

meezaan commented 5 years ago

JS should only really impact our styling or User Interface, but not the content that comes back in the DOM on the first page load - it's the same concept when viewing a single discussion.

clarkwinkelmann commented 5 years ago

"Loading the All Discussions page with JavaScript (as Google bot would see it, for instance) does not bring back the discussion titles and a post in the discussion as one would expect"

@meezaan in your first point, do you mean without JavaScript? Or is there an issue when the single page app is loaded inside Google bot with JavaScript?

The reason there's a difference between the javascript and nojs version of the page is because the Sticky extension doesn't extend both. Post previews are only implemented for javascript. We might want to add nojs support to the Sticky extension but I don't think Flarum offers any extensibility inside of the nojs page @datitisev

This is the nojs view: https://github.com/flarum/core/blob/master/views/frontend/content/index.blade.php. It has no way of being extended in a clean way. We might want to improve that.

meezaan commented 5 years ago

@clarkwinkelmann Yes, I meant without, sorry. Presumably we could add an if statement somewhere around https://github.com/flarum/core/blob/master/views/frontend/content/index.blade.php#L12 to check for the post preview?

clarkwinkelmann commented 5 years ago

It's a bit more complicated than that. Core is not aware previews are a thing. Previews are implemented by the Sticky core extension (and also by @jordanjay29's summaries extension), so this is where the logic should go, once we make it possible for the nojs view to be extended.

dsevillamartin commented 5 years ago

Point 1 doesn't seem to fit in core, at least not now. While more content per page does increase SEO standings, having hidden elements and loading the first post of each discussion in the discussion list probably isn't a very good idea to add to core. The rest of the issues have had PRs made to fix them.

meezaan commented 5 years ago

@datitisev @clarkwinkelmann OK, so 1 we can do without. I can now see all the post titles and URLs without JavaScript, so technically it is usable and accessible, great work.

However, when browsing all discussions or a tag, we still do not get pagination appearing in the URL. These are 2 and 4 in the list. If there is a PR for these issues, can you link it to this issue, please?

Thanks.

clarkwinkelmann commented 5 years ago

I'm trying to get the full picture of how it currently works, please correct me where needed:

Search engines without JavaScript support browse our nojs discussion list. They have access to pagination, and index each page and each discussion. If users click a link from the search results that includes ?page=x, they see the correct content, but pagination is broken (the next page will always be number 2 and there is no way to go one page back).

Search engines with JavaScript support don't click "load more" because it's not a link (it's a button), so they only ever index the first page.

If this is correct, then the issue is not only about adding pagination to the URL and using it on initial page load. We probably also need to update the index page to show the real pagination link to browsers, using the rel=next meta and/or making "load more" a link instead of just a button. Otherwise, I guess only users will see the pagination in the URL and search engines won't.

I suppose JavaScript-enabled crawlers don't browse the single page app in a single session? They probably load the page, run JavaScript, then inspect the page, then load it again from scratch to crawl another page?
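The idea of exposing real pagination links can be sketched roughly as follows. This is only an illustration, not Flarum's implementation: the function names, the `LoadMore` class name and the `?page=` URL scheme for the helper are assumptions, and HTML escaping of the URLs is left out for brevity.

```typescript
// Hypothetical helpers, not part of Flarum: build rel=prev/next tags and a
// crawlable "load more" anchor for a paginated discussion list.
interface PaginationLinks {
  prev: string | null;
  next: string | null;
}

// Page 1 keeps the bare URL so the first page has a single address.
function pageUrl(baseUrl: string, page: number): string {
  if (page <= 1) return baseUrl;
  const separator = baseUrl.indexOf("?") !== -1 ? "&" : "?";
  return baseUrl + separator + "page=" + page;
}

// hasMore tells us whether the server has results beyond the current page.
function paginationLinks(baseUrl: string, page: number, hasMore: boolean): PaginationLinks {
  return {
    prev: page > 1 ? pageUrl(baseUrl, page - 1) : null,
    next: hasMore ? pageUrl(baseUrl, page + 1) : null,
  };
}

// Tags for the document head (URLs are assumed to be safe to embed as-is).
function renderHead(links: PaginationLinks): string {
  const tags: string[] = [];
  if (links.prev !== null) tags.push('<link rel="prev" href="' + links.prev + '">');
  if (links.next !== null) tags.push('<link rel="next" href="' + links.next + '">');
  return tags.join("\n");
}

// A real anchor instead of a button, so a crawler can follow it.
function renderLoadMore(links: PaginationLinks): string {
  return links.next !== null
    ? '<a class="LoadMore" href="' + links.next + '">Load More</a>'
    : "";
}
```

With something like this, a request for page 2 of a list would carry a prev link to the bare URL and a next link to `?page=3`, and JavaScript could still intercept the click on the anchor to load results in place.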

This is relevant https://webmasters.googleblog.com/2014/02/infinite-scroll-search-friendly.html

I think the problem is mitigated by using the sitemap extension. The goal is ultimately to get all discussions indexed, not really to get each result page indexed.

While testing what happens when users follow a paginated link from a search engine, I realized the following UI issue. If you visit https://discuss.flarum.org/?page=2, then click load more, you get the exact same content loaded a second time. When the index is loaded with ?page=, the pre-loaded content will be the one asked for in the URL, but the UI always thinks page one is loaded.

It's obviously related to the SEO issue. If we implement linkable pagination on the index page, it will also solve this UI issue.
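A minimal sketch of the fix for the duplicate-content problem described above, assuming the client keeps track of the last loaded page and seeds it from the URL. The class name, the method names and the page size of 20 are assumptions made for illustration, not Flarum's actual code.

```typescript
// Assumed page size; the real value depends on the forum's configuration.
const PAGE_SIZE = 20;

// Read the page number from a query string such as "?page=2".
// Missing or invalid values fall back to page 1.
function initialPage(search: string): number {
  const match = /[?&]page=(\d+)/.exec(search);
  const raw = match ? match[1] : undefined;
  const page = raw !== undefined ? parseInt(raw, 10) : 1;
  return page >= 1 ? page : 1;
}

// Hypothetical list state: seeded from the URL instead of assuming page 1.
class DiscussionListState {
  private lastLoadedPage: number;

  constructor(search: string) {
    this.lastLoadedPage = initialPage(search);
  }

  // Page that "load more" should request next.
  nextPage(): number {
    return this.lastLoadedPage + 1;
  }

  // Offset of the first result on that page.
  nextOffset(): number {
    return this.lastLoadedPage * PAGE_SIZE;
  }

  // Call once the next page has been appended to the list.
  markLoaded(): void {
    this.lastLoadedPage += 1;
  }
}
```

Seeded with `?page=2`, the state asks for page 3 (offset 40) on the first "load more" rather than repeating the content that was already pre-loaded.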

meezaan commented 5 years ago

@clarkwinkelmann Thank you. I think you've captured most of it, but SEO is also dependent on links to the website from third parties. Currently, I don't have a way to link to the 3rd page of a particular tag. Along with the correct rel=next meta tag and a canonical URL tag, we would need to allow changing the URL via load more to enable that kind of linking too. I think the rest here makes sense.

clarkwinkelmann commented 5 years ago

Is there a rationale for deep-linking to paginated discussion results from third-party websites? The linked content won't stay the same.

Same for search engines: it's great if they can use the pagination to discover content, but by the time a user clicks the paginated link, it's possible that what they are looking for was already pushed multiple pages away. So the sitemap makes more sense to me than letting search engines browse paginated results.

The only useful reason I see for user-visible pagination in the url, is if you refresh the page so you end up at about the same place.

dsevillamartin commented 5 years ago

@meezaan That'd be flarum/framework#1829 which adds pts 2 & 4.

meezaan commented 5 years ago

@datitisev Thank you. @clarkwinkelmann You're right - linking to paginated tags or All Discussions is not useful, as the content would change (the default sort order is last_updated_at), but a single paginated discussion is useful. In the latter case, if you want to reference a particular page in a large discussion thread with relevant posts, you would need paginated URLs.
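To illustrate why a paginated discussion URL stays stable, here is a small sketch that maps a post number to its page. The page size of 20, the `?page=` parameter and both function names are assumptions for the example, not a description of Flarum's routing.

```typescript
// Assumed number of posts per page.
const POSTS_PER_PAGE = 20;

// Posts are numbered from 1 in the order they were written, so the page a
// post lands on only depends on its number and the page size.
function pageForPost(postNumber: number, perPage: number = POSTS_PER_PAGE): number {
  if (postNumber < 1) throw new RangeError("postNumber must be >= 1");
  return Math.ceil(postNumber / perPage);
}

// Hypothetical URL scheme: the first page is the bare discussion URL.
function discussionPageUrl(discussionUrl: string, postNumber: number): string {
  const page = pageForPost(postNumber);
  return page === 1 ? discussionUrl : discussionUrl + "?page=" + page;
}
```

Unlike a discussion list sorted by last activity, the mapping from post number to page does not shift when new posts arrive, provided posts are numbered in order and numbers are not reused, so a link to a given page keeps pointing at the same posts.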

luceos commented 5 years ago

So what's the status on this, are all items except no. 1 tackled?

meezaan commented 5 years ago

1 we can do without. In theory, we can close this issue.

We should probably track stuff similar to what is mentioned in https://github.com/flarum/core/issues/1820#issuecomment-520911064 somewhere.

selfthinker commented 4 years ago

Just to clarify the potential accessibility issues...

If something doesn't work without JavaScript, that's not strictly speaking an accessibility issue. It would definitely not fall under the regulations that @meezaan linked to, as it's not covered by WCAG, on which the regulations are based. It does not affect someone with an impairment more than it would someone without an impairment. You could make an argument, though, that people with impairments often have older technology. And having older technology has an impact on JavaScript.

Something not working without JavaScript is more about resilience, robustness and usability. People usually talk about noJS like it's only affecting people who intentionally switch JavaScript off. But that is not the case. That is only the minority.

In 2013 GDS found that 1.1% of visits were not getting JS, but only 0.2% of those because they have JS disabled or their browser doesn't support it. 0.9% don't have JS for other reasons. (Some people say these numbers would be much lower nowadays. My personal theory is that they would be much higher. But we don't have any newer numbers, unfortunately.)

That blog post wrongly says it's 1.1% of all users, but it was 1.1% of all visits. Over time it can affect 100% of users. See what that distinction means and why it matters.
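The visits-versus-users distinction can be made concrete with a small calculation. The sketch below assumes that each visit independently fails to get JavaScript with the same 1.1% probability; that independence assumption is a simplification introduced here for illustration and is not something the GDS figures claim.

```typescript
// Probability that a user hits at least one visit without JavaScript,
// assuming every visit fails independently with probability failureRate.
function shareOfUsersAffected(failureRate: number, visits: number): number {
  return 1 - Math.pow(1 - failureRate, visits);
}

// Print the share for a few visit counts, using the 1.1% per-visit figure.
const perVisitRate = 0.011;
for (const visits of [1, 10, 100, 500]) {
  const share = shareOfUsersAffected(perVisitRate, visits);
  console.log(visits + " visits: " + (share * 100).toFixed(1) + "% of users affected");
}
```

Under that assumption, a single visit has a 1.1% chance of being affected, but about two thirds of users who make 100 visits would have hit the problem at least once, and the share keeps approaching 100% as visits accumulate.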

If people don't switch JS off and their browser supports it, there are lots of other reasons for not getting the JS.

So, there are lots of good reasons for making the noJS experience better, but accessibility is not really one of them.

meezaan commented 4 years ago

@selfthinker I think the intent here was, if I may summarize, that content should be visible with or without JS, and should be navigable (one way or another) for someone who does not load with JS.

selfthinker commented 4 years ago

@meezaan, I'm confused. Was there anything in my comment that made you think I was talking about something else?

meezaan commented 4 years ago

@selfthinker Not at all - much like the way I had initially stated, there is a lot of information - but what we are aiming for is just one line amongst the many and can get lost - so stating exactly what we are after always helps.

askvortsov1 commented 3 years ago

Splitting into https://github.com/flarum/core/issues/2537, https://github.com/flarum/core/issues/2173