LycheeOrg / Lychee-front

JS implementation of Lychee frontend
https://lycheeorg.github.io/
MIT License

[Enhancement] SEO optimization 3/3 - Don't use fragments for client-side navigation, but proper URLs #343

Closed nagmat84 closed 9 months ago

nagmat84 commented 1 year ago

Currently, our web frontend is not very SEO friendly. It is entirely written in JS, which is fine for modern web crawlers, but it violates some best practices.

Enhancement: Currently the frontend uses fragments (i.e. the #-part of the URL) to implement client-side navigation. While client-side navigation is generally fine, it must not rely on fragments. Fragments are not only a problem for Twitter (although that is Twitter's fault); search engines also won't detect the different "pages" as distinct, because they do not index fragments. Instead, the frontend should use proper URLs and the JS history API.

See: https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics#use-history-api
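To illustrate the idea, here is a minimal sketch of path-based navigation with the History API, as the Google guide recommends. The `/album/<id>` URL scheme and the `loadAlbum` render function are hypothetical, not Lychee's actual API:

```javascript
// Build a crawlable path instead of a fragment like "#album-42".
function albumPath(albumId) {
	return "/album/" + encodeURIComponent(albumId);
}

// Browser-only part: navigate client-side without a full page reload.
if (typeof window !== "undefined") {
	function gotoAlbum(albumId) {
		// Update the address bar and the session history without reloading.
		window.history.pushState({ albumId }, "", albumPath(albumId));
		loadAlbum(albumId); // hypothetical: re-render the view client-side
	}

	// Handle the back/forward buttons as well.
	window.addEventListener("popstate", (event) => {
		if (event.state && event.state.albumId) loadAlbum(event.state.albumId);
	});
}
```

With this approach every album gets a URL whose path component differs, so a crawler indexes each one as a separate page.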

kamil4 commented 1 year ago

The first two I understand (I think), but not this one. So what does it affect in the case of Lychee? For what it's worth, we do use history.pushState (in lychee.goto), so I guess we are using the JS history API at least?

According to that Google page, the crawler only considers the <a> tags. But, at least for things like albums and photos, we don't use them at all -- we attach an onclick event handler to the <div>. So I guess we would need to create some <a> tags first?
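One hedged way to do that (the markup, class names, and handler are illustrative, not Lychee's actual code) is to render real <a> elements with crawlable hrefs and intercept the click for client-side navigation:

```javascript
// Render an album tile as a real <a> element a crawler can follow.
function albumLinkHtml(albumId, title) {
	return (
		'<a href="/album/' + encodeURIComponent(albumId) + '" class="album">' +
		escapeHtml(title) +
		"</a>"
	);
}

// Minimal HTML escaping for the link text.
function escapeHtml(s) {
	return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Browser-only part: intercept clicks so navigation stays client-side.
if (typeof document !== "undefined") {
	document.addEventListener("click", (event) => {
		const link = event.target.closest("a.album");
		if (!link) return;
		event.preventDefault(); // suppress the full page load
		window.history.pushState(null, "", link.getAttribute("href"));
		// ...render the album client-side here...
	});
}
```

The crawler sees an ordinary link with a distinct path, while users still get the single-page experience.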

nagmat84 commented 1 year ago

The point is that a search engine only considers two pages to be distinct and indexes them individually if the path components of their URLs differ. The fragment is not taken into account. (See the syntax image on Wikipedia for path and fragment.) Fragments are intended to navigate to different anchors (e.g. different scroll positions) on the same page.
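The distinction can be illustrated with the standard URL API (the host and album naming here are made up):

```javascript
// The fragment lives only in the client; the path identifies the page.
const fragmentStyle = new URL("https://example.com/gallery#album-42");
const pathStyle = new URL("https://example.com/gallery/album-42");

console.log(fragmentStyle.pathname); // "/gallery" (the same page from a crawler's view)
console.log(fragmentStyle.hash); // "#album-42" (not considered for indexing)
console.log(pathStyle.pathname); // "/gallery/album-42" (a distinct, indexable page)
```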

Lychee is violating that. Currently we use the fragment part for navigation between different "pages". I guess we chose fragments because it appeared easier: one can manipulate window.location.href directly and does not have to take extra care to suppress an undesired page reload.

But this means that, from the perspective of a web crawler, everything is on one page. If the web crawler revisits and re-indexes the page later, uses a different fragment part, and hence sees a completely different thing, the previous index result will be overwritten.

nagmat84 commented 1 year ago

> According to that Google page, the crawler only considers the <a> tags. But, at least for things like albums and photos, we don't use them at all -- we attach an onclick event handler to the <div>. So I guess we would need to create some <a> tags first?

Thanks for pointing that out. I have already added some comments to the source code noting that we are misusing <a>. It is even worse: not only are we not using <a> where we should, we are also using it to show icons (e.g. the badges on a photo) which are not clickable and do not constitute links.

nagmat84 commented 1 year ago

The problematic <a> tags have their own issue now: https://github.com/LycheeOrg/Lychee-front/issues/344

kamil4 commented 1 year ago

I understand the reason for the search engine behavior when it comes to fragments, but what about queries (?)? Is it really only the path that matters?

nagmat84 commented 1 year ago

Good question. I haven't looked into every detail thoroughly, and I don't feel like doing so right now. Hence, in the following I only express my assumptions.

From a theoretical perspective, the query part should matter. I mean, the query part is sent to the server (in contrast to the fragment), and the server may base its response on it. IMHO, it would be a bug if a web crawler only considered the path.
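A quick sketch of why the query part should matter (the URL is made up): unlike the fragment, the query is part of the request target that the browser actually transmits.

```javascript
const url = new URL("https://example.com/gallery?album=42#top");

// The HTTP request line the browser would send is roughly:
//   GET /gallery?album=42 HTTP/1.1
// i.e. the query travels to the server, while the fragment "#top" never
// leaves the client.
const requestTarget = url.pathname + url.search;
console.log(requestTarget); // "/gallery?album=42"
```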

However, I would not bet on it. Given that some web proxies and web caches failed to get that piece right some years ago, I would not try to push my luck too far.