SSHOC / sshoc-marketplace-frontend

SSHOC marketplace frontend
https://marketplace.sshopencloud.eu
Apache License 2.0

improve seo #26

Open dpancic opened 4 years ago

dpancic commented 4 years ago

In GitLab by @KlausIllmayer on Jun 22, 2020, 18:02

As this is an alpha release, we don't want the SSHOC MP to show up in search engines. Therefore a robots.txt that disallows every bot should be added to the production frontend. The block will likely be lifted with the Beta release.
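For illustration, a minimal sketch of what "a robots.txt that disallows every bot" amounts to. The helper `buildRobotsTxt` and its `allowCrawling` option are hypothetical names, not taken from the marketplace codebase (the actual fix is in commit d7a19a95); only the `User-agent` and `Disallow` directives come from the robots.txt convention.

```typescript
// Hypothetical helper for illustration only; not from the marketplace code.
interface RobotsOptions {
  /** When false, every crawler is asked to stay away from the whole site. */
  allowCrawling: boolean;
}

function buildRobotsTxt(options: RobotsOptions): string {
  const lines = [
    // "*" addresses every crawler that honours robots.txt.
    "User-agent: *",
    // "Disallow: /" blocks all paths; an empty "Disallow:" blocks nothing.
    options.allowCrawling ? "Disallow:" : "Disallow: /",
  ];
  return lines.join("\n") + "\n";
}

// Alpha/Beta: keep crawlers out of the whole site.
console.log(buildRobotsTxt({ allowCrawling: false }));
```

Switching `allowCrawling` to `true` for the final release would correspond to lifting the block.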

dpancic commented 4 years ago

In GitLab by @stefanprobst on Jun 23, 2020, 09:29

fixed by d7a19a95

dpancic commented 4 years ago

In GitLab by @stefanprobst on Jun 23, 2020, 09:29

closed

dpancic commented 3 years ago

In GitLab by @KlausIllmayer on May 20, 2021, 15:45

We found out yesterday that the development instance is visible in Google Search results. According to Google, disallowing search engines in robots.txt does not prevent the site URL from appearing in Google search results (although no description of the site is shown). Instead, it would be necessary to remove the robots.txt disallow rules and use noindex, either in an HTTP header or as a meta tag (source: https://developers.google.com/search/docs/advanced/crawling/block-indexing). I'm not sure whether other search engines follow the same rules.

Asking @vronk @laureD19 @stefanprobst whether we should apply this Google rule or leave it as it is. I opt for leaving it as it is.
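A rough sketch of the two noindex variants mentioned above, in case they are ever needed. The function names and the `isProduction` flag are assumptions made for illustration; the `X-Robots-Tag: noindex` header and the `robots` meta tag are the mechanisms described on the linked Google page. As noted above, crawlers can only see either signal if robots.txt does not block them from fetching the page.

```typescript
// Hypothetical helpers; names and the isProduction flag are illustrative.

/** HTTP response headers asking crawlers not to index the page. */
function noindexHeaders(isProduction: boolean): Record<string, string> {
  // Assumption in this sketch: only non-production instances are hidden.
  return isProduction ? {} : { "X-Robots-Tag": "noindex" };
}

/** Equivalent meta tag for the document <head>, or an empty string. */
function noindexMetaTag(isProduction: boolean): string {
  return isProduction ? "" : '<meta name="robots" content="noindex">';
}

// Development instance: both variants carry the noindex signal.
console.log(noindexHeaders(false));
console.log(noindexMetaTag(false));
```

Either variant alone should be enough; the header has the advantage of also covering non-HTML responses.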

dpancic commented 3 years ago

In GitLab by @stefanprobst on May 20, 2021, 15:51

do you have a screenshot or example search?

dpancic commented 3 years ago

In GitLab by @vronk on May 20, 2021, 15:52

Hm, interesting. But if the actual content is not indexed, just the site url, then ok.

Speaking of which, the robots.txt on our production server at https://marketplace.sshopencloud.eu/robots.txt has the same Disallow rule – I expect this will stay the same while we're in Beta, right? (We just shouldn't forget to change that rule in the final. 😊)

dpancic commented 3 years ago

In GitLab by @KlausIllmayer on May 20, 2021, 15:53

not so easy without exposing the url to the public via gitlab ;) if i search google for "sshoc marketplace", it shows me the development version on the second page (could be different depending on your search history - but try a private window)

dpancic commented 3 years ago

In GitLab by @stefanprobst on May 20, 2021, 15:55

for me this is the 15th match (in a private window):

[Screenshot: Screenshot_20210520_155453]

dpancic commented 3 years ago

In GitLab by @stefanprobst on May 20, 2021, 15:58

also fyi: we do define a sitemap in robots.txt which always points to the prod instance, but it shouldn't be indexed because of the disallow rule.

i don't think it's a huge deal to have the url in the results - especially since this will rank lower as soon as we have the final release live with proper canonical urls.

@vronk yes, the plan was to remove the disallow rule on final release.
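The "proper canonical urls" mentioned above could look roughly like the following sketch. It assumes that the production origin named in this thread is the canonical one and that query strings and fragments do not identify distinct pages; both are assumptions, and `canonicalLinkTag` is a hypothetical helper, not part of the marketplace code.

```typescript
// Assumption: the production origin from this thread is the canonical one.
const CANONICAL_ORIGIN = "https://marketplace.sshopencloud.eu";

/** Builds a <link rel="canonical"> tag for a path (assumed already URL-encoded). */
function canonicalLinkTag(pathname: string): string {
  // Strip query string and fragment so variants of a page share one URL.
  const path = pathname.split(/[?#]/)[0] ?? "";
  // Make sure the path is absolute before joining it to the origin.
  const normalized = path.charAt(0) === "/" ? path : "/" + path;
  return `<link rel="canonical" href="${CANONICAL_ORIGIN}${normalized}">`;
}

// Every instance (dev, staging, prod) would emit the production URL:
console.log(canonicalLinkTag("/about?utm=x"));
// <link rel="canonical" href="https://marketplace.sshopencloud.eu/about">
```

Because the tag always points at production, a development instance that does get crawled signals that the production page is the preferred one.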

KlausIllmayer commented 10 months ago

We wanted to have a robots.txt for the final release (see the last comment), but it seems that it does not exist: https://marketplace.sshopencloud.eu/robots.txt returns a 404. @stefanprobst can we integrate a robots.txt for the production instance?

KlausIllmayer commented 10 months ago

Had a talk with Stefan: this is not so much about robots.txt itself; the main motivation is better SEO (search engine optimization) results, and we thought that a robots.txt might help with that. But it seems we need to dig deeper. The current state is very disappointing: searching Google for all results from the marketplace (enter site:marketplace.sshopencloud.eu in the Google search bar) returns only 112 results. Static pages seem to be indexed, but all dynamic content (= items) is either missing or quite old (from 2020). It is unclear why we are in this state. Interestingly, it is much the same for DuckDuckGo and Bing (although Bing has somewhat more: 636 results).

@laureD19 Stefan proposed handing over the current connection to Google's webmaster analysis tool to a DARIAH account. Do you know whether such an account exists? We may also need to look a little deeper into SEO to understand why the marketplace is so poorly covered.

dietervu commented 10 months ago

Section 3.2 of https://arxiv.org/ftp/arxiv/papers/1706/1706.05089.pdf might be useful. Google's indexing strategy however often remains a black box.

(Sent by email in reply to the comment above by KlausIllmayer, 2023-08-21 16:44.)

-- Dieter Van Uytvanck, Technical Director, CLARIN ERIC

KlausIllmayer commented 10 months ago

Thanks for the pointer! Indeed, we observe something similar, and yes, we may need to invest in creating a sitemap.

vronk commented 8 months ago

Move the registration of SSHOMP in the Google Search Console, currently held by Stefan, to dariah-eric.eu (managed by Arnaud and Matej).

As the next step: generate a dynamic sitemap and submit it to Google.
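As a starting point for the dynamic sitemap, here is a minimal sketch that turns a list of items into sitemap XML. The `SitemapEntry` shape, the example path and the helper names are hypothetical; how items are actually fetched from the marketplace backend is left out. Only the `urlset`, `url`, `loc` and `lastmod` elements come from the sitemap protocol.

```typescript
// Hypothetical item shape; the real marketplace API may differ.
interface SitemapEntry {
  path: string; // e.g. "/items/example-1" (illustrative path only)
  lastModified: Date;
}

const SITE_ORIGIN = "https://marketplace.sshopencloud.eu";

/** Escapes the characters that are not allowed verbatim in XML text. */
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Builds a sitemap document with one <url> element per entry. */
function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const loc = escapeXml(SITE_ORIGIN + entry.path);
    // A plain YYYY-MM-DD date is a valid <lastmod> value.
    const lastmod = entry.lastModified.toISOString().slice(0, 10);
    return `  <url><loc>${loc}</loc><lastmod>${lastmod}</lastmod></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

console.log(
  buildSitemap([
    { path: "/items/example-1", lastModified: new Date("2023-08-21T00:00:00Z") },
  ]),
);
```

The result could be served from a route on the production instance, referenced with a `Sitemap:` line in robots.txt, and submitted in the Search Console. A single sitemap file is limited to 50,000 URLs, so a larger catalogue would need a sitemap index.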