giellalt / giellalt.github.io

GiellaLT documentation
https://giellalt.github.io

Search does not work as it should #8

Open trondtynnol opened 2 years ago

trondtynnol commented 2 years ago

When searching the documentation using the Google search bar, I often do not find anything, although what I am looking for exists in the documentation.

For example, searching for "environment variables site:https://giellalt.github.io/" returns nothing. The same search works against the old site: "environment variables site:https://giellalt.uit.no/" yields the results I was looking for.

snomos commented 2 years ago

I am aware. The GH Pages theme we are using should be preconfigured for this, and I have checked that it is set up according to the advice I found. Still, the search results have not improved.

Among other things, I checked this: https://github.blog/2016-05-10-better-discoverability-for-github-pages-sites/ (and its links).

There are probably still things we can do: https://stackoverflow.com/questions/63720160/github-page-can-only-be-found-on-google-when-typing-username-and-github

All help welcome 😄

trondtynnol commented 2 years ago

I will look into it a little :)

It seems DuckDuckGo does index the site correctly, so we might consider switching from Google to DDG if we cannot get Google to index it. The best solution would of course be for all search engines to index the site.

snomos commented 2 years ago

Here are some more tips on how to improve and optimize search engine performance: https://backlinko.com/hub/seo/sitemaps

It seems sitemaps are central to getting the pages indexed, and we should probably automate the process of updating ours.

snomos commented 2 years ago

See also https://developers.google.com/search/docs/advanced/sitemaps/overview and follow the link at the bottom

trondtynnol commented 2 years ago

Yes, I agree we should build a sitemap automatically, as it probably will improve search results somewhat.

However, Google does seem to take a very long time to actually index anything, even after the sitemap has been submitted. Almost two months have passed since I added the simple txt sitemap, and still only 153 of the roughly 640 listed pages are indexed by Google.

trondtynnol commented 2 years ago

I guess this plugin should do the trick: https://github.com/jekyll/jekyll-sitemap

snomos commented 2 years ago

That is one option. When I checked the Google Search Console, one thing that stood out was the lack of entries for sub-site documents: many files in keyboard-XXX/docs/* and lang-XXX/docs/* were not indexed because they never appeared in the sitemap (72 pages were not indexed, partly because of this). The easiest would be to create sitemaps for all of these separately as part of the build process.

How did you create the html sitemap file in the root directory?

snomos commented 2 years ago

[I am switching to Norwegian - foreign readers: use Google Translate for the remainder of the issue if you want to follow 🙂]

Here is an example of the most frequent error type:

[Screenshot, 2022-04-01 16:23:47]

As I understand the error message, Google claims that no other pages point to this page. I am a little surprised in this particular case, but it may well be true for many pages: the old Forrest system had a menu on the left that was maintained independently of the links inside the pages, and there are certainly a number of pages that were only ever linked from that menu. Those pages were left without any inbound links when we moved to GH/Markdown.
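One way to check how many pages really lack inbound links is to scan the Markdown sources directly. The following is only a rough sketch: the function name `find_orphans` is hypothetical, and it assumes that pages link to each other with inline Markdown links pointing at the built `.html` files (as in `[text](infra/Page.html)`). It ignores menus, HTML links and reference-style links.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](path/Page.html) and captures
# the target up to the first ")", whitespace or "#".
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+)")


def find_orphans(root):
    """Return the .md pages under root that no other page links to."""
    root = Path(root).resolve()
    pages = {p.resolve() for p in root.rglob("*.md")}
    linked = set()
    for page in pages:
        text = page.read_text(encoding="utf-8", errors="replace")
        for target in LINK_RE.findall(text):
            if "://" in target:
                continue  # external link
            if not target.endswith((".html", ".md")):
                continue  # images, directories, mailto: and so on
            # Links point at the built .html file; map it back to the source.
            source = (page.parent / target).with_suffix(".md").resolve()
            if source != page:  # a link from a page to itself does not count
                linked.add(source)
    return sorted(p.relative_to(root).as_posix() for p in pages - linked)
```

Calling `find_orphans(".")` in the repository root lists candidate pages. Entry pages such as an index will normally show up too, since nothing inside the site needs to link to them.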

snomos commented 2 years ago

Except that it is not true:

grep -r HowToAddANewLanguage * 
AboutGiellaLT.md:[a ready-made setup](infra/infraremake/HowToAddANewLanguage.html) for adding more languages.
infra/infraremake/HowToMoveALanguageFromTheOldInfraToTheNew.md:* create [a new language directory](HowToAddANewLanguage.html)
infra/TechnicalMaintenance.md:* [How to add a new language to the infrastructure](infraremake/HowToAddANewLanguage.html)
sitemap.txt:https://giellalt.github.io/infra/infraremake/HowToAddANewLanguage.html

trondtynnol commented 2 years ago

On the whole it looks as if Google struggles to crawl the site, and I do not quite understand why.

I made that sitemap.txt file rather quickly to test whether it would help. If I remember correctly, I used a variant of ls -R, filtered out a few things, and prepended the URL to each line. Generated pages from other repositories were most likely not included.
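A minimal sketch of the same idea in Python, for anyone who wants to reproduce it. This is not the command that was actually used. It assumes that each `.md` source file is published as an `.html` page at the same relative path under https://giellalt.github.io/, which matches the sitemap.txt entry in the grep output above. The function name and the hidden-directory filter are hypothetical.

```python
from pathlib import Path

# Site root, as it appears in the existing sitemap.txt entries.
BASE_URL = "https://giellalt.github.io/"


def build_txt_sitemap(root, base_url=BASE_URL):
    """Return a sorted list with one URL per Markdown file under root."""
    root = Path(root)
    urls = []
    for md in root.rglob("*.md"):
        rel = md.relative_to(root)
        # Hypothetical filter: skip anything inside hidden directories (.git etc.).
        if any(part.startswith(".") for part in rel.parts):
            continue
        # Foo.md is assumed to be published as Foo.html at the same path.
        urls.append(base_url + rel.with_suffix(".html").as_posix())
    return sorted(urls)
```

The list can then be written out with `Path("sitemap.txt").write_text("\n".join(build_txt_sitemap(".")) + "\n")`. Like the original approach, this only sees files in the current repository, so pages generated from other repositories are still missing.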

snomos commented 2 years ago

Generated pages from other repositories were most likely not included.

No, probably not, and they do not need to be either. They should get their own sitemap files, autogenerated during the build. That way the sitemap file will always be up to date.
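A per-repository sitemap of that kind could be produced by a small build step along these lines. This is a sketch under assumptions, not the setup actually in use: the function name `build_xml_sitemap` is hypothetical, and so is the assumption that every `.md` file in a repository's docs directory ends up as an `.html` page under that repository's base URL. It uses the standard sitemaps.org XML format and only the Python standard library.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(docs_dir, base_url):
    """Return an XML sitemap (as UTF-8 bytes) for the .md files in docs_dir."""
    root = Path(docs_dir)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for md in sorted(root.rglob("*.md")):
        # Foo.md is assumed to be published as Foo.html at the same path.
        rel = md.relative_to(root).with_suffix(".html").as_posix()
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + rel
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

In a build script, each sub-site would call this with its own base URL and write the result with `Path("sitemap.xml").write_bytes(...)`. Because the file is regenerated on every build, it cannot drift out of date the way a hand-made list does.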