PrestaShop / docs

PrestaShop technical documentation
https://devdocs.prestashop-project.org/

Algolia DocSearch fine-tuning #1052

Closed matks closed 1 year ago

matks commented 3 years ago

We now run the amazing search provided by https://docsearch.algolia.com/ !

We can improve the search results. Here is a to-do list:

Sources

https://docsearch.algolia.com/docs/tips

https://www.algolia.com/blog/engineering/how-to-build-a-helpful-search-for-technical-documentation-the-laravel-example/

https://docsearch.algolia.com/docs/required-configuration

matks commented 3 years ago

Source of our configuration: https://github.com/algolia/docsearch-configs/blob/master/configs/prestashop.json. If we aim to improve it, we can contribute to that repository.

matks commented 3 years ago

Additional documentation:

matks commented 3 years ago

@eternoendless did you work on this?

eternoendless commented 3 years ago

Search results are filtered by version since this PR https://github.com/PrestaShop/ps-docs-theme/pull/5

kpodemski commented 2 years ago

Right now, there is a massive problem with the indexation of different versions of the docs. Here are the results:

For instance, these are the results for "Console" in v1.7: https://capture.dropbox.com/XEthfVFiNbVsrOno. These results are acceptable.

Results for v8: https://capture.dropbox.com/vw4QjHWhichgKj0y

These are not relevant for the user.

This problem is also visible in the Algolia administration center as we can see indexation for v8 is much lower than for 1.7: https://capture.dropbox.com/aEoLpz0rrq5wtoSk

eternoendless commented 2 years ago

I don't understand why. We're following their documentation and pages are correctly tagged 🤔

MeKeyCool commented 2 years ago

I'll take this one ^^

MeKeyCool commented 2 years ago

Taking the most frequent searches from Algolia analytics, I found that they are all broken in the v8 search, even for documentation that didn't move or change.

| search | count | nbHits | totalPercent | Check v8 doc search |
|---|---|---|---|---|
| hook | 592 | 257 | 0.024% | NOk (satisfying in v1.7) |
| hooks | 287 | 193 | 0.012% | NOk (satisfying in v1.7) |
| form | 226 | 1015 | 0.009% | NOk (satisfying in v1.7) |
| override | 217 | 93 | 0.009% | NOk (satisfying in v1.7) |
| ajax | 206 | 27 | 0.008% | NOk (satisfying in v1.7) |
| product | 167 | 394 | 0.007% | NOk (satisfying in v1.7) |
| grid | 133 | 182 | 0.005% | NOk (satisfying in v1.7) |
| order | 132 | 332 | 0.005% | NOk (satisfying in v1.7) |
| controller | 111 | 228 | 0.004% | NOk (satisfying in v1.7) |
| cart | 111 | 440 | 0.004% | NOk (satisfying in v1.7) |
| cache | 109 | 44 | 0.004% | NOk (satisfying in v1.7) |
| mail | 106 | 269 | 0.004% | NOk (satisfying in v1.7) |
| smarty | 97 | 81 | 0.004% | NOk (satisfying in v1.7) |
| cron | 93 | 956 | 0.003% | NOk (satisfying in v1.7) |

I'll check the configuration for the v8-scoped search.

MeKeyCool commented 2 years ago

It seems the crawler has stopped:

Too many missing records
The new index generated by this crawl is missing too many records to replace the production index automatically

SafeReindexingError: [prestashop] Blocking error:
   The difference between the number of records:
   from : 12.2k
   to   : 0
   is too large (100 %), this limit can be modified in the Editor (currently 10 %)
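
The check behind this error can be pictured with a small sketch. This is only a model built from the figures in the message (12.2k records before, 0 after, 10 % limit); the function names are hypothetical and the exact formula the Algolia Crawler uses is an assumption.

```python
def relative_record_drop(previous_count: int, new_count: int) -> float:
    """Return the difference in record count as a percentage of the previous count.

    Assumption: the difference is measured relative to the production index size.
    """
    if previous_count == 0:
        # Nothing to compare against; treat any new records as a full change.
        return 0.0 if new_count == 0 else 100.0
    return abs(previous_count - new_count) / previous_count * 100


def is_reindex_blocked(previous_count: int, new_count: int,
                       max_drop_percent: float = 10.0) -> bool:
    """Return True when the change exceeds the allowed limit (10 % in the message)."""
    return relative_record_drop(previous_count, new_count) > max_drop_percent


# Figures taken from the error message: about 12.2k records before, 0 after.
drop = relative_record_drop(12_200, 0)
blocked = is_reindex_blocked(12_200, 0)
print(f"drop: {drop:.0f} %, blocked: {blocked}")
```

With an empty crawl the difference is total, so the new index is never promoted and the production index keeps serving the old records.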

matks commented 2 years ago

Thank you @MeKeyCool that is worrying news 😱

MeKeyCool commented 2 years ago

I think it comes from the hostname update to devdocs.prestashop-project.org. I'll look into making a PR as soon as possible.

kpodemski commented 2 years ago

@MeKeyCool

there's also a problem with heading prioritization, take a look at this page: https://devdocs.prestashop-project.org/1.7/modules/concepts/hooks/list-of-hooks/#full-list

the page with "list of hooks" in its h1 should have a higher priority if you search for "list of hooks"
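
The expected behaviour can be sketched as a ranking rule: a record whose match sits in a shallower heading level comes first. This is a minimal illustration, not the Algolia ranking itself. The record shape (`hierarchy` with `lvl0` to `lvl6` keys) follows the usual DocSearch layout, but treating `lvl1` as the h1, the example URLs and the function names are all assumptions.

```python
from typing import Dict, List, Optional


def shallowest_match_level(hierarchy: Dict[str, Optional[str]], query: str) -> Optional[int]:
    """Return the smallest heading level whose text contains the query, or None."""
    needle = query.lower()
    for level in range(7):  # lvl0 .. lvl6
        text = hierarchy.get(f"lvl{level}")
        if text and needle in text.lower():
            return level
    return None


def rank_records(records: List[dict], query: str) -> List[dict]:
    """Keep matching records and order them by the level of the match (shallowest first)."""
    matches = []
    for record in records:
        level = shallowest_match_level(record["hierarchy"], query)
        if level is not None:
            matches.append((level, record))
    matches.sort(key=lambda pair: pair[0])  # stable sort keeps input order on ties
    return [record for _, record in matches]


# Hypothetical records: one mentions the phrase in a sub-heading, one has it as its h1.
records = [
    {"url": "https://example.test/hooks/#intro",
     "hierarchy": {"lvl0": "Modules", "lvl1": "Hooks", "lvl2": "See the list of hooks"}},
    {"url": "https://example.test/list-of-hooks/",
     "hierarchy": {"lvl0": "Modules", "lvl1": "List of hooks"}},
]
ranked = rank_records(records, "list of hooks")
print([r["url"] for r in ranked])
```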

MeKeyCool commented 2 years ago

:+1: I'll add this to my testing process. If someone can update the issue description, it would be good to gather all the "testing" criteria there in a complete and concise description ^^

MeKeyCool commented 2 years ago

I sent an email to Algolia to update the domain. @kpodemski is linked, so I hope he will be kept informed. As I won't be able to follow this subject anymore, I recommend changing the owner.

:point_up: Please note that once the domain is allowed by the admin, you will need to update the crawler configuration: https://crawler.algolia.com/admin/crawlers/0b7a25f0-3983-498e-8d7b-38e003a8184d/configuration/edit

MeKeyCool commented 2 years ago

Algolia answered me and I updated the crawler's configuration.

It works, but it looks like it didn't solve our problem. Looking more closely at the crawler results, it seems that around half of the URLs are ignored: https://crawler.algolia.com/admin/crawlers/0b7a25f0-3983-498e-8d7b-38e003a8184d/monitoring/summary

I had time to check one of them, and it says: "https://devdocs.prestashop-project.org/8/basics/installation/configuration/ ... Skipped in favor of canonical URL: https://devdocs.prestashop-project.org/1.7/basics/installation/configuration/"

As @kpodemski suggested, it is probably a conflict between 1.7 and 8 versions inside Algolia
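
To spot the affected pages, one could list every crawled page whose canonical URL points to another version of the docs. The sketch below assumes the version is the first path segment, as in the URLs quoted above; the `(url, canonical)` input format and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def version_of(url: str) -> str:
    """Return the first path segment of a docs URL, e.g. '8' or '1.7'."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    return segments[0] if segments else ""


def cross_version_canonicals(pages: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (url, canonical) pairs whose canonical belongs to another version."""
    return [(url, canonical) for url, canonical in pages
            if version_of(url) != version_of(canonical)]


V8 = "https://devdocs.prestashop-project.org/8/basics/installation/configuration/"
V17 = "https://devdocs.prestashop-project.org/1.7/basics/installation/configuration/"

# The v8 page declares the 1.7 page as canonical; the 1.7 page points to itself.
pages = [(V8, V17), (V17, V17)]
conflicts = cross_version_canonicals(pages)
print(conflicts)
```

Every pair returned is a page a canonical-respecting crawler would skip in favour of the other version.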

kpodemski commented 2 years ago

Thank you, @MeKeyCool, for your work around this subject! :)

Yes, that's what I thought. Thanks to you, we know it is because of the canonical URLs. Of course, having a canonical URL makes sense. It is coming from the @eternoendless PR: https://github.com/PrestaShop/ps-docs-theme/commit/7f24eee5d1eb25819ad7987f0feff04ff0513fd1

I've submitted a change to index pages with the canonical URL: https://github.com/PrestaShop/ps-docs-theme/pull/13

kpodemski commented 2 years ago

Update:

We had to change the Crawler settings and set the ignoreCanonicalTo parameter to true.

Indexing now works as expected.
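
The effect of that setting can be modelled in a few lines. This is a simplified sketch of the decision, not Algolia's code: `should_index` is a hypothetical helper, and only the boolean form of ignoreCanonicalTo used in this thread is represented.

```python
from typing import Optional


def should_index(url: str, canonical: Optional[str],
                 ignore_canonical_to: bool = False) -> bool:
    """Decide whether a crawled page yields records.

    Assumed model: by default a page is skipped when its canonical URL points
    elsewhere; with ignore_canonical_to=True the canonical is not considered.
    """
    if ignore_canonical_to:
        return True
    return canonical is None or canonical == url


V8 = "https://devdocs.prestashop-project.org/8/basics/installation/configuration/"
V17 = "https://devdocs.prestashop-project.org/1.7/basics/installation/configuration/"

print(should_index(V8, V17))                            # skipped in favor of the canonical
print(should_index(V8, V17, ignore_canonical_to=True))  # indexed under its own URL
```

Under this model the v8 pages produce their own records again, which matches the recovery described above.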

There's still some work to do to improve the quality of the results, but having some results in general for v8 is a good starting point 😅

Thanks again @MeKeyCool, for fixing the crawler, which helped us find a final solution