facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Algolia includes results from different locales, but redirecting shows a 404 #5880

Open code-masala opened 3 years ago

code-masala commented 3 years ago

Have you read the Contributing Guidelines on issues?

Prerequisites

Description

If Algolia search returns a route such as https://domain.com/ur/hello, clicking it leads to a 404. When I refresh the page on the same route, it works.

Steps to reproduce

1. Open the Algolia search bar.
2. Type a search query.
3. Click a result.

Expected behavior

When I click a search result, I should be redirected to the corresponding page.

Actual behavior

It shows a 404 first. If I manually refresh, it works fine.

Your environment

Reproducible demo

No response

Self-service

Josh-Cena commented 3 years ago

Can't reproduce. Can you try reproducing this on the Docusaurus site? https://docusaurus.io/

I tried with https://docusaurus.io/zh-CN/ and Algolia worked correctly. It could be a problem with the page itself instead of Algolia—is it a doc page, or a custom page?

code-masala commented 3 years ago

{
  "index_name": "sample",
  "start_urls": ["https://domain.com/"],
  "sitemap_urls": ["https://domain.com/sitemap.xml"],
  "sitemap_alternate_links": true,
  "stop_urls": ["/tests"],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": ["language", "version", "type", "docusaurus_tag"],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": ["833762294"],
  "nb_hits": 46250
}

code-masala commented 3 years ago

I want search results to be filtered by the selected language, as on https://docusaurus.io/

Josh-Cena commented 3 years ago

@code-masala That query payload is far from enough for me to figure out what's wrong. Do you have a reproducible demo? A published site?

code-masala commented 3 years ago

@Josh-Cena What I want: if the selected language is French (fr), then only French data should be fetched from Algolia.

Josh-Cena commented 3 years ago

[Screenshot]

This should already be the case

code-masala commented 3 years ago

@Josh-Cena Can you help me figure out what I need to change in config.json? I haven't been able to find this explained anywhere online.

Josh-Cena commented 3 years ago

As I said, I can't help you much without having a site to look at. I've never observed the behavior you described.

slorber commented 3 years ago

It shows a 404 first. If I manually refresh, it works fine.

It's not clear what you mean here; please show at least a screenshot.

@code-masala We can't really help with this without inspecting your real live site URL and your Algolia index config.

We'll re-open once it's provided.

casionone commented 2 years ago

I have the same problem. Website: https://linkis.staged.apache.org/. The first time I open a search result, I get a 404; after refreshing in Chrome, it works. The URL is https://linkis.staged.apache.org/zh-CN/community/how-to-contribute/#12-%E5%8A%9F%E8%83%BD%E4%BA%A4%E6%B5%81%E5%AE%9E%E7%8E%B0%E9%87%8D%E6%9E%84

[Screenshot]

Josh-Cena commented 2 years ago

Test

@casionone Weirdly, I can't reproduce it at all...

casionone commented 2 years ago

https://user-images.githubusercontent.com/7869972/149863979-0b0ee295-b26b-4bf7-b204-f56302d87c7b.mp4

@Josh-Cena

Josh-Cena commented 2 years ago

Thanks. This is because Algolia uses an SPA redirect but the en locale and zh-Hans locale are two different SPAs.

slorber commented 2 years ago

@casionone the issue is that your English site is presenting Chinese search results.

This also seems to be the original problem:

What I want: if the selected language is French (fr), then only French data should be fetched from Algolia.


We can see in the network tab that your site is not sending any facetFilter:

[Screenshot]

You should set contextualSearch: true. This adds the relevant "filters" to search queries, including a filter on the current language.

https://docusaurus.io/docs/search#contextual-search
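To make "filters sent in search queries" concrete, here is a minimal sketch of the kind of facetFilters value contextual search produces. It assumes the index has the language and docusaurus_tag facets declared under attributesForFaceting, as in the crawler configs in this thread. The function name and the tag values are hypothetical examples, not the theme's actual code.

```typescript
// Algolia facetFilters: top-level entries are ANDed, entries inside a
// nested array are ORed.
type FacetFilters = (string | string[])[];

// Hypothetical helper: restrict results to the current locale AND to any
// of the Docusaurus tags relevant to the current page.
function contextualFacetFilters(locale: string, tags: string[]): FacetFilters {
  const languageFilter = `language:${locale}`;
  const tagFilters = tags.map((tag) => `docusaurus_tag:${tag}`);
  return [languageFilter, tagFilters];
}

// Example tag values only; real values depend on your docs plugins/versions.
const filters = contextualFacetFilters("fr", ["default", "docs-default-current"]);
console.log(JSON.stringify(filters));
// ["language:fr",["docusaurus_tag:default","docusaurus_tag:docs-default-current"]]
```

Without such a language filter in the request, the index returns hits from every locale, which matches what the network tab shows above.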


It's a good time to enable this feature by default; I'll do that in https://github.com/facebook/docusaurus/pull/6407

slorber commented 2 years ago

However, we still need to fix some i18n edge cases, because the "Recent" search results are shared across languages, and we can still get a 404 in some situations (like clicking on a recent Chinese search result while on the English site).

I can reproduce this on the Docusaurus prod site, and contextualSearch won't fix it.

slorber commented 2 years ago

Reopening because there are still edge cases to fix (see my comment above).

sergeyol commented 1 year ago

Hi @slorber, @Josh-Cena, I have the same problem as @casionone.

We have intentionally disabled contextualSearch to be able to search across languages, and now we get "Page not found" when selecting a search result from a different locale.

You can try it here: https://orange-field-06c9e2f03.azurestaticapps.net/

Is there a quick workaround? Maybe inject something like "pathname://" into the search results?

Searching from an already localized page is even worse, because a second locale is appended to the URL:

https://orange-field-06c9e2f03.azurestaticapps.net/en-US/uk-UA/docs/hrp/doc-approval-workflow

I found issue #4723 and PR #6731 about this, but the work has been unfinished for a long time.

slorber commented 1 year ago

@sergeyol by default we assume search results are part of the current single-page application, and are navigated using history.push("/newPath").

When using i18n, each locale is a different SPA site and we should use window.location.href instead of history.push so that we can transition from one SPA to another. Unfortunately we can't seamlessly transition from one localized site to another with the more dynamic SPA navigation.
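As a rough illustration of the SPA vs. full-page-load decision described above, the sketch below infers the locale of a path from its first segment and only allows history.push when both paths belong to the same localized site. It assumes the default Docusaurus i18n layout (default locale served without a prefix, other locales under "/<locale>/"); the helper names and locale list are hypothetical.

```typescript
type Navigation = "spa" | "full-page-load";

// Assumption: non-default locales live under "/<locale>/", the default at "/".
function localeOfPath(pathname: string, locales: string[], defaultLocale: string): string {
  // "/zh-CN/docs/intro".split("/") is ["", "zh-CN", "docs", "intro"].
  const firstSegment = pathname.split("/")[1] ?? "";
  return locales.includes(firstSegment) ? firstSegment : defaultLocale;
}

// Same localized site: history.push is fine.
// Different localized site: window.location.href (full page load) is needed.
function chooseNavigation(
  currentPath: string,
  hitPath: string,
  locales: string[],
  defaultLocale: string,
): Navigation {
  const currentLocale = localeOfPath(currentPath, locales, defaultLocale);
  const hitLocale = localeOfPath(hitPath, locales, defaultLocale);
  return currentLocale === hitLocale ? "spa" : "full-page-load";
}

const locales = ["en", "zh-CN"];
console.log(chooseNavigation("/docs/intro", "/zh-CN/community/how-to-contribute/", locales, "en")); // full-page-load
console.log(chooseNavigation("/zh-CN/docs/a", "/zh-CN/docs/b", locales, "en")); // spa
```

Inspecting the URL prefix like this is simple but fragile: an English page whose path starts with /fr would be attributed to the French site.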

We have an externalUrlRegex option; maybe that could be a solution?

[Screenshot]
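A minimal sketch of how a regex-string option like externalUrlRegex can be applied, assuming the configured string is compiled with new RegExp and tested against each hit URL, with matching hits opened via a full page load instead of SPA navigation. The regex value below is hypothetical (built from the domain shared above); in a config it would sit under themeConfig.algolia.externalUrlRegex.

```typescript
// Hypothetical value; dots are escaped so they match literally.
const externalUrlRegex = "orange-field-06c9e2f03\\.azurestaticapps\\.net";

// Assumption: a hit is treated as "external" (full page load) when its URL
// matches the configured regex string.
function isExternalHit(hitUrl: string, regexString: string): boolean {
  return new RegExp(regexString).test(hitUrl);
}

console.log(
  isExternalHit(
    "https://orange-field-06c9e2f03.azurestaticapps.net/uk-UA/docs/hrp/doc-approval-workflow",
    externalUrlRegex,
  ),
); // true
console.log(isExternalHit("/docs/intro", externalUrlRegex)); // false
```

Matching the whole domain makes every absolute hit URL a full page load, which avoids the cross-locale 404 at the cost of SPA navigation between pages of the same locale.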

The upcoming version 2.3 should also allow you to wrap the SearchBar and provide a custom transformItems prop (which transforms the search result items) to do more advanced things with JS code. See https://github.com/facebook/docusaurus/issues/8461#issuecomment-1357326272
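Below is a sketch of a transformItems-style callback, the kind of JS a wrapped SearchBar could pass. It drops hits that belong to another localized site. It assumes each locale is served under its own baseUrl (for example { en: "/", "zh-CN": "/zh-CN/" }); SearchHit is a hypothetical stand-in for the real DocSearch hit type, and only its url field is used.

```typescript
// Hypothetical minimal hit shape; real DocSearch hits carry more fields.
interface SearchHit {
  url: string;
}

// Longest matching baseUrl wins, so "/zh-CN/docs" maps to zh-CN, not to "/".
function localeForUrl(url: string, baseUrls: Record<string, string>): string | undefined {
  // The base lets this handle both absolute and relative hit URLs.
  const path = new URL(url, "https://placeholder.invalid").pathname;
  let best: string | undefined;
  let bestLength = -1;
  for (const [locale, baseUrl] of Object.entries(baseUrls)) {
    if (path.startsWith(baseUrl) && baseUrl.length > bestLength) {
      best = locale;
      bestLength = baseUrl.length;
    }
  }
  return best;
}

// Builds a callback that keeps only hits from the current locale.
function makeTransformItems(currentLocale: string, baseUrls: Record<string, string>) {
  return (items: SearchHit[]): SearchHit[] =>
    items.filter((item) => localeForUrl(item.url, baseUrls) === currentLocale);
}

const transformItems = makeTransformItems("en", { en: "/", "zh-CN": "/zh-CN/" });
console.log(
  transformItems([
    { url: "https://linkis.staged.apache.org/docs/intro" },
    { url: "https://linkis.staged.apache.org/zh-CN/community/how-to-contribute/" },
  ]).length,
); // 1
```

Filtering suits the "only French results" use case; for intentional cross-language search, the same localeForUrl check could instead drive the choice of navigation.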

sergeyol commented 1 year ago

Thank you @slorber. I wasn't aware of that parameter, and it does the job for me. I've added our whole domain for now. I'll also wait for version 2.3, which sounds promising.

tri-chu commented 1 year ago

Regarding the contextualSearch edge case mentioned above, I'm still able to reproduce it in version 2.4. However, the Docusaurus production website doesn't seem to have this problem. Is there an unreleased PR that fixes this?

slorber commented 1 year ago

@tri-chu This issue is quite old, and contextualSearch is not new. I don't think we fixed anything related to that recently.

If you have issues, it's very difficult for me to help unless you share your crawler config and your live production URL so that I can see the problem myself.

slorber commented 1 year ago

Going to close this because contextualSearch is now enabled by default.

The only remaining edge case ("recent search hit in Chinese while you are on the English site", see https://github.com/facebook/docusaurus/issues/5880#issuecomment-1016674530) is now very unlikely to happen unless the user decides, for some reason, to disable contextual search, which I wouldn't recommend.

tri-chu commented 1 year ago

Hi @slorber, we're still hitting that problem pretty consistently on Docusaurus 2.4 with contextual search enabled on our website: https://www.8thwall.com/docs/

We don't use the new Algolia Crawler; we use the legacy crawler with this config instead:

{
  "index_name": "8thwall-docs-prod",
  "start_urls": [
    "https://www.8thwall.com/docs/"
  ],
  "sitemap_urls": [
    "https://www.8thwall.com/docs/sitemap.xml"
  ],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  }
}

tri-chu commented 1 year ago

Here is a screen recording of our issue:

https://user-images.githubusercontent.com/134897/234994943-3d6291ad-08c4-46ed-91f8-c7b5696d637c.mov

slorber commented 1 year ago

Thanks @tri-chu, you are right: this "recent searches" edge case is still common when users switch languages.

@shortcuts is there a way to sandbox each language regarding recent searches?

This looks like it's stored in localStorage under __DOCSEARCH_RECENT_SEARCHES__8thwall-docs-prod. Is there a way for us to provide a different storage key for each locale, or is this hardcoded?

I saw this in the DocSearch docs, but nothing to customize the localStorage key:

[Screenshot]

https://docsearch.algolia.com/docs/api/#disableuserpersonalization

Also curious: what are favorites? is there a way to add a page as favorite now? 🤔

shortcuts commented 1 year ago

@shortcuts is there a way to sandbox each language regarding recent searches?

This looks like it's stored in localStorage under __DOCSEARCH_RECENT_SEARCHES__8thwall-docs-prod. Is there a way for us to provide a different storage key for each locale, or is this hardcoded?

Actually, it's already stored with the correct key: we use the objectID, which is the full URL of the page, so it should account for locales. But on the GET, I believe we retrieve all of the saved searches 🤔. We could introduce some logic here: https://github.com/algolia/docsearch/blob/main/packages/docsearch-react/src/stored-searches.ts#L79

Also curious: what are favorites? is there a way to add a page as favorite now? 🤔

Yes, a user can add recent searches to favorites; they are also stored in local storage.

[Screenshot]

slorber commented 1 year ago

Oh, forgot about this favorite feature :D


@shortcuts I'm not sure I understand what you mean here.

What I see in practice is that a single local storage key is used for all the Docusaurus localized sites:

[Screenshot]

I don't really understand what the objectID is and how it could be used to filter the results we get from localStorage.

What I'd like is the ability to provide my own storage key, so that we have more than one storage value:

Does it make sense?

That also looks simpler, easier to reason about, and more performant, because you work with smaller recent-search storage objects.
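A hypothetical sketch of the per-locale storage idea: one recent-searches entry per locale instead of a single shared key. DocSearch does not expose such an option (per the docs discussed above), so the key format, the StoredHit shape, and all function names here are assumptions. An in-memory class stands in for window.localStorage so the sketch runs anywhere.

```typescript
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for window.localStorage.
class MemoryStorage implements KeyValueStorage {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Hypothetical stored-hit shape.
interface StoredHit {
  objectID: string;
  url: string;
}

// Hypothetical key format: the index name plus a locale suffix.
function recentSearchesKey(indexName: string, locale: string): string {
  return `__DOCSEARCH_RECENT_SEARCHES__${indexName}__${locale}`;
}

// Moves the hit to the front, de-duplicated by objectID, capped at an
// arbitrary limit.
function saveRecent(storage: KeyValueStorage, key: string, hit: StoredHit, limit = 10): void {
  const existing: StoredHit[] = JSON.parse(storage.getItem(key) ?? "[]");
  const next = [hit, ...existing.filter((h) => h.objectID !== hit.objectID)].slice(0, limit);
  storage.setItem(key, JSON.stringify(next));
}

function loadRecent(storage: KeyValueStorage, key: string): StoredHit[] {
  return JSON.parse(storage.getItem(key) ?? "[]");
}

const storage = new MemoryStorage();
saveRecent(storage, recentSearchesKey("8thwall-docs-prod", "en"), { objectID: "a", url: "/docs/a" });
saveRecent(storage, recentSearchesKey("8thwall-docs-prod", "fr"), { objectID: "b", url: "/fr/docs/b" });
console.log(loadRecent(storage, recentSearchesKey("8thwall-docs-prod", "en")).length); // 1
```

Because each locale reads only its own key, a recent French hit can never appear on the English site, so no URL inspection is needed at read time.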


The alternative for Docusaurus could be to not sandbox locales, but to make it possible for Docusaurus to know whether a search hit is from another localized site, so that we can decide whether to navigate as an SPA (/fr/doc1 => /fr/doc2) or as an MPA (/fr/doc1 => /en/doc2).

Technically we could probably do that already, but it's more complicated, and there are possible edge cases if we build this by inspecting the language prefix of the search hit URL (a convoluted example: /fr as a page about France on the English website vs. /fr/ as the root of the French site). It would be easier and more reliable if we could assign an i18n locale to each stored search hit.
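A hypothetical sketch of the "assign an i18n locale to each stored search hit" alternative: the hit records the locale of the site it was saved from, so the SPA vs. MPA choice compares locales directly and never parses the URL prefix. The field and function names are assumptions, not an existing DocSearch or Docusaurus API.

```typescript
interface LocalizedStoredHit {
  url: string;
  locale: string; // i18n locale of the site where the hit was recorded
}

type NavigationMode = "spa" | "full-page-load";

// Same localized site: SPA navigation (history.push).
// Different localized site: MPA navigation (window.location.href).
function navigationFor(hit: LocalizedStoredHit, currentLocale: string): NavigationMode {
  return hit.locale === currentLocale ? "spa" : "full-page-load";
}

// An English page at "/fr" (about France) stays SPA on the English site.
console.log(navigationFor({ url: "/fr", locale: "en" }, "en")); // spa
console.log(navigationFor({ url: "/en/doc2", locale: "en" }, "fr")); // full-page-load
```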


To me, the ability to pass custom storage keys is simpler and probably good enough.