facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Algolia search does not pass query string correctly and returns no results #9532

Closed: timothymcmackin closed this issue 1 year ago

timothymcmackin commented 1 year ago

Description

I'm using the built-in Algolia search features of docusaurus in preset-classic. I know that my site is indexed and that my API key works because I can do a simple search from a curl request and get results. For example, this command searches for the keyword "contract" and returns many results from docs.tezos.com:

curl -X GET \
     -H "X-Algolia-API-Key: 57d6a376a3528866784a143809cc7427" \
     -H "X-Algolia-Application-Id: QRIAHGML9Q" \
    "https://QRIAHGML9Q-dsn.algolia.net/1/indexes/tezosdocs?query=contract&hitsPerPage=2&getRankingInfo=1"

When I open my site at docs.tezos.com and search for the same word, I get no results. In the Algolia dashboard, I see many requests with an empty "query" field, which is probably why the searches return no results.

When I open the console and grab the URL of one of the search requests to Algolia, its form data looks like this:

{
  "requests": [
    {
      "query": "contract",
      "indexName": "tezosdocs",
      "params": "attributesToRetrieve=%5B%22hierarchy.lvl0%22%2C%22hierarchy.lvl1%22%2C%22hierarchy.lvl2%22%2C%22hierarchy.lvl3%22%2C%22hierarchy.lvl4%22%2C%22hierarchy.lvl5%22%2C%22hierarchy.lvl6%22%2C%22content%22%2C%22type%22%2C%22url%22%5D&attributesToSnippet=%5B%22hierarchy.lvl1%3A10%22%2C%22hierarchy.lvl2%3A10%22%2C%22hierarchy.lvl3%3A10%22%2C%22hierarchy.lvl4%3A10%22%2C%22hierarchy.lvl5%3A10%22%2C%22hierarchy.lvl6%3A10%22%2C%22content%3A10%22%5D&snippetEllipsisText=%E2%80%A6&highlightPreTag=%3Cmark%3E&highlightPostTag=%3C%2Fmark%3E&hitsPerPage=20&clickAnalytics=false&facetFilters=%5B%22language%3Aen%22%2C%5B%22docusaurus_tag%3Adefault%22%2C%22docusaurus_tag%3Adocs-default-current%22%5D%5D"
    }
  ]
}

I can copy this network request as a curl command from the console, run it in my terminal, and confirm that it gets no results.
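
To see which filters this request actually carries, the URL-encoded params string can be decoded. The following is a minimal sketch, not part of Docusaurus or DocSearch: decodeSearchParams is a hypothetical helper, and the string is shortened to two of the parameters from the request above.

```javascript
// Shortened copy of the `params` string from the captured request
// (only hitsPerPage and facetFilters are kept for readability).
const params =
  "hitsPerPage=20&facetFilters=%5B%22language%3Aen%22%2C%5B%22docusaurus_tag%3Adefault%22%2C%22docusaurus_tag%3Adocs-default-current%22%5D%5D";

// Hypothetical helper: turns the URL-encoded string into a plain object.
function decodeSearchParams(paramString) {
  const decoded = {};
  for (const [key, value] of new URLSearchParams(paramString)) {
    try {
      // Values such as facetFilters are JSON-encoded arrays.
      decoded[key] = JSON.parse(value);
    } catch {
      // Anything that is not valid JSON stays a plain string.
      decoded[key] = value;
    }
  }
  return decoded;
}

const decodedParams = decodeSearchParams(params);
console.log(JSON.stringify(decodedParams.facetFilters));
// prints ["language:en",["docusaurus_tag:default","docusaurus_tag:docs-default-current"]]
```

In Algolia's facetFilters syntax, the entries of the outer array are combined with AND and the entries of a nested array with OR. So this request only matches records that have language "en" and one of the two docusaurus_tag values.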

Based on Algolia's API documentation for this API endpoint, the form data should look like this:

{
  "requests": [
    {
      "indexName": "tezosdocs",
      "params": "query=contract"
    }
  ]
}

If I replace the data in the network request with this JSON, the curl command returns results.

So it appears to me that docusaurus or the algolia search component (@docsearch/react) is sending a malformed query to algolia, causing it to return no results.

Reproducible demo

https://github.com/trilitech/tezos-developer-docs

Steps to reproduce

  1. Open docs.tezos.com in a web browser.
  2. Open the dev tools (Option-Command-I on macOS or Shift-Control-I on Windows).
  3. In the dev tools pane, go to the network tab and click Clear Network Log.
  4. On the site, click the search bar at the top right of the screen and type "contract" in the popup search window. Note that there are no search results.
  5. In the network tab, right-click the last network request and then click Copy > Copy as cURL.
  6. Paste the command into a text editor. It should be going to the algolia.net domain.
  7. Run the command in the terminal and verify that it returned no search results: the nbHits field is 0 and the hits array is empty.
  8. In the text editor, replace the line that starts with --data-raw with this line:
    --data-raw '{"requests":[{"indexName":"tezosdocs","params":"query=contract"}]}' \
  9. Run the updated command in the terminal and see that there are many search results. I'm seeing the nbHits field say 356.
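
Steps 7 and 9 come down to reading nbHits and hits from the response. Here is a small sketch of that check, assuming the multi-query response shape ({ results: [...] }). summarizeResults is a hypothetical helper, and the two sample objects are hand-written for illustration, not real API output.

```javascript
// Hypothetical helper: one summary entry per query in a multi-query response.
function summarizeResults(response) {
  return response.results.map((result) => ({
    nbHits: result.nbHits,
    hitCount: result.hits.length,
    isEmpty: result.nbHits === 0 && result.hits.length === 0,
  }));
}

// Hand-written samples that mimic the two outcomes described in the steps.
const emptyResponse = { results: [{ hits: [], nbHits: 0 }] };
const nonEmptyResponse = {
  results: [{ hits: [{ url: "https://docs.tezos.com/" }], nbHits: 356 }],
};

console.log(summarizeResults(emptyResponse)[0].isEmpty); // prints true
console.log(summarizeResults(nonEmptyResponse)[0].isEmpty); // prints false
```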

Expected behavior

Get search results

Actual behavior

No search results

slorber commented 1 year ago

> So it appears to me that docusaurus or the algolia search component (@docsearch/react) is sending a malformed query to algolia, causing it to return no results.

Most likely Docusaurus sends the right query, but your index is not configured correctly according to our recommendations.

For our queries to work, the index must contain the fields we filter on, notably the "docusaurus_tag" field.

Please delete your index and recrawl your site with the recommended crawler configuration that we link to in our documentation. If it still does not work then we can re-open but we'll need you to provide your crawler config and screenshots of your Algolia index UI.

You can also reach out to the DocSearch support team through email or on their Discord.

timothymcmackin commented 12 months ago

Thanks for the info. I was able to get it working by creating a new crawler, re-indexing the site, and setting contextualSearch to false.

slorber commented 12 months ago

> and setting contextualSearch to False.

Setting it to false might work, but it might also "hide" the problem. Turning contextual search off disables the filtering on docusaurus_tag, so the search returns results even if your index is misconfigured. The problem remains that your index may be misconfigured, and it's important to ensure that the docusaurus_tag field is correctly indexed.
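
As a rough illustration of what that filtering means, the facet filters could be assembled as follows. This is a hypothetical sketch modeled on the request captured earlier in this thread, not the Docusaurus source code, and it assumes no other filters are configured.

```javascript
// Hypothetical illustration: which facet filters a search request carries
// depending on whether contextual search is enabled.
function buildFacetFilters({ contextualSearch, language, tags }) {
  if (!contextualSearch) {
    // No language or docusaurus_tag filtering at all.
    return [];
  }
  // Outer array entries are ANDed; the nested tag array is ORed.
  return [`language:${language}`, tags.map((tag) => `docusaurus_tag:${tag}`)];
}

const withContext = buildFacetFilters({
  contextualSearch: true,
  language: "en",
  tags: ["default", "docs-default-current"],
});
console.log(JSON.stringify(withContext));
// prints ["language:en",["docusaurus_tag:default","docusaurus_tag:docs-default-current"]]
```

With contextualSearch set to false the function returns an empty array, so records that lack a docusaurus_tag value still match. That is why turning the option off can mask an index that was crawled without that attribute.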

timothymcmackin commented 12 months ago

I copied the crawler configuration from the Docusaurus documentation and made small changes for my site:

new Crawler({
  appId: "QRIAHGML9Q",
  apiKey: "MY_API_KEY",
  rateLimit: 8,
  startUrls: ["https://docs.tezos.com"],
  sitemaps: ["https://docs.tezos.com/sitemap.xml"],
  saveBackup: true,
  ignoreQueryParams: ["source", "utm_*"],
  ignoreCanonicalTo: true,
  discoveryPatterns: ["https://docs.tezos.com/**"],
  actions: [
    {
      indexName: "tezosdocs",
      pathsToMatch: ["https://docs.tezos.com/**"],
      recordExtractor: ({ $, helpers }) => {
        // priority order: deepest active sub list header -> navbar active item -> 'Documentation'
        const lvl0 =
          $(
            ".menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active"
          )
            .last()
            .text() || "Documentation";

        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: lvl0,
            },
            lvl1: ["header h1", "article h1"],
            lvl2: "article h2",
            lvl3: "article h3",
            lvl4: "article h4",
            lvl5: "article h5, article td:first-child",
            lvl6: "article h6",
            content: "article p, article li, article td:last-child",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  initialIndexSettings: {
    "Tezos docs crawler": {
      attributesForFaceting: [
        "type",
        "lang",
        "language",
        "version",
        "docusaurus_tag",
      ],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
      separatorsToIndex: "_",
    },
  },
});

I re-indexed the site with this crawler, and I can see that the index now includes the docusaurus_tag attribute.

However, I still see no results in my search when I turn contextualSearch off. What does my index need to look like to work correctly?

timothymcmackin commented 12 months ago

Per your comment at https://github.com/facebook/docusaurus/issues/6693#issuecomment-1158639529, I have also made docusaurus_tag searchable in Algolia.

But the search still returns no results in the popup/modal window when contextual search is on.

slorber commented 11 months ago

Contextual search should be on, not off.

I don't know Algolia DocSearch well enough to troubleshoot this on your site, but you can reach out to their support if needed. Cc @shortcuts
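
For reference, contextual search is toggled in the themeConfig.algolia section of docusaurus.config.js. Below is a fragment using the app ID and index name from this thread. The API key is a placeholder and the constant name is only for illustration.

```javascript
// Fragment of docusaurus.config.js: in a real config this object is the
// value of themeConfig.algolia.
const algoliaThemeConfig = {
  appId: "QRIAHGML9Q",
  apiKey: "YOUR_SEARCH_ONLY_API_KEY", // placeholder, not a real key
  indexName: "tezosdocs",
  // Keep this on so results are filtered by language and docusaurus_tag.
  contextualSearch: true,
};
```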

FZambia commented 10 months ago

I had a similar problem and seem to have solved it. Here are the steps that got the Algolia search widget to return non-empty results for me:

  1. Reconfigure the crawler at https://crawler.algolia.com/admin/crawlers/ to use the recommended config from https://docsearch.algolia.com/docs/templates/#docusaurus-v3-template
  2. In my case I dropped the index, but it is probably possible to just restart the crawler.
  3. Go to the index configuration, add docusaurus_tag and language to Attributes for faceting, and make them searchable. As far as I understand, every field that appears under facetFilters in the query must be enabled here.

After that in my case queries finally started to return results.
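
The faceting requirement in step 3 can be checked mechanically. This sketch assumes you have the index's attributesForFaceting list and the facetFilters array from a search request at hand. missingFacetAttributes and stripModifier are hypothetical helpers, not Algolia API calls.

```javascript
// Strip Algolia modifiers such as searchable(attr) or filterOnly(attr).
function stripModifier(entry) {
  const match = /^\w+\((.+)\)$/.exec(entry);
  return match ? match[1] : entry;
}

// Return the attributes used in facetFilters that the index does not
// declare in attributesForFaceting.
function missingFacetAttributes(attributesForFaceting, facetFilters) {
  const declared = new Set(attributesForFaceting.map(stripModifier));
  const used = facetFilters
    .flat(Infinity)
    .map((filter) => filter.split(":")[0]);
  return [...new Set(used)].filter((attribute) => !declared.has(attribute));
}

// Filters taken from the request captured earlier in this thread.
const capturedFilters = [
  "language:en",
  ["docusaurus_tag:default", "docusaurus_tag:docs-default-current"],
];

console.log(missingFacetAttributes(["type", "searchable(language)"], capturedFilters));
// prints [ 'docusaurus_tag' ]
```

An empty result means every filtered attribute is declared for faceting. It does not prove that the records actually contain values for those attributes.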

timothymcmackin commented 10 months ago

This is what fixed it for me: I changed this code:

  initialIndexSettings: {
    "Tezos docs crawler": {

to:

  initialIndexSettings: {
    tezosdocs: {

Then I deleted the index and re-indexed.
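
The keys of initialIndexSettings are index names, so they have to match the indexName used in actions. Here is a minimal sketch of a sanity check for that. indexesWithoutInitialSettings is a hypothetical helper, and the two config objects are trimmed-down versions of the crawler config above.

```javascript
// Hypothetical helper: list the index names used in actions that have no
// matching key in initialIndexSettings.
function indexesWithoutInitialSettings(crawlerConfig) {
  const settingsKeys = new Set(
    Object.keys(crawlerConfig.initialIndexSettings || {})
  );
  return crawlerConfig.actions
    .map((action) => action.indexName)
    .filter((indexName) => !settingsKeys.has(indexName));
}

// Trimmed-down config as originally posted: the key is a label, not the index name.
const originalConfig = {
  actions: [{ indexName: "tezosdocs" }],
  initialIndexSettings: { "Tezos docs crawler": {} },
};

// Trimmed-down config after the fix.
const fixedConfig = {
  actions: [{ indexName: "tezosdocs" }],
  initialIndexSettings: { tezosdocs: {} },
};

console.log(indexesWithoutInitialSettings(originalConfig)); // prints [ 'tezosdocs' ]
console.log(indexesWithoutInitialSettings(fixedConfig)); // prints []
```

As far as I understand the crawler, initialIndexSettings is applied only when it creates an index, which would explain why the index had to be deleted before re-indexing.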

slorber commented 7 months ago

EDIT: see Troubleshooting section added to our docs here:

https://docusaurus.io/docs/search#algolia-troubleshooting

No search result?

For anyone passing by, if you don't get any Algolia search results, check the troubleshooting section linked above.

See also: https://github.com/facebook/docusaurus/discussions/10007#discussioncomment-9021352