algolia / docsearch

The easiest way to add search to your documentation.
https://docsearch.algolia.com
MIT License

crawler: "No records extracted" #1441

Closed Huihuawk closed 2 years ago

Huihuawk commented 2 years ago

Description

Hi. I configured the V3 version of Algolia search, but the crawler cannot extract the HTML content correctly and reports "No records extracted". I don't know whether a missing configuration option is causing the problem. The V2 version works fine. Here are the test screenshots: [image] [image]

The page loads normally when visited in a browser: [image]

Doc site: https://open.hand-china.com/choerodon-ui/zh/procmp/data-entry/password (the docs currently use V2).

shortcuts commented 2 years ago

Hey, are the elements we try to crawl from the config present in your DOM? Is your website client-side rendered?

Huihuawk commented 2 years ago

Hi. The V2 version can be crawled. The site is server-side rendered, using Gatsby.

The doc site above still uses V2, and search works there. But the indexed site information needs to be updated, so we need to migrate to V3.

shortcuts commented 2 years ago

Hey, you can see the html tab of the URL tester that shows how we render your page.

Testing with https://open.hand-china.com/choerodon-ui/zh/mds/guide, even from the browser, you can see that a hard refresh does not render the page (white page).

I believe there's a routing/redirecting issue on your side, as we can observe the URL (https://open.hand-china.com/choerodon-ui/zh/mds/guide) redirecting to https://open.hand-china.com/choerodon-ui/zh/mds/guide/ which redirects to https://open.hand-china.com/choerodon-ui/zh/mds/guide again.

On my side, I've tried with all of these options in your config:

  ignoreNoFollowTo: true,
  ignoreNoIndex: true,
  ignoreRobotsTxtRules: true,
  ignoreCanonicalTo: true,
  renderJavaScript: true

This does not seem to be something that needs to be fixed on our side so I'm closing the issue.
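
For anyone hitting the same symptom, the redirect loop described above can be checked outside the crawler. The following is a minimal sketch, not part of DocSearch or the Algolia Crawler: the function names `http_next_hop` and `find_redirect_loop` are hypothetical, and it assumes the redirects are HTTP-level (a 3xx response with a `Location` header). A redirect performed client-side in JavaScript would not be detected this way.

```python
from http.client import HTTPConnection, HTTPSConnection
from typing import Callable, List, Optional
from urllib.parse import urljoin, urlsplit


def http_next_hop(url: str) -> Optional[str]:
    """Return the redirect target of `url`, or None if it does not redirect.

    Uses http.client directly because it never follows redirects on its own.
    """
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def find_redirect_loop(
    start: str,
    next_hop: Callable[[str], Optional[str]] = http_next_hop,
    max_hops: int = 10,
) -> Optional[List[str]]:
    """Follow redirects from `start` and return the looping chain, if any.

    Returns None when the chain ends at a non-redirecting URL, or when
    `max_hops` is exceeded without revisiting a URL.
    """
    seen: List[str] = []
    url = start
    for _ in range(max_hops + 1):
        if url in seen:
            # The URL was visited before: report the cycle, closed on itself.
            return seen[seen.index(url):] + [url]
        seen.append(url)
        target = next_hop(url)
        if target is None:
            return None
        url = target
    return None
```

Calling `find_redirect_loop("https://open.hand-china.com/choerodon-ui/zh/mds/guide")` would report a chain only if the server still answers with the alternating redirects described above. The `next_hop` parameter can be swapped for a stub to try the logic without network access.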