algolia / algoliasearch-netlify

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler.
https://www.algolia.com/doc/tools/crawler/netlify-plugin/quick-start/

Crawler finds sitemap but does not find links in it #156

Closed atn38 closed 3 years ago

atn38 commented 3 years ago

Hello Algolia team,

I have a website deployed on Netlify at https://ble.lternet.edu/, and I would like to try Algolia for its search. To test it, I forked the underlying GitHub repository; the fork's deploy URL is https://stoic-elion-ec8918.netlify.app/.

I installed the Algolia plugin and configured it in netlify.toml following your recommendations. The crawler seems to find sitemap.xml, but none of the links in it (see screenshot).

[screenshot]

My hunch is that it has something to do with the links in sitemap.xml being absolute URLs pointing to https://ble.lternet.edu/, so I tried setting the customDomain parameter to match, but no dice.
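A minimal sketch of why a protocol-prefixed value could produce exactly this symptom, assuming (hypothetically) that the crawler keeps only sitemap URLs whose hostname equals the configured domain. `filter_sitemap_urls` and the second example URL are made up for illustration; this is not the plugin's actual code.

```python
from urllib.parse import urlsplit


def filter_sitemap_urls(urls, custom_domain):
    """Keep only URLs whose hostname equals custom_domain.

    Hypothetical rule for illustration; not the Algolia plugin's API.
    """
    return [url for url in urls if urlsplit(url).hostname == custom_domain]


# Absolute URLs as they might appear in sitemap.xml (the second one is hypothetical).
sitemap_urls = [
    "https://ble.lternet.edu/",
    "https://ble.lternet.edu/example-page/",
]

# A value that includes the protocol never equals a bare hostname, so nothing survives.
print(filter_sitemap_urls(sitemap_urls, "https://ble.lternet.edu/"))  # []

# The bare hostname matches both links.
print(filter_sitemap_urls(sitemap_urls, "ble.lternet.edu"))
```

Under this assumed rule, "sitemap found, zero links kept" is what you would see, which matches the screenshot description.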

Any ideas? Thanks for your work!

bodinsamuel commented 3 years ago

Hello @atn38,

Thanks for the very thorough report. The domain must be set without the protocol, so just ble.lternet.edu.

I tried it, and it seems to work better this way. I'll create a ticket on our side to improve the validation of this field 👍🏻
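The kind of validation mentioned here could look roughly like the sketch below. It is hypothetical: `bare_hostname` and `validate_custom_domain` are illustrative names, not part of the plugin, and the rule (reject anything that is not a bare hostname, and suggest the corrected value) is an assumption about what "improved validation" might mean.

```python
from urllib.parse import urlsplit


def bare_hostname(value):
    """Extract the hostname from values like "ble.lternet.edu" or "https://ble.lternet.edu/"."""
    value = value.strip()
    # urlsplit only fills in the network location when a scheme or a leading "//"
    # is present, so prefix "//" for bare values before parsing.
    parsed = urlsplit(value if "://" in value else "//" + value)
    if not parsed.hostname:
        raise ValueError(f"customDomain {value!r} does not contain a hostname")
    return parsed.hostname


def validate_custom_domain(value):
    """Reject anything that is not a bare hostname, and suggest the fix."""
    host = bare_hostname(value)
    if value != host:
        raise ValueError(
            "customDomain must be a bare hostname without protocol or path: "
            f"use {host!r} instead of {value!r}"
        )
    return value


print(validate_custom_domain("ble.lternet.edu"))  # ble.lternet.edu
try:
    validate_custom_domain("https://ble.lternet.edu/")
except ValueError as err:
    print(err)
```

Rejecting keeps the config explicit; calling `bare_hostname` alone and using its result would be the more forgiving alternative.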

Looking forward to your answer.

atn38 commented 3 years ago

Hello @bodinsamuel,

Thanks for the hint, this seems to work better. The crawler now finds most of the pages, except for one. Any ideas why this might happen?

bodinsamuel commented 3 years ago

> The crawler now finds most of the pages, except for one. Any ideas why this might happen?

Can you share which one? That usually happens with orphan pages (pages that are neither in the sitemap nor linked from anywhere), but let's check.
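One rough way to look for such pages yourself is a reachability check: start from the sitemap URLs, follow links breadth-first, and list the known pages that are never reached. This sketch assumes you already have the full page list (for example from the build output) and a map of each page's outgoing links; `unreachable_pages` and the paths below are hypothetical. Note that a page linked only from another orphan is unreachable as well.

```python
from collections import deque


def unreachable_pages(all_pages, sitemap_urls, links):
    """Return pages a crawler starting from the sitemap would never discover.

    links maps a page URL to the URLs it links to.
    """
    seen = set(sitemap_urls)
    queue = deque(sitemap_urls)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(all_pages) - seen)


pages = ["/", "/a/", "/b/", "/c/", "/d/"]
sitemap = ["/"]
links = {"/": ["/a/"], "/a/": ["/b/"], "/c/": ["/d/"]}
print(unreachable_pages(pages, sitemap, links))  # ['/c/', '/d/']
```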

atn38 commented 3 years ago

It turns out that was my mistake: the canonical URL in that page's header was wrong. Thanks for your help!
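A quick way to catch this kind of mistake before deploying is to compare each page's `<link rel="canonical">` against the URL you expect for it. This is a hedged sketch using Python's standard `html.parser`; it assumes, as reported here, that a wrong canonical was enough to keep the page out of the results, and the URLs in the example are hypothetical. How the Algolia Crawler itself treats canonical URLs is not covered in this thread.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. "canonical alternate".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def canonical_matches(html, expected_url):
    """True if the page has no canonical tag or its canonical equals expected_url."""
    canonical = find_canonical(html)
    if canonical is None:
        return True
    # Ignore a trailing-slash difference when comparing.
    return canonical.rstrip("/") == expected_url.rstrip("/")


page = '<head><link rel="canonical" href="https://ble.lternet.edu/other-page/"></head>'
print(canonical_matches(page, "https://ble.lternet.edu/example-page/"))  # False
```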

bodinsamuel commented 3 years ago

Thanks for reaching out, very glad it's all good. Closing this issue 👍🏻