algolia / algoliasearch-netlify

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler
https://www.algolia.com/doc/tools/crawler/netlify-plugin/quick-start/

[feature request]add pathPrefix to inputs #85

Closed thundermiracle closed 3 years ago

thundermiracle commented 4 years ago

Gatsby generates URLs with a sub-path when pathPrefix is enabled.

I'm using this feature to deploy my blog on Netlify at https://xxx.com/blog/ rather than at the root https://xxx.com, which causes Algolia's indexing to fail.

Would you please consider adding a pathPrefix option to inputs to support sub-directory deployments?

bodinsamuel commented 4 years ago

Hello @thundermiracle,

Thanks again for reporting and going the extra mile by also opening a PR. For the moment we have chosen to expose as few parameters as possible to avoid increasing the complexity of the plugin.

We discussed your use case with the team and found that it can be solved without introducing a new parameter for the plugin.

Our recommendation

Add your sitemaps to your site's root robots.txt (example.com/robots.txt):

Sitemap: http://www.example.com/sitemap.xml

We will follow all links found in this sitemap, so we will be able to find your blog and all subsequent pages. N.B.: We only check the root-level robots.txt, as the spec suggests.


In your case /blog is hosted in the same Netlify site as /, which means you should have access to the root-level robots.txt. If your blog were a dedicated Netlify site, then the website would be accessible without any prefix.

Restrict crawl to prefix

If you only want to index /blog, you can add a specific rule for our Crawler in robots.txt. This syntax will disallow everything except /blog/**:

User-agent: Algolia Crawler
Allow: /blog
Disallow: /
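
As a quick local sanity check, the rule above can be evaluated with Python's standard urllib.robotparser. This is only a sketch: the standard-library parser is not the Algolia Crawler, so it shows how a generic robots.txt parser reads the rule, not how the crawler is guaranteed to behave. The helper name is_allowed and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from the comment above, as a list of lines.
ROBOTS_TXT = """\
User-agent: Algolia Crawler
Allow: /blog
Disallow: /
""".splitlines()


def is_allowed(path: str, user_agent: str = "Algolia Crawler") -> bool:
    """Return True if `user_agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("/blog", "/blog/my-post/", "/about"):
        print(path, is_allowed(path))
```

In this parser, Allow: /blog is a plain prefix match, so a path such as /blogger would also be allowed; writing Allow: /blog/ would be stricter if that matters for your site.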

We believe those 2 simple fixes are enough to achieve what you want. However, if that's not the case, please let us know ☺️

thundermiracle commented 4 years ago

@bodinsamuel Thank you very much for your time and your reply. I really appreciate it! I took your advice and pointed Algolia to my sitemap.xml, but the problem remains. Perhaps this isn't a pathPrefix problem anymore. Here is my use case:

  1. Main site -> Netlify Site1: https://xxxxxxxx.netlify.app (custom domain: example.com), with its own sitemap.xml;
  2. Blog site -> Netlify Site2: https://yyyyyy.netlify.app, with its own sitemap.xml, which can be accessed at example.com/blog/sitemap.xml or https://yyyyyy.netlify.app/sitemap.xml;
  3. In the main site, /blog/* is redirected to https://yyyyyy.netlify.app/:splat, so the blog can be accessed at example.com/blog.

I saw that Algolia successfully fetched sitemap.xml but failed to retrieve the links in it. Could you please tell me how I can avoid this problem? PS: links in sitemap.xml are https://example.com/blog/xxxxx

bodinsamuel commented 4 years ago

PS: links in sitemap.xml are https://example.com/blog/xxxxx

Mmh yes, that's an oversight on our end. We assumed a bit too much about how websites are set up. We are going to look into that a bit further to find an appropriate solution.

One other question for you: I can see you have a custom domain but it's not set in Netlify, is there any reason for that? We are using the domain provided in your Netlify config to hot-replace the hostname everywhere. For example: thundermiracle.com/blog -> foobar-123-thundermiracle.netlify.app/blog

That helps when there are hardcoded hostnames in sitemaps, canonical URLs, and such. But because you didn't set it up in Netlify, it won't work in the end. We should probably expose that configuration too 😶
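
The hostname hot-replace described above can be illustrated with a minimal Python sketch. It is not the plugin's code: the function name replace_hostname is hypothetical, and it assumes absolute URLs with a scheme (the example in the comment omits the scheme).

```python
from urllib.parse import urlsplit, urlunsplit


def replace_hostname(url: str, custom_host: str, netlify_host: str) -> str:
    """Swap `custom_host` for `netlify_host` in `url`; leave other URLs untouched.

    Hypothetical helper: it only mirrors the idea described in the thread.
    Any port or credentials in the original URL are dropped by the swap.
    """
    parts = urlsplit(url)
    if parts.hostname != custom_host:
        return url
    # SplitResult is a namedtuple, so _replace returns a copy with a new netloc.
    return urlunsplit(parts._replace(netloc=netlify_host))


if __name__ == "__main__":
    print(
        replace_hostname(
            "https://thundermiracle.com/blog",
            "thundermiracle.com",
            "foobar-123-thundermiracle.netlify.app",
        )
    )
```

A swap like this only helps when the custom domain is known to the tool doing the rewrite, which is why a domain that is not set on the Netlify site cannot be replaced.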

thundermiracle commented 4 years ago

I can see you have a custom domain but it's not set in Netlify

I set it on my main site (thundermiracle.com), so it can't be set again on my blog site (thundermiracle.com/blog).

And if I used the custom domain for my blog site, I would have to put all of the blog site's source in a /blog sub-directory, which would need pathPrefix to let Algolia know the correct path of sitemap.xml.

Thank you very much for your support. Algolia is really hot, and after joining JAMStack conf I learned about this great plugin, which takes the pain out of configuration. I'm looking forward to your solution.

bodinsamuel commented 4 years ago

Hey @thundermiracle

We carefully reviewed the possible solutions and found other potential use cases on our end. So we are going to move forward and try to implement a way to alias the prefix on our end.

The goal is to achieve something really simple on the plugin side, e.g.:

[[plugins]]
package = "@algolia/netlify-plugin-crawler"
  [plugins.inputs]
  removePrefix = "/blog"
  hostname = "thundermiracle.com"

With this we should be able to support your use case, I'll ping again when it's ready ☺️
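
The thread does not describe how removePrefix and hostname are applied, so the following Python sketch shows only one plausible reading: URLs on the configured hostname have the prefix stripped from their path. The function name strip_prefix and the sample URLs are hypothetical, and this is not the plugin's implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_prefix(url: str, hostname: str, remove_prefix: str) -> str:
    """Drop `remove_prefix` from the path of URLs served on `hostname`.

    Assumption: the prefix must match a whole path segment, so "/blog"
    matches "/blog" and "/blog/post" but not "/blogger".
    """
    parts = urlsplit(url)
    if parts.hostname != hostname:
        return url
    prefix = remove_prefix.rstrip("/")
    path = parts.path
    if path == prefix or path.startswith(prefix + "/"):
        # Keep at least "/" so the result is still a valid absolute path.
        new_path = path[len(prefix):] or "/"
        return urlunsplit(parts._replace(path=new_path))
    return url


if __name__ == "__main__":
    print(strip_prefix("https://thundermiracle.com/blog/my-post/",
                       "thundermiracle.com", "/blog"))
```

Under this reading, a sitemap link such as https://thundermiracle.com/blog/my-post/ would correspond to /my-post/ on the blog's own Netlify deployment.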

Thanks for the feedback and the patience

bodinsamuel commented 3 years ago

The feature was deployed a few weeks ago.