algolia / docsearch-configs

DocSearch - Configurations
https://docsearch.algolia.com/
MIT License

Page isn't crawled #969

Closed astyfx closed 5 years ago

astyfx commented 5 years ago

Do you want to request a feature or report a bug?

Help wanted

If it is a DocSearch index issue, what is the related index_name ?

index_name= sendbird

What is the current behaviour?

https://docs.sendbird.com/platform/data_privacy

The page above is not being crawled.

What is the expected behaviour?

To be crawled

What have you tried to solve it?

I added a sitemap.xml and bumped the version from 1.0.1 to 1.0.2 after the page was added.

Any other feedback / questions ?

Should I remove the version tag (facet filter) and change only_content_level from true to false?

Is there a best practice for a documentation site like ours?

s-pace commented 5 years ago

Please add this URL to your sitemap; it is missing. I have checked, and the crawl itself is successful. Once you add it, the page will be parsed.

astyfx commented 5 years ago

@s-pace Which URL do you mean by "this URL"?

Our sitemap already contains https://docs.sendbird.com/platform/data_privacy

s-pace commented 5 years ago

Is it compliant with sitemaps.org?

astyfx commented 5 years ago

@s-pace Yes, I have just checked it with several sitemap checkers.

I will update the frequency to daily.
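
For anyone following along, one way to confirm that a page is listed in a sitemaps.org-style sitemap is to parse the `<loc>` entries directly. The snippet below is only a sketch: the sitemap text is a hypothetical sample written for illustration (it is not the real Sendbird sitemap), and `sitemap_urls` is a helper name made up here.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every non-empty <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.sendbird.com/platform/data_privacy</loc>
    <changefreq>daily</changefreq>
  </url>
</urlset>
"""

target = "https://docs.sendbird.com/platform/data_privacy"
found = target in sitemap_urls(SAMPLE_SITEMAP)
print(found)
```

A check like this only shows that the URL is present in the file; it does not show whether a particular crawler manages to read the sitemap.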

astyfx commented 5 years ago

We update our documentation content at any time. Do I then have to remove the version and the version facet filter?

s-pace commented 5 years ago

No, you shouldn’t.

s-pace commented 5 years ago

I will take a close look when I am back from PTO on Monday. In the meantime, you can add this URL to start_urls.
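
As a rough sketch of that workaround, the page can be appended to the `start_urls` list of the config. Only `index_name` and the page URL below come from this thread; the other `start_urls` entry and the `sitemap_urls` value are assumptions for illustration, and `add_start_url` is a hypothetical helper, not part of DocSearch.

```python
import json

# Hypothetical excerpt of a DocSearch config. Only index_name and the
# data_privacy URL come from the thread; the other values are assumed.
config = {
    "index_name": "sendbird",
    "start_urls": ["https://docs.sendbird.com/"],
    "sitemap_urls": ["https://docs.sendbird.com/sitemap.xml"],
}


def add_start_url(cfg: dict, url: str) -> dict:
    """Return a copy of cfg with url appended to start_urls if it is absent."""
    updated = dict(cfg)
    urls = list(cfg.get("start_urls", []))
    if url not in urls:
        urls.append(url)
    updated["start_urls"] = urls
    return updated


patched = add_start_url(config, "https://docs.sendbird.com/platform/data_privacy")
print(json.dumps(patched, indent=2))
```

Listing the page explicitly gives the crawler a direct entry point, so it no longer depends on the sitemap or on a link from another page.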

astyfx commented 5 years ago

@s-pace Any updates?

s-pace commented 5 years ago

Sorry for the delay.

It seems that we are not able to parse the sitemap, and this page is not linked from any other page via an <a> tag. I will need to dig further to understand why the sitemap is not handled correctly. Do you have any leads? Is this page unique in some way? Do you apply a specific redirect to it?

astyfx commented 5 years ago

Yes, it is unique. There is no explicit redirection on it (such as an href URL).

Our site runs on Next.js and uses the Link component from Next.js. Could that be harmful for the scraper?

Also, we recently added an external link to a menu item (it is related to our DocSearch config). Could that be harmful?

astyfx commented 5 years ago

We added an explicit href to the anchor component, and the page was then crawled successfully.

Thanks for your effort
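
To illustrate why the explicit href matters, here is a simplified sketch of how a link-following crawler might collect URLs from a page. It is not the DocSearch scraper's actual implementation, and both markup samples are hypothetical; the point is only that an anchor without an `href` attribute gives such a crawler nothing to follow.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(href)


def discoverable_links(html: str) -> list[str]:
    """Return the links a simple href-based crawler could follow."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: navigation handled only by a click handler.
BEFORE = '<nav><a onclick="go()">Data Privacy</a></nav>'
# Hypothetical markup: the same menu item with an explicit href.
AFTER = '<nav><a href="/platform/data_privacy">Data Privacy</a></nav>'

print(discoverable_links(BEFORE))
print(discoverable_links(AFTER))
```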

s-pace commented 5 years ago

Glad to hear it. Feel free to reopen if needed.

astyfx commented 5 years ago

@s-pace, after you updated our config, the Data Privacy page is not searchable again. Could you look into it?