Nosto / techdocs


Add subsection to "How nosto crawler works" #12

Closed ValterAndersson closed 6 years ago

ValterAndersson commented 6 years ago

"When does the crawler not function"

I think we need an article about the crawler explaining that a staging site cannot sit behind a whitelist if Nosto is to crawl it. All too often my staging environments are whitelisted, so no product data is pulled. To get real products into the recommendations, I suggest removing the whitelist for a few hours, re-indexing, and then styling the recommendations with real products. This happens more often than not.

ValterAndersson commented 6 years ago

We mention "While Nosto's crawler attempts to keep its copy of your catalog as fresh as possible, there are scenarios where we may not be able to update all the information as quickly as needed." We should provide a specific example of when a manual recrawl might be needed, or of when Nosto does not pick up changes quickly enough.

ValterAndersson commented 6 years ago

We state "You can recrawl as often as you need but bear in mind that every recrawl adds an extra page load to your server." but we should explain what the ramifications might be of recrawling too much, or of hitting the button multiple times in quick succession.

ValterAndersson commented 6 years ago

Added the subsection "Common errors while using the crawler approach" to the Nosto crawler page.

Common errors while using the crawler approach
Crawling relies on tagging metadata being present in the page source on page load. The Nosto crawler does not execute JavaScript, so any information that is dynamically injected or modified will not be crawled.
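One way to check a page for this problem is to parse the raw HTML without running any scripts and see whether the tagging is there. The sketch below does that with Python's standard library. The `nosto_product` class name and both sample pages are assumptions for illustration only, so substitute whatever tagging markup your store actually outputs.

```python
from html.parser import HTMLParser


class TaggingFinder(HTMLParser):
    """Records whether any element in static HTML carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def has_static_tagging(html, class_name="nosto_product"):
    """True if the tagging element exists in the source as served (no JS run)."""
    finder = TaggingFinder(class_name)
    finder.feed(html)
    return finder.found


# Hypothetical page with tagging rendered server-side: visible to a non-JS crawler.
static_page = (
    '<div class="nosto_product" style="display:none">'
    '<span class="name">Shoe</span></div>'
)

# Hypothetical page where the tagging only appears after a script runs.
injected_page = (
    '<div id="app"></div>'
    '<script>var d = document.createElement("div");'
    ' d.className = "nosto_product"; document.body.appendChild(d);</script>'
)

print(has_static_tagging(static_page))
print(has_static_tagging(injected_page))
```

If the check fails on the HTML your server returns but the tagging shows up in the browser's inspector, the tagging is most likely being injected client-side.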

The Nosto crawler is mostly dispatched from Amazon Web Services (US East). If your store uses geolocation redirects, you will need to add an exception for the crawler based on its header/user-agent details. Otherwise the redirects can interfere with multi-currency setups by populating USD as the base currency for all products.
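A redirect exception of this kind could look roughly like the sketch below. The `should_geo_redirect` helper and the `"nosto"` token are hypothetical placeholders, not the crawler's documented user-agent, so confirm the actual header value the crawler sends before relying on a match like this.

```python
def should_geo_redirect(user_agent, crawler_tokens=("nosto",)):
    """Return False when the User-Agent matches an exempted crawler token.

    crawler_tokens is a placeholder list (an assumption); replace it with
    the identifiers the crawler really sends.
    """
    ua = (user_agent or "").lower()
    return not any(token in ua for token in crawler_tokens)


# Regular visitors are still redirected by location.
print(should_geo_redirect("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))

# A request carrying the (hypothetical) crawler token skips the redirect,
# so it sees the store's default currency rather than the US variant.
print(should_geo_redirect("ExampleNostoCrawler/1.0"))
```

Matching on the user-agent rather than on IP ranges keeps the exception working even when the crawler is dispatched from a different region.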