algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

The error message for host unreachable in older versions was significantly better... #565

Open elucidsoft opened 2 years ago

elucidsoft commented 2 years ago

I have been banging my head trying to get this to work and kept getting a host unreachable error. After 3 hours, I tried the 1.13.0 docker image instead. The error message it gave was SIGNIFICANTLY better and I was immediately able to recognize the issue. I really suggest you put that back to how it was; I just wasted an immense amount of time.

shortcuts commented 2 years ago

Hey, could you please provide more context? What are the errors/differences?

The scraper uses the Algolia Python client so I don't think the issue is related to this repo

elucidsoft commented 2 years ago

The error message I was getting on the latest version was host unreachable. On v1.13.0, the message told me exactly what was wrong: it showed the entire neterror stack, and I could clearly see that I had malformed credentials.

Markeli commented 2 years ago

I had the same behavior. My auto-update script incorrectly added extra trailing whitespace to APPLICATION_ID, and because of that docsearch-scraper built an incorrect hostname. But in the latest version, I got AlgoliaUnreachableHostException: Unreachable hosts without any useful information. After downgrading to v1.13.0, I got some details that allowed me to solve the issue.
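For anyone hitting the same thing, here is a minimal sketch of a pre-flight check that could be run on the value before starting the scraper. It is not part of docsearch-scraper. The helper names are hypothetical, and both the alphanumeric-only rule and the `<id>-dsn.algolia.net` hostname pattern are assumptions used for illustration rather than something confirmed in this thread.

```python
def clean_application_id(raw: str) -> str:
    """Strip surrounding whitespace and reject anything else unexpected.

    Assumption: a valid application ID is purely alphanumeric.
    """
    cleaned = raw.strip()
    if not cleaned:
        raise ValueError("APPLICATION_ID is empty")
    if not cleaned.isalnum():
        raise ValueError(f"APPLICATION_ID has unexpected characters: {raw!r}")
    return cleaned


def guess_hostname(application_id: str) -> str:
    """Build the hostname the client would try to reach.

    Assumption: hosts follow the '<id>-dsn.algolia.net' pattern.
    """
    return f"{application_id}-dsn.algolia.net"


def explain(raw: str) -> str:
    """Describe what a raw value would turn into, for a readable error message."""
    cleaned = clean_application_id(raw)
    if cleaned != raw:
        return (
            f"APPLICATION_ID {raw!r} has stray whitespace; "
            f"without cleanup the host would be {guess_hostname(raw)!r}"
        )
    return f"APPLICATION_ID looks fine; host would be {guess_hostname(cleaned)!r}"
```

Calling `explain` on the raw value from the environment (for example `os.environ["APPLICATION_ID"]`) makes a trailing space or newline visible through `repr`, which is the detail the newer error message no longer surfaces.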

shortcuts commented 2 years ago

Hey @elucidsoft, @Markeli, looking at the past commits, I can only see one change that could cause that: we upgraded the scraper to the latest major version of our Python client, which might handle errors differently.

"After downgrading to v1.13.0 I got some details that allowed me to solve the issue."

I believe using an older version won't change the indexing. Most updates were made to make the scraper more stable and to detect the website structure when bootstrapping a config.

Note that until our new infra is in place, we will only accept community contributions, unless there's an urgent fix to make.