Open FuhuXia opened 1 year ago
For Option 2 above, we could implement additional functionality into our solr docker image that does a more comprehensive server check of solr and publishes the health to a new endpoint. As an example, we could add an nginx
service into the solr
docker image which checks solr
health and then updates the /healthcheck
route for the AWS target group to be better informed about the state of the solr
service.
User reported package_search
API calls returning 409 then 502 error during the solr restarts.
During solr restart, for a short time period solr might send bad responses to CKAN, with which CKAN generates a search error page. It might be cached by CloudFront, user will see the error page for quite a while before the cache expires.
How to reproduce
Hard to catch it during solr restart.
Expected behavior
No error page
Actual behavior
Search Error page cached.
Sketch
For now the manual step to fix the error is to clear CloudFront cache for
/*
. A quick fix would be shortening the most visited page (e.g./dataset
) to use 1 minute cache time so that the error wont last long.Long term wise I can think of three ways to resolve/alleviate the issue: