geopython / GeoHealthCheck

Service Status and QoS Checker for OGC Web Services
https://geohealthcheck.org
MIT License

Retry on http502 error during OWSlib request #386

Open: jochemthart1 opened this issue 3 years ago

jochemthart1 commented 3 years ago

This issue is related to issue #366.

Is your feature request related to a problem? Please describe.
During a stress test of GHC (many resources probed at high frequency), I am getting unexplained HTTP 502 errors from the WebFeatureService request in the get_metadata function inside wfs.py, wms.py, etc. These errors can be avoided by retrying the request.

Describe the solution you'd like
probe.py already handles this for its own requests with the create_requests_retry_session() function in util.py. The same principle could be applied to the request made in the get_metadata function of each probe (wfs.py, wms.py, etc.).

That request, however, goes through the WebFeatureService() function from OWSlib, so in my opinion the best solution would be to use a retry session like create_requests_retry_session() inside OWSlib itself. Maybe other OWSlib users have also seen this HTTP 502 error? If so, the fix could improve OWSlib for them as well.
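A retry session of this kind can be sketched with plain requests and urllib3's Retry class. This is not GHC's actual create_requests_retry_session() from util.py: the name make_retry_session and the defaults (3 retries, 0.5 s backoff factor, retry on 502/503/504) are assumptions for illustration. It also does not show how OWSlib would accept such a session; that is exactly the change this issue proposes.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retry_session(retries=3, backoff_factor=0.5,
                       status_forcelist=(502, 503, 504)):
    """Hypothetical helper: return a requests.Session that retries
    idempotent requests (e.g. GET) when the server answers with a
    status code in status_forcelist, sleeping with exponential backoff."""
    retry = Retry(
        total=retries,
        connect=retries,
        read=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to every http:// and https:// URL
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```

Usage would look like `make_retry_session().get(url, timeout=30)`. Note that urllib3 by default only retries on these status codes for idempotent methods such as GET, which covers a GetCapabilities request but not POST requests.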

Describe alternatives you've considered
An alternative is a simple loop with an except clause that retries the WebFeatureService() request up to 3 times:

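    from owslib.wfs import WebFeatureService

    def get_metadata(self, resource, version='1.1.0', retries=3):
        """
        Get metadata, specific per Resource type.
        :param resource:
        :param version:
        :param retries: number of attempts before giving up
        :return: Metadata object
        """
        # Must be outside the docstring, otherwise it is never executed
        request_headers = self.get_request_headers()
        last_error = None
        for _ in range(retries):
            try:
                return WebFeatureService(
                    resource.url,
                    version=version,
                    headers=request_headers)
            except Exception as err:
                # e.g. an HTTP 502 from the server: remember it and retry
                last_error = err

        raise Exception(
            f'Get Metadata Request Exceeded {retries} Retries') from last_error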
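A possible refinement of the loop above, sketched as a generic helper: instead of retrying on every exception, it retries only errors that a predicate marks as transient and waits with exponential backoff between attempts. The names call_with_retries and is_gateway_error are hypothetical. The example predicate matches urllib's HTTPError with codes 502/503/504; for OWSlib, the predicate would have to match whatever exception OWSlib actually raises for a 502, which is not shown here.

```python
import time
import urllib.error


def is_gateway_error(exc):
    """Hypothetical predicate: True for HTTP 502/503/504 raised by urllib."""
    return (isinstance(exc, urllib.error.HTTPError)
            and exc.code in (502, 503, 504))


def call_with_retries(fn, is_transient=is_gateway_error, retries=3,
                      backoff=0.5, sleep=time.sleep):
    """Call fn() up to `retries` times, retrying only transient errors."""
    if retries < 1:
        raise ValueError('retries must be >= 1')
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            # Re-raise real failures, and the error from the final attempt
            if not is_transient(exc) or attempt == retries - 1:
                raise
            # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
            sleep(backoff * 2 ** attempt)
```

Injecting `sleep` keeps the helper testable without real waiting; in get_metadata it could wrap the constructor call as `call_with_retries(lambda: WebFeatureService(resource.url, version=version, headers=request_headers), is_transient=...)`.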
justb4 commented 3 years ago

Yes, it is best to fix this with a "retry session" in OWSLib. Still, there can always be cases where there really is a problem: for example, a load balancer with 3 backend GeoServer instances, 2 of which are permanently failing for some reason. Then 3 retries will always succeed eventually, yet there is still an undetected problem. Maybe a "Warning" or "Suspicious" type of verdict would be better in those cases. Or are these 502 errors somehow caused within the GHC Docker environment?
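The "Suspicious" verdict idea can be sketched as a retry wrapper that does not hide flakiness: a request that only succeeds after one or more retries is reported as SUSPICIOUS rather than as a plain pass. ProbeOutcome, probe_with_verdict and the verdict strings are hypothetical and not part of GHC's actual result model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProbeOutcome:
    # Hypothetical result type, for illustration only
    result: Any
    attempts: int
    verdict: str  # 'OK', 'SUSPICIOUS' or 'FAILED'


def probe_with_verdict(fn: Callable[[], Any], retries: int = 3) -> ProbeOutcome:
    """Retry fn(), but flag a success that needed retries as SUSPICIOUS."""
    for attempt in range(1, retries + 1):
        try:
            result = fn()
        except Exception:
            # Broad catch for brevity; a real probe would only retry
            # transient errors such as HTTP 502
            continue
        verdict = 'OK' if attempt == 1 else 'SUSPICIOUS'
        return ProbeOutcome(result, attempt, verdict)
    return ProbeOutcome(None, retries, 'FAILED')
```

In the load-balancer scenario above, a healthy backend on the first try yields OK, while hitting failing backends first yields SUSPICIOUS, so the problem stays visible even though the request eventually succeeds.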