gis-ops / routingpy

🌎 Python library to access all public routing, isochrones and matrix APIs in a consistent manner.
https://routingpy.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

MapboxOSRM: CloudFront error when 2 requests follow each other #105

Closed khamaileon closed 1 year ago

khamaileon commented 1 year ago

Here's what I did

from routingpy.routers import MapboxOSRM
from pprint import pprint

# Some locations in Berlin
coords = [[13.413706, 52.490202], [13.421838, 52.514105],
          [13.453649, 52.507987], [13.401947, 52.543373]]

client = MapboxOSRM(api_key='token')

route = client.directions(locations=coords, profile='walking')
isochrones = client.isochrones(locations=coords[0], profile='walking', intervals=[600, 1200])

pprint((route.geometry, route.duration, route.distance, route.raw))
pprint((isochrones.raw, isochrones[0].geometry, isochrones[0].center, isochrones[0].interval))

Here's what I got

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/.virtualenvs/middler-api/lib/python3.8/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    970 try:
--> 971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File /usr/lib/python3.8/json/__init__.py:357, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    354 if (cls is None and object_hook is None and
    355         parse_int is None and parse_float is None and
    356         parse_constant is None and object_pairs_hook is None and not kw):
--> 357     return _default_decoder.decode(s)
    358 if cls is None:

File /usr/lib/python3.8/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File /usr/lib/python3.8/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
File ~/.virtualenvs/middler-api/src/routingpy/routingpy/client_default.py:235, in Client._get_body(response)
    234 try:
--> 235     body = response.json()
    236 except json.decoder.JSONDecodeError:

File ~/.virtualenvs/middler-api/lib/python3.8/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONParseError                            Traceback (most recent call last)
Cell In[47], line 2
      1 route = client.directions(locations=coords, profile='walking')
----> 2 isochrones = client.isochrones(locations=coords[0], profile='walking', intervals=[600, 1200])
      3 #matrix = client.matrix(locations=coords, profile='walking')

File ~/.virtualenvs/middler-api/src/routingpy/routingpy/routers/mapbox_osrm.py:413, in MapboxOSRM.isochrones(self, locations, profile, intervals, contours_colors, polygons, denoise, generalize, dry_run)
    408     params["generalize"] = generalize
    410 profile = profile.replace("mapbox/", "")
    412 return self.parse_isochrone_json(
--> 413     self.client._request(
    414         "/isochrone/v1/mapbox/" + profile + "/" + locations_string,
    415         get_params=params,
    416         dry_run=dry_run,
    417     ),
    418     intervals,
    419     locations,
    420 )

File ~/.virtualenvs/middler-api/src/routingpy/routingpy/client_default.py:200, in Client._request(self, url, get_params, post_params, first_request_time, retry_counter, dry_run)
    197     return self._request(url, get_params, post_params, first_request_time, retry_counter + 1)
    199 try:
--> 200     result = self._get_body(response)
    202     return result
    204 except exceptions.RouterApiError:

File ~/.virtualenvs/middler-api/src/routingpy/routingpy/client_default.py:237, in Client._get_body(response)
    235     body = response.json()
    236 except json.decoder.JSONDecodeError:
--> 237     raise exceptions.JSONParseError("Can't decode JSON response:{}".format(response.text))
    239 if status_code == 429:
    240     raise exceptions.OverQueryLimit(status_code, body)

JSONParseError: Can't decode JSON response:<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Bad request.
We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
<BR clear="all">
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: iEXFhiP_CsxoQhlfeCQf6c12ZikAJBNqsftyZtYi_7fbDnoYBr9u0Q==
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

Here's what I was expecting

Result of the 2 requests


Here's what I think could be improved

At first I thought the calls must be too close together. So I added a time.sleep between them, but without success.

nilsnolde commented 1 year ago

Mapbox probably changed something on their isochrone endpoint. Could you check? With dry_run=True you should just get the URL and parameters.

khamaileon commented 1 year ago

url:
https://api.mapbox.com/directions/v5/mapbox/walking?access_token=token
Parameters:
{
  "headers": {
    "User-Agent": "routingpy/v0.0.post296+g2357ccd",
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "timeout": 60,
  "data": {
    "coordinates": "13.413706,52.490202;13.421838,52.514105;13.453649,52.507987;13.401947,52.543373"
  }
}
url:
https://api.mapbox.com/isochrone/v1/mapbox/walking/13.413706,52.490202?access_token=token&contours_minutes=10%2C20
Parameters:
{
  "headers": {
    "User-Agent": "routingpy/v0.0.post296+g2357ccd",
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "timeout": 60,
  "data": {
    "coordinates": "13.413706,52.490202;13.421838,52.514105;13.453649,52.507987;13.401947,52.543373"
  }
}

khamaileon commented 1 year ago

The 2 queries work independently.

khamaileon commented 1 year ago

It works if I instantiate 2 clients.

client = MapboxOSRM(api_key='token')
client2 = MapboxOSRM(api_key='token')

route = client.directions(locations=coords, profile='walking')
isochrones = client2.isochrones(locations=coords[0], profile='walking', intervals=[600, 1200])

nilsnolde commented 1 year ago

Haha, what?! The 403 error in the first example normally means "forbidden", i.e. the server refused the request. What happens if you send the same two requests from Postman or a similar tool? I can't imagine what differs when you instantiate 2 clients; it's still the same user agent, IP, etc.

khamaileon commented 1 year ago

I put 2 prints here:

[image: screenshot showing where the two print statements were added]

With the same client, I got:

https://api.mapbox.com/directions/v5/mapbox/walking?access_token=token
{'User-Agent': 'routingpy/v0.0.post296+g2357ccd', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'Content-Length': '105'}
https://api.mapbox.com/isochrone/v1/mapbox/walking/13.413706,52.490202?access_token=token&contours_minutes=10%2C20
{'User-Agent': 'routingpy/v0.0.post296+g2357ccd', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'Content-Length': '105'}

With different clients:

https://api.mapbox.com/directions/v5/mapbox/walking?access_token=token
{'User-Agent': 'routingpy/v0.0.post296+g2357ccd', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'Content-Length': '105'}
https://api.mapbox.com/isochrone/v1/mapbox/walking/13.413706,52.490202?access_token=token&contours_minutes=10%2C20
{'User-Agent': 'routingpy/v0.0.post296+g2357ccd', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded'}

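As you can see, the request headers differ: with a shared client, the isochrone GET request still carries the Content-Length: 105 header from the preceding directions request.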

nilsnolde commented 1 year ago

So it's either Content-Length header on the isochrone request or the keep-alive connection? I'd assume it's the former, as it seems that's a GET request which doesn't have any content, so maybe their CloudFront proxy denies that? Who's setting that header? Is it us or requests? Can you try without?
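
The observation that a GET request normally has no content, and therefore no Content-Length, can be checked locally. The following is a minimal sketch that uses only the standard library (urllib and http.server) rather than requests, so it illustrates the HTTP behaviour in general and not routingpy's or requests' internals. The handler class, helper function and request body are made up for the illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeaders(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that returns the request headers it received."""

    def do_GET(self):
        # Drain the request body, if the client sent one.
        length = int(self.headers.get("Content-Length") or 0)
        self.rfile.read(length)
        seen = {key.lower(): value for key, value in self.headers.items()}
        payload = json.dumps(seen).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def headers_seen_by_server(url, data=None):
    """Send a GET (optionally with a body) and return the headers the server saw."""
    request = urllib.request.Request(url, data=data, method="GET")
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.load(response)


server = HTTPServer(("127.0.0.1", 0), EchoHeaders)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:{}/".format(server.server_port)

body = b"coordinates=13.41,52.49"
plain_get = headers_seen_by_server(url)
get_with_body = headers_seen_by_server(url, data=body)

server.shutdown()
server.server_close()

print("content-length" in plain_get)      # plain GET: header absent
print(get_with_body.get("content-length"))  # GET with leftover body: header present
```

A GET that accidentally inherits a body from an earlier POST therefore looks different on the wire, which is consistent with the headers logged above. Whether CloudFront rejects the request for exactly that reason is an assumption of this thread, not something the sketch proves.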

khamaileon commented 1 year ago

I think that's where the problem comes from. The same session, and therefore the same headers, are reused. CloudFront must be validating the Content-Length header.

https://github.com/gis-ops/routingpy/blob/2357ccd67573ce985c74647a784cd8adedc09365/routingpy/client_default.py#L74

khamaileon commented 1 year ago

I'll take a look later in the day or tomorrow.

nilsnolde commented 1 year ago

Right, that's not good. Thanks for the investigation!

nilsnolde commented 1 year ago

It's strange though that a Session object would just keep all those headers and not re-compute them per request, but maybe I'm not understanding some of the subtleties around that.. For me this is only about session pooling so we can keep a connection alive, which should be more performant than opening a new one for each request..

nilsnolde commented 1 year ago

Does it also do that if you turn the requests around? First GET, then POST, and it won't set a content length? I'm almost inclined to assume that's a bug in requests..

khamaileon commented 1 year ago

> Does it also do that if you turn the requests around? First GET, then POST, and it won't set a content length? I'm almost inclined to assume that's a bug in requests..

Yes, it works that way.

khamaileon commented 1 year ago

In the end, it was just a pointer issue :)
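
A minimal sketch of how such a "pointer issue" can arise, assuming the client keeps its request keyword arguments in a single dict and hands out a reference to it instead of a copy. That assumption is not confirmed in this thread, although it would match the dry_run output above, where the isochrone request still lists the "data" of the directions request. The class and method names are hypothetical and are not routingpy's actual code.

```python
import copy


class LeakyClient:
    """Hypothetical client that reuses one kwargs dict for every request."""

    def __init__(self):
        self.kwargs = {"headers": {"User-Agent": "demo"}, "timeout": 60}

    def build_request(self, method, post_params=None):
        # Alias, not a copy: anything added here stays on self.kwargs.
        request_kwargs = self.kwargs
        if post_params is not None:
            request_kwargs["data"] = post_params
        return method, request_kwargs


class FixedClient(LeakyClient):
    """Same hypothetical client, but each request gets its own kwargs."""

    def build_request(self, method, post_params=None):
        # Deep copy so per-request additions never reach self.kwargs.
        request_kwargs = copy.deepcopy(self.kwargs)
        if post_params is not None:
            request_kwargs["data"] = post_params
        return method, request_kwargs


post_params = {"coordinates": "13.413706,52.490202;13.421838,52.514105"}

leaky = LeakyClient()
leaky.build_request("POST", post_params)          # directions-style request
_, leaky_get_kwargs = leaky.build_request("GET")  # isochrone-style request

fixed = FixedClient()
fixed.build_request("POST", post_params)
_, fixed_get_kwargs = fixed.build_request("GET")

print("data" in leaky_get_kwargs)  # True: the POST body leaked into the GET
print("data" in fixed_get_kwargs)  # False: the GET starts from clean kwargs
```

Under this assumption the two-client workaround also makes sense: each instance has its own dict, so the POST body of one never reaches the GET of the other.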

nilsnolde commented 1 year ago

fixed in #111