UDST / urbanaccess

A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
https://udst.github.io/urbanaccess/index.html
GNU Affero General Public License v3.0

403 error with certain transit providers due to missing headers in request #72

Closed knaaptime closed 3 years ago

knaaptime commented 3 years ago
import geopandas as gpd
import urbanaccess as ua

rside = gpd.read_file("https://www.dropbox.com/s/u4ah7y8t4a9jg45/rside.zip?dl=1")

# Register the Riverside Transit Agency feed, then download all registered feeds
feed = {'riversidetransitagency': 'http://www.riversidetransit.com/google_transit.zip'}

ua.gtfsfeeds.feeds.add_feed(feed)
ua.gtfsfeeds.download(data_folder=".")

yields

1 GTFS feeds will be downloaded here: ./gtfsfeed_zips
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-1-273db80af85d> in <module>
      8 
      9 ua.gtfsfeeds.feeds.add_feed(feed)
---> 10 ua.gtfsfeeds.download(data_folder=".")

~/anaconda3/envs/catshoods/lib/python3.7/site-packages/urbanaccess/gtfsfeeds.py in download(data_folder, feed_name, feed_url, feed_dict, error_pause_duration, delete_zips)
    469 
    470         if 'http' in feed_url_value:
--> 471             status_code = urlopen(feed_url_value).getcode()
    472             if status_code == 200:
    473                 file = urlopen(feed_url_value)

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

This can be resolved by #71.
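
For context, a minimal sketch of the general idea: build the request with an explicit `User-Agent` header instead of passing the bare URL to `urlopen`. This is not necessarily what #71 does. It assumes the server rejects urllib's default `Python-urllib/3.x` user agent, and `DEFAULT_HEADERS`, `build_request`, and `fetch` are hypothetical names that are not part of urbanaccess.

```python
from urllib.request import Request, urlopen

# Assumed header value; other browser-like User-Agent strings may work too.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url, headers=None):
    """Return a Request that carries explicit headers (hypothetical helper)."""
    return Request(url, headers=headers or DEFAULT_HEADERS)


def fetch(url, timeout=30):
    """Open the URL once and return the raw response bytes."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch('http://www.riversidetransit.com/google_transit.zip')` would then send the header with the request. Whether that is enough for a given provider depends on the server.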

sablanchard commented 3 years ago

Thanks @knaaptime ! We will take a look at the PR soon and will get back to you. Thanks again.

sablanchard commented 3 years ago

Hi @knaaptime! Thank you for the PR that fixes this issue. I have taken a look and opened another PR that I think solves the issue in a slightly different way, using the smallest modification I could find that lets the request be accepted, at least for the URL you provided. Can you try out this branch: https://github.com/UDST/urbanaccess/tree/enhancement/download-feed-wheader (PR: https://github.com/UDST/urbanaccess/pull/74)

Let us know if this branch solves your issue, or if other URLs still fail and need more headers; if so, we can take a look at adding them. If this solution works as is, I'd like to use it. I have also done a bit of maintenance on this branch to improve that section of code.
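
If other URLs do turn out to need more headers, one way to check is to try a few candidate header sets in order and see which one the server accepts. A rough sketch, assuming a 403 means "try the next set"; `first_accepted_headers` and its `opener` parameter are hypothetical and not part of urbanaccess or either PR.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def first_accepted_headers(url, candidates, opener=urlopen):
    """Return the first header dict in `candidates` that yields HTTP 200, else None.

    `opener` defaults to urlopen and can be swapped out for offline testing.
    """
    for headers in candidates:
        try:
            with opener(Request(url, headers=headers)) as response:
                if response.getcode() == 200:
                    return headers
        except HTTPError as err:
            # Keep trying on 403 Forbidden; re-raise anything else (404, 500, ...).
            if err.code != 403:
                raise
    return None
```

For example, passing `[{}, {"User-Agent": "Mozilla/5.0"}]` as `candidates` tries the request with urllib's defaults first and then with a browser-like user agent.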

sablanchard commented 3 years ago

Hi @knaaptime, I wanted to check in and see whether you have tried out the branch above yet. Does it solve your issue, or are you finding other URLs that need more header information? Let us know when you can.

knaaptime commented 3 years ago

hey, sorry it took me a little while, and thanks for your commitment to this. Your solution works great

sablanchard commented 3 years ago

OK great, thanks @knaaptime. I'll mark this as solved by #74; it's been merged to dev pending a new release.