dask / dask-tutorial

Dask tutorial
https://tutorial.dask.org
BSD 3-Clause "New" or "Revised" License

Data nyc flights not found #191

Closed andrybicio closed 4 years ago

andrybicio commented 4 years ago

Hello, I'm running the following notebook: 01_dask.delayed.ipynb

When I try to download the nyc-flights dataset in order to have some data to work on, I run: %run prep.py -d flights

At the end it raises a file-not-found error:

Downloading NYC Flights dataset... 

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
~/Physics_of_Data/MAPD/ModB/dask-tutorial-master/prep.py in <module>
    228 
    229 if __name__ == '__main__':
--> 230     sys.exit(main())

~/Physics_of_Data/MAPD/ModB/dask-tutorial-master/prep.py in main(args)
    224         accounts_json(args.small)
    225     if args.dataset == "flights" or args.dataset == "all":
--> 226         flights(args.small)
    227 
    228 

~/Physics_of_Data/MAPD/ModB/dask-tutorial-master/prep.py in flights(small)
     57         print("- Downloading NYC Flights dataset... ", end='', flush=True)
     58         url = sources.flights_url
---> 59         urllib.request.urlretrieve(url, flights_raw)
     60         print("done", flush=True)
     61 

~/anaconda3/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 

~/anaconda3/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/anaconda3/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/anaconda3/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~/anaconda3/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~/anaconda3/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

It seems like the data file is not available anymore, or am I missing something?
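(For anyone hitting this while the hosted file is unavailable: the 404 surfaces as a long `urllib` traceback because `prep.py` calls `urllib.request.urlretrieve` directly. A minimal sketch of a wrapper that turns the failure into an actionable message; the `fetch` helper name and the wording of the message are illustrative, not part of prep.py:)

```python
import urllib.error
import urllib.request


def fetch(url: str, dest: str) -> None:
    """Download url to dest, converting download failures into a clearer error."""
    try:
        urllib.request.urlretrieve(url, dest)
    except urllib.error.URLError as e:  # HTTPError is a subclass of URLError
        raise RuntimeError(
            f"Could not download {url!r}: {e}. The hosted dataset may have "
            "moved; check the dask-tutorial repo for an updated prep.py."
        ) from e
```

A 404 (or an unreachable host) then raises a single `RuntimeError` naming the URL, instead of the full handler-chain traceback above.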

TomAugspurger commented 4 years ago

Oops that might have been my fault (I was cleaning some buckets up). Can you try again now? I've re-uploaded the dataset.

FYI Dask folks, this is (apparently) tied to my Anaconda GCP account. While that's nice of them to host it for us, I believe that the Dask project has AWS credits available through NumFOCUS.

andrybicio commented 4 years ago

Ok, solved. Now it's working. Thank you!

mrocklin commented 4 years ago

FYI Dask folks, this is (apparently) tied to my Anaconda GCP account. While that's nice of them to host it for us, I believe that the Dask project has AWS credits available through NumFOCUS

Personally I'm fine with Dask paying for maintaining a few buckets of public data. We'll need to find someone to set things up and pass around credentials.
