euroargodev / argopy

A python library for Argo data beginners and experts
https://argopy.readthedocs.io
European Union Public License 1.2

Errors when fetching data #220

Closed mamoonrashid closed 2 years ago

mamoonrashid commented 2 years ago

Hi, I am seeing errors when I try to fetch data. The errors differ depending on the data source (see examples below).

Example from GDAC

from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher(src='gdac', parallel=True, progress=True, cache=False, mode='expert', dataset='phy')
argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-17f34696626e> in <module>
----> 1 argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()
      2 argo_loader

~\Documents\Programming\argopy\argopy\fetchers.py in wrapper(*args)
     49                             (AccessPoint.__name__, args[0]._src, ", ".join(args[0].Fetchers.keys()))
     50                         )
---> 51         return AccessPoint(*args)
     52     wrapper.__name__ = AccessPoint.__name__
     53     wrapper.__doc__ = AccessPoint.__doc__

~\Documents\Programming\argopy\argopy\fetchers.py in region(self, box)
    395         is_box(box, errors="raise")  # Validate the box definition
    396 
--> 397         self.fetcher = self.Fetchers["region"](box=box, **self.fetcher_options)
    398         self._AccessPoint = "region"  # Register the requested access point
    399         self._AccessPoint_data = {'box': box}  # Register the requested access point data

~\Documents\Programming\argopy\argopy\data_fetchers\gdacftp_data.py in __init__(self, ftp, ds, cache, cachedir, dimension, errors, parallel, parallel_method, progress, api_timeout, **kwargs)
    119 
    120         # Validation of self.server is done by the indexstore:
--> 121         self.indexfs = indexstore(
    122             host=self.server,
    123             index_file=index_file,

~\Documents\Programming\argopy\argopy\stores\argo_index_proto.py in __init__(self, host, index_file, cache, cachedir, timeout)
    139 
    140         self.index_path = self.fs["src"].fs.sep.join([self.host, self.index_file])
--> 141         if not self.fs["src"].exists(self.index_path):
    142             raise FtpPathError("Index file does not exist: %s" % self.index_path)
    143 

~\Documents\Programming\argopy\argopy\stores\filesystems.py in exists(self, path, *args)
    138 
    139     def exists(self, path, *args):
--> 140         return self.fs.exists(path, *args)
    141 
    142     def expand_path(self, path):

~\anaconda3\lib\site-packages\fsspec\implementations\http.py in exists(self, path)
    119         kwargs["stream"] = True
    120         try:
--> 121             r = self.session.get(path, **kwargs)
    122             r.close()
    123             return r.ok

~\anaconda3\lib\site-packages\requests\sessions.py in get(self, url, **kwargs)
    541 
    542         kwargs.setdefault('allow_redirects', True)
--> 543         return self.request('GET', url, **kwargs)
    544 
    545     def options(self, url, **kwargs):

TypeError: request() got an unexpected keyword argument 'client_kwargs'

Example from ARGOVIS

from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher(src='argovis', parallel=True, progress=True, cache=False, mode='expert', dataset='phy')
argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()

---------------------------------------------------------------------------
DataNotFound                              Traceback (most recent call last)
<ipython-input-2-17f34696626e> in <module>
----> 1 argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()
      2 argo_loader

~\Documents\Programming\argopy\argopy\fetchers.py in load(self, force, **kwargs)
    549         if not self._loaded or force:
    550             # Fetch measurements:
--> 551             self._data = self.to_xarray(**kwargs)
    552             # Next 2 lines must come before ._index because to_index(full=False) calls back on .load() to read .data
    553             self._request = self.__repr__()  # Save definition of loaded data

~\Documents\Programming\argopy\argopy\fetchers.py in to_xarray(self, **kwargs)
    424                 % ",".join(self.Fetchers.keys())
    425             )
--> 426         xds = self.fetcher.to_xarray(**kwargs)
    427         xds = self.postproccessor(xds)
    428 

~\Documents\Programming\argopy\argopy\data_fetchers\argovis_data.py in to_xarray(self, errors)
    304     def to_xarray(self, errors: str = 'ignore'):
    305         """ Download and return data as xarray Datasets """
--> 306         ds = self.to_dataframe(errors=errors).to_xarray()
    307         ds = ds.sortby(
    308             ["TIME", "PRES"]

~\Documents\Programming\argopy\argopy\data_fetchers\argovis_data.py in to_dataframe(self, errors)
    285         else:
    286             method = self.parallel_method
--> 287         df_list = self.fs.open_mfjson(
    288             self.uri, method=method, preprocess=self.json2dataframe, progress=self.progress, errors=errors
    289         )

~\Documents\Programming\argopy\argopy\stores\filesystems.py in open_mfjson(self, urls, max_workers, method, progress, preprocess, url_follow, errors, *args, **kwargs)
    805             return results
    806         else:
--> 807             raise DataNotFound(urls)
    808 
    809 

DataNotFound: "['https://argovis.colorado.edu/selection/profiles?startDate=2020-11-11T00:00:00Z&endDate=2020-11-12T00:00:00Z&shape=%5B%5B%5B-180.0,-90.0%5D,%5B-180.0,-70.0%5D,%5B-160.0,-70.0%5D,%5B-160.0,-90.0%5D,%5B-180.0,-90.0%5D%5D%5D&presRange=%5B0.0,500.0%5D' ... ]

Example from ERDDAP

from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher(src='erddap', parallel=True, progress=True, cache=False, mode='expert', dataset='phy')
argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-17f34696626e> in <module>
----> 1 argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()
      2 argo_loader

~\Documents\Programming\argopy\argopy\fetchers.py in load(self, force, **kwargs)
    549         if not self._loaded or force:
    550             # Fetch measurements:
--> 551             self._data = self.to_xarray(**kwargs)
    552             # Next 2 lines must come before ._index because to_index(full=False) calls back on .load() to read .data
    553             self._request = self.__repr__()  # Save definition of loaded data

~\Documents\Programming\argopy\argopy\fetchers.py in to_xarray(self, **kwargs)
    424                 % ",".join(self.Fetchers.keys())
    425             )
--> 426         xds = self.fetcher.to_xarray(**kwargs)
    427         xds = self.postproccessor(xds)
    428 

~\Documents\Programming\argopy\argopy\data_fetchers\erddap_data.py in to_xarray(self, errors)
    468         else:
    469             try:
--> 470                 ds = self.fs.open_mfdataset(
    471                     self.uri, method=self.parallel_method, progress=self.progress, errors=errors
    472                 )

~\Documents\Programming\argopy\argopy\stores\filesystems.py in open_mfdataset(self, urls, max_workers, method, progress, concat, concat_dim, preprocess, preprocess_opts, errors, *args, **kwargs)
    632                 return results
    633         elif len(failed) == len(urls):
--> 634             raise ValueError("Errors happened with all URLs, this could be due to an internal impossibility to read returned content.")
    635         else:
    636             raise DataNotFound(urls)

ValueError: Errors happened with all URLs, this could be due to an internal impossibility to read returned content.
gmaze commented 2 years ago

Hi @mamoonrashid, I couldn't reproduce your errors:

[Screenshot 2022-05-16 at 08:26:34]

The error TypeError: request() got an unexpected keyword argument 'client_kwargs' makes me think you may not have the appropriate fsspec version installed. Could you show the output of argopy.show_versions()?
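
For reference, a quick check along these lines prints what I'm after (a minimal sketch; argopy.show_versions() is the call mentioned above, and fsspec exposes its version as fsspec.__version__):

import argopy
import fsspec

# Print argopy and its dependency versions (the output requested above)
argopy.show_versions()

# Or check fsspec on its own
print("fsspec:", fsspec.__version__)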

Be careful also to monitor the status of the servers; they may be temporarily down.

I also suggest using the option cache=True: if a fetch does not go through, a restart will only download the data not fetched yet, and the rest will be loaded from the cache.
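
For example, the same region request with caching enabled would look like this (a sketch based on the snippet above):

from argopy import DataFetcher as ArgoDataFetcher

# Same request as in the issue, but with cache=True so a restart reuses
# whatever was already fetched instead of downloading everything again.
argo_loader = ArgoDataFetcher(src='erddap', parallel=True, progress=True,
                              cache=True, mode='expert', dataset='phy')
argo_loader.region([-180, 180, -90, 90, 0, 5000, '2020-11-11', '2020-11-12']).load()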

mamoonrashid commented 2 years ago

@gmaze You are correct, I had an older version of fsspec installed. It has been a while since I installed packages with conda in developer mode, and it looks like I did not install argopy correctly, because none of its dependencies were downloaded/installed. At the moment I don't know how to install packages properly (with dependencies) using conda, but I was able to do it with pip using the following command:

pip install -e <path to argopy project root directory>

As suggested, the issues with 'argovis' and 'erddap' were also due to the large request domain.
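
For anyone finding this later, a reduced request looks something like this (a sketch only; the box below is an illustrative example, not a specific region I tested):

from argopy import DataFetcher as ArgoDataFetcher

# Illustrative smaller domain: a 30x10 degree box, 0-100 dbar, one day
argo_loader = ArgoDataFetcher(src='erddap', cache=True, mode='expert', dataset='phy')
argo_loader.region([-75, -45, 20, 30, 0, 100, '2020-11-11', '2020-11-12']).load()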

I usually keep cache=True. I just disabled caching to see if it would resolve the issues.

I will close the issue now. Thank you for the quick response!