Closed RobertPincus closed 4 years ago
I think this commit in intake-xarray could break your files. It is supposed to speed up the access to remotely stored netcdf files by downloading and caching them, but that doesn't work this way for opendap resources.
Which version of intake-xarray
are you using on your system?
We are using the most recent master of intake-xarray as earlier versions only allowed to access opendap when authenticated via ESGF, but not by other means. But it seems like the current version also broke the possibility of accessing opendap via the netcdf
driver. Personally, I didn't realise that it was possible to access OPeNDAP via the netcdf
driver, but it is documented and kind of makes a lot of sense. On the other hand, I see that caching files is also a desireable feature of intake-xarray. And I think it is not possible to guess if a dataset should be accessed by downloading a netcdf file or via opendap, only by looking at the URL.
Looking at this a bit more I think the issue is something to do with accessing local files... https://github.com/eurec4a/eurec4a-intake/runs/977998243#step:5:251
This would need a bit more digging, but if it's working locally for you then it should work on the CI platform. It probably is a package inconsistency somewhere...
@d70-t @leifdenby I manage my local environment within conda:
intake 0.6.0 py_0
intake-xarray 0.3.1 py_0
If we are too bleeding-edge with the catalog or the CI tests we run the risk of making the catalog less useful. Aeris no longer requires authentication - might we revert to the current distribution of intake-xarray
? Or is the issue elsewhere?
thanks @RobertPincus. I've been looking into to this a bit more. It appears that the netcdf
intake "driver" provided by intake-xarray
should work for fetching netCDF data via OPeNDAP, if that OPeNDAP server doesn't require authentication (list of intake plugins and the drivers they provide). I didn't realise this - it's pretty confusing that netcdf
driver is for local netCDF files and for OPeNDAP without authentication, and opendap
is for when an OPeNDAP server requires authentication... Anyway, that's why I made a pull-request for changes to intake-xarray
to make it possible to provide authentication information (otherwise intake-xarray
only works with the ESGF and URS servers which are defined in pydap
).
The above means we don't need to use my fork of intake-xarray
, but we should replace all use of the opendap
driver with netcdf
when accessing data from non-authenticated OPeNDAP backend. Unfortunately @d70-t, this doesn't appear to work with your server that is serving the SPECMACS data at the moment. You check this with these examples:
>> import xarray as xr
>> ds = xr.open_dataset('http://macsserver.physik.uni-muenchen.de/products/dap/eurec4a/cloudmask/EUREC4A_HALO_specMACS_cloud_mask_20200119T092900-20200119T184859_v1.1.nc')
...
OSError: [Errno -68] NetCDF: I/O failure: b'http://macsserver.physik.uni-muenchen.de/products/dap/eurec4a/cloudmask/EUREC4A_HALO_specMACS_cloud_mask_20200119T092900-20200119T184859_v1.1.nc'
# try with example from xarray docs
>> ds = xr.open_dataset("http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods", decode_times=False)
>> len(ds)
4
Also @RobertPincus I'm getting an error 400 for the URLs mentioned in this pull-request (e.g. http://psl.noaa.gov/thredds/dodsC/Datasets/ATOMIC/data/p3/AXBT/Level_3/P3_AXBT_Level_3_v0.7.nc), maybe you moved the files since making this pull-request?
I suggest that
1) @d70-t you get your OPeNDAP server working with xarray, and then we
2) remove my fork and use the netcdf
driver for non-authenticated OPeNDAP access (which will be all data sources now that AERIS doesn't require authentication) and put this on master (I can do this once you give me the go-ahead @d70-t ),
3) @RobertPincus if you could check that the URLs you're provide are still valid and then you could merge in master
from this repo once we've removed all use of the opendap
driver
4) we merge your pull-request @RobertPincus and get the P3 files into the catalog.
Sound ok?
@d70-t I started a branch to replace use of the opendap
driver with netcdf
here: https://github.com/leifdenby/eurec4a-intake/commit/f6480443ca366b9ef7c3ca56eef4434723b7a4a2
@leifdenby Thanks this sounds terrific. Many URLs are likely to change today and tomorrow so this timing is perfect. Hoping @d70-t can get the SpecMACS server working well.
One thing which still bothers me is that I think that the've broken the ability to open OPeNDAP resources with the netcdf
driver of intake-xarray
with the commit mentioned above, as that uses fsspec
to cache the files, which does not seem to be compatible with OPeNDAP. My fear is that while we should access the datasets using netcdf
in the current version of intake-xarray
, the upcoming version might introduce a breaking change which would force us to use opendap
in stead.
I'll have a look at my server anyways.
@d70-t I worry about this too. It seems to me that one should be able to use either OpenDAP
or netcdf
plugins with remote files. I moved to using netcdf
because the opendap
plugin was failing because I hadn't set ESGF authentication credentials.
@leifdenby while you should use https on the macsserver, I still can not reproduce the error from above on my machine:
In [1]: import xarray as xr
...: ds = xr.open_dataset('https://macsserver.physik.uni-muenchen.de/products/dap/eurec4a/cloudmask/EUREC4A_HALO_specMACS_cloud_mask_20200119T092900-20200119T184859_v1.1.nc')
In [2]: len(ds)
Out[2]: 9
I've filed an issue in intake-xarray
. Let's see if the've got a suggestion.
@leifdenby while you should use https on the macsserver, I still can not reproduce the error from above on my machine:
Thanks for checking this. I think you're right. I've found out that opening the dataset directly with xarray using the URL to the opendap server works on my laptop (linux), but not on my workstation (macos). I've been trying to dig into this, but I think for now we should move ahead with switching to using the netcdf
driver.
@leifdenby Thanks this sounds terrific. Many URLs are likely to change today and tomorrow so this timing is perfect. Hoping @d70-t can get the SpecMACS server working well.
Yes, I was hoping we could get this fixed in time for the ATOMIC hackathon. @d70-t should I make a pull-request from the branch I made changing the driver to netcdf
? Or will you make a pull-request we can merge?
This is a bit of a troublesome situation. Yes, it would be awesome to have this sorted out for the ATOMIC hackathon, but I am a bit hesitant of pushing the catalog with netcdf
drivers into master
as long as we don't know if that behaviour will stay available for longer.
From my perspective, the pro arguments for changing to netcdf
are:
intake-xarray
The counter arguments against it are:
master
and (my guess) the upcoming releases of intake-xarray
netcdf
will break the currently working scripts from the EUREC4A hackathon, which use the current master
of intake-xarray
netcdf
based catalogs among the ATOMIC hackathon people may lead to installations which will break in the near future if intake-xarray
decides to keep the current state of their master
. Either by us changing to opendap
again or by the users updating to a newer version of intake-xarray
.I think I would prefer to make this problematic situation explicit to the user scripts and create a separate branch which contains references to the netcdf
based catalogs. In case of us updating the opendap
catalogs, the old scripts will remain functional and in case of the users updating intake-xarray
, that could remind them of the situation such that it may be easier to fix this properly as soon as a more permanent solution is available.
@d70-t @leifdenby Very few people from the ATOMIC hackathon will use the intake catalog so don't feel pressure to merge on that account.
I've diverted this PR to the netcdf_driver
branch (which is based on Leif's changes) for now. I assume that github will pick this up later and will run the tests on the P3 datasets in an environment which provides a fair chance of succeeding.
I've another comment: the PSL server is redirecting all HTTP requests to HTTPS, so I'd suggest to directly use https://
for all URLs in stead.
@leifdenby @d70-t Wow, now the CI is in trouble from the very start. I don't know how to interpret this, I'm sorry.
@RobertPincus this is actually not to bad... first it tells you that there is only one failing test. This is because it can't find a sub catalog. That is the WSRA, which is still spelled inconsistently as mentioned above.
I should have started with a branch instead of committing to master directly, my apologies.
The CI tests appear to be failing during item.discover(). This works on my local installation but the CI is complaining about a missing module "aiohttp". I may try to add this next.