eurec4a / eurec4a-intake

Intake catalogue for EUREC4A field campaign datasets

Updating P-3 entries to point to permanent NOAA archive #50

Closed RobertPincus closed 3 years ago

RobertPincus commented 3 years ago

@d70-t Every one of the new entries is failing but I can't figure out why. Most seem to be failing with a Unicode error but the formatting looks identical to the HALO entries to me.

I might add that the catalog entries work fine when I access them locally - I'm able to plot the data etc.

d70-t commented 3 years ago

I suspect that this has something to do with the encoding the server returns...

old server:

$ curl -v "https://psl.noaa.gov/thredds/dodsC/Datasets/ATOMIC/data/p3/WSRA/EUREC4A_ATOMIC_P3_WSRA_20200117_v1.0.nc.das"
...
< content-type: text/plain; charset=UTF-8
...

new server:

$ curl -v "https://www.ncei.noaa.gov/thredds-ocean/dodsC/ncei/archive/data/0220627/EUREC4A_ATOMIC_P3_WSRA_20200117_v1.0.nc.das"
...
< Content-Type: text/plain
...

But I could not yet verify what exactly causes this trouble.

I don't know to what extent this is adhered to in practice, but according to the specification, OPeNDAP strings are all ASCII: OPeNDAP String Types

d70-t commented 3 years ago

This is related to #38 and this comment. The reason is that the new server actually returns its responses in a compressed format. pydap/pydap#152 fixes this by adding a call to response.decode_content(). Although that fix was merged long ago, there has never been an official release that includes it.
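
A minimal sketch of why a compressed response would surface as a Unicode error, using only the standard library. The DAS text here is a made-up stand-in, not the actual server response, and the two helper functions are hypothetical names for illustration; this is not pydap's code path, just the same failure mode in isolation.

```python
import gzip

# Hypothetical stand-in for a DAS response body (not copied from the NOAA server).
das_text = 'Attributes {\n    NC_GLOBAL {\n        String title "example";\n    }\n}\n'

# What a server sending a gzip-compressed body would put on the wire.
raw_body = gzip.compress(das_text.encode("ascii"))


def decode_naively(body: bytes) -> str:
    """Treat the raw bytes as text without undoing the compression."""
    return body.decode("ascii")


def decode_after_decompressing(body: bytes) -> str:
    """Undo the gzip compression first, then decode the text."""
    return gzip.decompress(body).decode("ascii")


try:
    decode_naively(raw_body)
    naive_failed = False
except UnicodeDecodeError:
    # The gzip header starts with bytes 0x1f 0x8b; 0x8b is not valid ASCII.
    naive_failed = True

print("naive decode failed:", naive_failed)
print(decode_after_decompressing(raw_body))
```

Decoding works once the body is decompressed, which is the step that appears to be missing in the released pydap.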

To get around this issue, we need either to require an unreleased version of pydap or to force the use of the netcdf4 engine for these datasets, as in #38.
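
For reference, a rough sketch of what forcing the netcdf4 engine looks like when opening such a dataset directly with xarray. The URL is the NCEI one from above with the `.das` suffix dropped (an assumption about the dataset endpoint), and `open_with_netcdf4` is a hypothetical helper name; how the engine is set in the catalogue entries themselves may differ.

```python
# Assumed OPeNDAP endpoint: the NCEI URL quoted above without the ".das" suffix.
NCEI_URL = (
    "https://www.ncei.noaa.gov/thredds-ocean/dodsC/ncei/archive/data/0220627/"
    "EUREC4A_ATOMIC_P3_WSRA_20200117_v1.0.nc"
)


def open_with_netcdf4(url: str = NCEI_URL):
    """Open an OPeNDAP dataset via the netCDF4 library instead of pydap.

    Requires xarray and netCDF4 to be installed, plus network access.
    """
    import xarray as xr  # imported lazily so defining the helper has no side effects

    return xr.open_dataset(url, engine="netcdf4")
```

Calling `open_with_netcdf4()` should return an `xarray.Dataset` if the server is reachable.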

RobertPincus commented 3 years ago

@d70-t Thanks very much for figuring out what was wrong. I went with explicitly specifying the netcdf4 engine for datasets served by NOAA NCEI in compressed format.