eliocamp / metR

Tools for Easier Analysis of Meteorological Fields
https://eliocamp.github.io/metR/

ReadNetCDF unable to read OpenDAP dataset #164

Closed pascaloettli closed 1 year ago

pascaloettli commented 1 year ago

Hello,

The ReadNetCDF function cannot open an OpenDAP dataset (here, for example, the NCEP/DOE Reanalysis II):

library(metR)
f <- "https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis2/Monthlies/LTMs/gaussian_grid/air.2m.mon.ltm.nc"
ReadNetCDF(file = f, vars = "air")

Error in ReadNetCDF(file = f, vars = "air") : 1 assertions failed:

  • Variable 'file': File or URL not readable.

Using the {ncdf4} package:

ncdf4::nc_open(f)
File https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis2/Monthlies/LTMs/gaussian_grid/air.2m.mon.ltm.nc (NC_FORMAT_CLASSIC):

 3 variables (excluding dimension variables):
    double climatology_bounds[nbnds,time]   
        long_name: Climate Time Boundaries
        units: hours since 1800-1-1 00:00:00
    float air[lon,lat,level,time]   
        long_name: Long Term Mean Monthly Mean of Forecast of Air temperature at 2 m
        units: degK
        precision: 2
        least_significant_digit: 1
        GRIB_id: 11
        GRIB_name: TMP
        var_desc: Air temperature
        dataset: NCEP-DOE AMIP-II Reanalysis Derived Products
        level_desc: 2 m
        statistic: Long Term Mean
        ---

So ncdf4 can read the metadata of the file, and passing the opened connection to ReadNetCDF works:

ReadNetCDF(ncdf4::nc_open(f), vars = "air")
          time level     lat     lon      air
 1: 0000-12-30     2  88.542   0.000 246.4040
 2: 0000-12-30     2  88.542   1.875 246.5536
 3: 0000-12-30     2  88.542   3.750 246.5213
 4: 0000-12-30     2  88.542   5.625 246.5563
 5: 0000-12-30     2  88.542   7.500 246.5680
---                                          

The problem seems to be related to the checkURLFile function:

unname(file.access(f, 4) == 0 | RCurl::url.exists(f))
[1] FALSE

eliocamp commented 1 year ago

It seems that RCurl is not recognising it as a valid URL. If I navigate to that URL in a browser, I get:

Error {
    code = 400;
    message = "Unrecognized request";
};

So something might be going on. It seems that RCurl is not as robust as one would hope at detecting whether something is a URL.

pascaloettli commented 1 year ago

Adding .dds at the end of the filename (i.e., https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis2/Monthlies/LTMs/gaussian_grid/air.2m.mon.ltm.nc.dds) gives:

Dataset {
    Float32 level[level = 1];
    Float32 lat[lat = 94];
    Float32 lon[lon = 192];
    Float64 time[time = 12];
    Float64 climatology_bounds[time = 12][nbnds = 2];
    Grid {
     ARRAY:
        Float32 air[time = 12][level = 1][lat = 94][lon = 192];
     MAPS:
        Float64 time[time = 12];
        Float32 level[level = 1];
        Float32 lat[lat = 94];
        Float32 lon[lon = 192];
    } air;
    Grid {
     ARRAY:
        Int16 valid_yr_count[time = 12][level = 1][lat = 94][lon = 192];
     MAPS:
        Float64 time[time = 12];
        Float32 level[level = 1];
        Float32 lat[lat = 94];
        Float32 lon[lon = 192];
    } valid_yr_count;
} Datasets/ncep.reanalysis2/Monthlies/LTMs/gaussian_grid/air.2m.mon.ltm.nc;

And with that URL the check passes:

f <- "https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis2/Monthlies/LTMs/gaussian_grid/air.2m.mon.ltm.nc.dds"
unname(file.access(f, 4) == 0 | RCurl::url.exists(f))
[1] TRUE

So maybe a two-step test is possible: first test the URL with the .nc extension only; if that returns FALSE, append the .dds extension and test again. If the second test returns TRUE, ncdf4::nc_open() can read the .nc file, as shown in the first post.
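A minimal sketch of such a two-step test might look like the following. The helper name `is_readable_two_step` and its `exists_fun` argument are hypothetical (they are not part of metR); `exists_fun` defaults to `RCurl::url.exists` and is only a parameter so the logic can be tried without network access.

```r
# Hypothetical helper, not part of metR: two-step readability test
# for a local file or an OpenDAP URL.
is_readable_two_step <- function(f, exists_fun = RCurl::url.exists) {
  # Step 1: a readable local file, or a URL that responds as given
  if (unname(file.access(f, 4) == 0) || isTRUE(exists_fun(f))) {
    return(TRUE)
  }
  # Step 2: retry with the .dds suffix appended (assumption: the server
  # answers on the .dds endpoint even when the bare URL is rejected)
  isTRUE(exists_fun(paste0(f, ".dds")))
}
```

With a stand-in checker that only accepts URLs ending in `.dds`, `is_readable_two_step("https://example.org/data.nc", exists_fun = function(u) grepl("\\.dds$", u))` returns `TRUE`, because the first step fails and the second succeeds.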

eliocamp commented 1 year ago

I think a better approach is not to try anything fancy: since I'm letting ncdf4 do all the reading, I'll just let it tell me whether it works or not. I'll remove the checks and add some error trapping so it's clear that the error came from ReadNetCDF().
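As an illustration only, the error trapping described above could be sketched like this. The wrapper name `open_or_stop` and its `opener` argument are hypothetical and this is not the code that ended up in metR; `opener` defaults to `ncdf4::nc_open` and is a parameter so the behaviour can be shown without a real file.

```r
# Hypothetical wrapper, not metR's actual implementation: let the opener
# decide whether the file or URL is readable, and re-signal any failure
# with a message that names ReadNetCDF().
open_or_stop <- function(file, opener = ncdf4::nc_open) {
  tryCatch(
    opener(file),
    error = function(e) {
      stop("ReadNetCDF(): could not open '", file, "': ",
           conditionMessage(e), call. = FALSE)
    }
  )
}
```

The design choice is that no separate readability check runs first: whatever ncdf4 can open is accepted, and anything it cannot open fails with the underlying message preserved.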