dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0

Errors when reading some metadata items in an Argo file #1522

Closed richardsc closed 4 years ago

richardsc commented 5 years ago

@dankelley noticed during Issue #1520 that there are some netCDF related errors in reading the file:

FYI if you think the HISTORY_* items are worth trying to get, I can try to figure out why ncvar_get() cannot read them due to a "dim" problem. When I do

ncdump -h create_data/argo/6900388_prof.nc

in the OS, I get stuff that looks useful. But R (not oce) has problems decoding this stuff:

library(ncdf4)
f <- nc_open("~/git/oce/create_data/argo/6900388_prof.nc")
ncvar_get(f, "HISTORY_INSTITUTION")

spits out

Error in R_nc4_get_vara_text: NetCDF: Index exceeds dimension bound
Var: HISTORY_INSTITUTION Ndims: 3 Start: 0,0,0 Count: 0,223,4
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, :
  C function R_nc4_get_var_text returned error

dankelley commented 5 years ago

This problem of reading e.g. HISTORY_INSTITUTION relates to #1118. I'm going to do some searching on the interwebs to see if there is a way to read these things.

dankelley commented 5 years ago

(Q for @richardsc at end)

I am starting to wonder whether these HISTORY_* things are simply blank. I did

ncdump ~/git/oce/create_data/argo/6900388_prof.nc  > ~/Dropbox/6900388_prof.txt

in a unix shell, and the resultant file (26310 lines) lists data for e.g. LATITUDE and so forth, but

$ grep -n HISTORY ~/Dropbox/6900388_prof.txt | tail -1
322:        HISTORY_QCTEST:_FillValue = " " ;

which is in the header listing at the top of the output. (By contrast, the final LATITUDE is in line 3404, the final PSAL is in line 24256, etc.)

So, as far as I can tell, these HISTORY_* items are simply blank. That may be why ncdf4::ncvar_get() is spewing an error message.

@richardsc do you know of another way (other than R/ncdf4 and unix/ncdump) of reading this file? If so, can you tell if the HISTORY_* items are blank?

Action: If we find that the entries are blank, I will be inclined to stop spewing a warning on these items. I don't want users seeing 12 warnings every time they read an argo file -- it might make them wonder if they are doing something wrong, or if the file is broken. (Well, maybe this file is broken, listing things in the header that are not filled with data.)

richardsc commented 5 years ago

Good sleuthing. I'll look into other ways to read it, but I like the suggestion of just suppressing the warnings for now.

dankelley commented 5 years ago

I looked with panoply (https://www.giss.nasa.gov/tools/panoply/download/). I clicked on the HISTORY_* items and selected the 'export' operation, but got only a header. I did get content for other things, like PSAL. This is a second indication that there are no data in the HISTORY_* items.

dankelley commented 5 years ago

Here's another indication:

$ ncdump -h /data/glider/delayedData/SEA019/Data/M54/testSOCIB/netcdf/GLI2018_SEA019_054DM_L1.nc | grep UNLIMITED
    time = UNLIMITED ; // (528878 currently)
$ ncdump -h ~/git/oce/create_data/argo/6900388_prof.nc | grep UNLIMITED
    N_HISTORY = UNLIMITED ; // (0 currently)

So now I think what I'm going to do is try to find out how to get the "currently" length with an ncdf4 call.

dankelley commented 5 years ago

I see

> nc <- nc_open("~/git/oce/create_data/argo/6900388_prof.nc")
> nc$dim$N_HISTORY$len
[1] 0

which I think may be the clue I need. However, I am a bit reluctant to dig into aspects of the data structure defined by ncdf4, because that is probably subject to change.
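
One way to limit that exposure is to keep the dependence on the ncdf4 object layout in a single small function. The following is only a sketch under assumptions: dimLength is a hypothetical name (it is not part of ncdf4 or oce), it relies on the nc$dim[[name]]$len layout shown above, and the mock list stands in for the result of nc_open() so that the code runs without the Argo file.

```r
# Hypothetical helper (not part of ncdf4 or oce): return the current length
# of a named dimension, or NA if the object has no such dimension.
# Assumes the nc$dim[[name]]$len layout shown in the session above.
dimLength <- function(nc, name)
{
    d <- nc$dim[[name]]
    if (is.null(d) || is.null(d$len)) NA_integer_ else as.integer(d$len)
}

# Mock object holding only the fields used here; with a real file, this
# would be the value returned by ncdf4::nc_open().
mock <- list(dim = list(N_PROF = list(len = 223L),
                        N_HISTORY = list(len = 0L)))
dimLength(mock, "N_HISTORY")    # 0, as in the session above
dimLength(mock, "NO_SUCH_DIM")  # NA, because the dimension is absent
```

If the layout of ncdf4 objects ever changes, only this one function would need to be updated.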

dankelley commented 5 years ago

Another clue:

> dput(nc$var$HISTORY_INSTITUTION)
structure(list(id = structure(list(id = 52, group_index = -1, 
    group_id = 131072L, list_index = 53, isdimvar = FALSE), class = "ncid4"), 
    name = "HISTORY_INSTITUTION", ndims = 3L, natts = 3L, size = c(4L, 
    223L, 0L), dimids = c(6L, 8L, 12L), prec = "char", units = "", 
    longname = "Institution which performed action", group_index = 1L, 
    chunksizes = NA, storage = 2, shuffle = FALSE, compression = NA, 
    dims = list(), dim = list(structure(list(name = "STRING4", 
        len = 4L, unlim = FALSE, group_index = 1L, group_id = 131072L, 
        id = 6L, dimvarid = structure(list(id = -1L, group_index = 1L, 
            group_id = 131072L, list_index = -1, isdimvar = TRUE), class = "ncid4"), 
        vals = 1:4, units = "", create_dimvar = FALSE), class = "ncdim4"), 
        structure(list(name = "N_PROF", len = 223L, unlim = FALSE, 
            group_index = 1L, group_id = 131072L, id = 8L, dimvarid = structure(list(
                id = -1L, group_index = 1L, group_id = 131072L, 
                list_index = -1, isdimvar = TRUE), class = "ncid4"), 
            vals = 1:223, units = "", create_dimvar = FALSE), class = "ncdim4"), 
        structure(list(name = "N_HISTORY", len = 0L, unlim = TRUE, 
            group_index = 1L, group_id = 131072L, id = 12L, dimvarid = structure(list(
                id = -1L, group_index = 1L, group_id = 131072L, 
                list_index = -1, isdimvar = TRUE), class = "ncid4"), 
            vals = 1:0, units = "", create_dimvar = FALSE), class = "ncdim4")), 
    varsize = c(4L, 223L, 0L), unlim = TRUE, make_missing_value = TRUE, 
    missval = " ", hasAddOffset = FALSE, hasScaleFact = FALSE), class = "ncvar4")

dankelley commented 5 years ago

Here is the number of dimensions

> nc$var$HISTORY_INSTITUTION$ndim
[1] 3

so it's a 3D thing. Now, the first dim is

> nc$var$HISTORY_INSTITUTION$dim[[1]]$len
[1] 4

which I think means that it has 4 characters. NOTE: you can look at

str(nc$var$HISTORY_INSTITUTION$dim[[1]])

to find that the name is "STRING4". There are other things there but I'll focus on name and len and go through them all:

> nc$var$HISTORY_INSTITUTION$dim[[1]][c("name","len")]
$name
[1] "STRING4"

$len
[1] 4

> nc$var$HISTORY_INSTITUTION$dim[[2]][c("name","len")]
$name
[1] "N_PROF"

$len
[1] 223

> nc$var$HISTORY_INSTITUTION$dim[[3]][c("name","len")]
$name
[1] "N_HISTORY"

$len
[1] 0

So, what I am getting from this is that there are indeed no items in HISTORY_INSTITUTION.

Let's try another:

> for (i in 1:3) print(nc$var$HISTORY_DATE$dim[[i]][c("name","len")])
$name
[1] "DATE_TIME"

$len
[1] 14

$name
[1] "N_PROF"

$len
[1] 223

$name
[1] "N_HISTORY"

$len
[1] 0

And one more

> for (i in 1:3) print(nc$var$HISTORY_ACTION$dim[[i]][c("name","len")])
$name
[1] "STRING4"

$len
[1] 4

$name
[1] "N_PROF"

$len
[1] 223

$name
[1] "N_HISTORY"

$len
[1] 0
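
The per-variable loops above could be folded into one check. This is a minimal sketch, not code from oce: hasEmptyDim is a hypothetical name, it assumes the nc$var[[name]]$dim[[i]]$len layout shown in the dput() output, and the mock object (including its LATITUDE entry) is made up so that the code runs without the file.

```r
# Hypothetical helper: TRUE if any dimension of the named variable has
# length 0, FALSE if none does, and NA if the variable is not present.
hasEmptyDim <- function(nc, varname)
{
    v <- nc$var[[varname]]
    if (is.null(v))
        return(NA)
    lens <- vapply(v$dim, function(d) as.integer(d$len), integer(1))
    any(lens == 0L)
}

# Mock object; the HISTORY_INSTITUTION dimensions copy the values printed
# above, and the LATITUDE entry is an illustrative assumption.
mock <- list(var = list(
    HISTORY_INSTITUTION = list(dim = list(
        list(name = "STRING4", len = 4L),
        list(name = "N_PROF", len = 223L),
        list(name = "N_HISTORY", len = 0L))),
    LATITUDE = list(dim = list(
        list(name = "N_PROF", len = 223L)))))
hasEmptyDim(mock, "HISTORY_INSTITUTION") # TRUE
hasEmptyDim(mock, "LATITUDE")            # FALSE
```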

dankelley commented 5 years ago

I know that anyone following this is going nuts with the comments, but I want this to be a sort of diary of my investigation, so it has been step-by-step.

What I am going to do is to write code along the lines of what I have just above, and if I see that N_HISTORY is of zero length, then I will not spew the warning about a problem reading. I have not seen this problem come up with other data so I will apply this extra testing only to the HISTORY_* items.
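
A rough sketch of that plan follows. It is not the code that went into the commit: readItem is a hypothetical name, the reader argument exists only so the sketch can be exercised without ncdf4 or the Argo file, and the mock object copies just the N_HISTORY length seen above.

```r
# Hypothetical wrapper: return NULL silently for a HISTORY_* item when
# N_HISTORY has zero length; for any other read failure, issue a warning
# and return NULL. Assumes the nc$dim$N_HISTORY$len layout shown earlier.
readItem <- function(nc, name, reader = ncdf4::ncvar_get)
{
    historyEmpty <- grepl("^HISTORY_", name) &&
        identical(as.integer(nc$dim$N_HISTORY$len), 0L)
    if (historyEmpty)
        return(NULL)
    tryCatch(reader(nc, name),
        error = function(e) {
            warning("cannot read ", name, ": ", conditionMessage(e))
            NULL
        })
}

# Mock object and a stand-in reader that always fails, to show both paths.
mock <- list(dim = list(N_HISTORY = list(len = 0L)))
failingReader <- function(nc, name) stop("Index exceeds dimension bound")
readItem(mock, "HISTORY_INSTITUTION", reader = failingReader) # NULL, no warning
```

With a real file, the default reader would be used, i.e. readItem(nc, "HISTORY_INSTITUTION") with nc from nc_open(). Restricting the silent path to names starting with HISTORY_ means that read failures on other items still produce a warning.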

Coding will start in about half an hour, after some errands.

dankelley commented 5 years ago

commit 27bc3567deef36d1e53be5d2263fe332e3d4cb30
Author: dankelley kelley.dan@gmail.com
Date: Sat Apr 13 16:02:09 2019 -0300

Remove warning on reading HISTORY_ items

This should address issue #1522 but it is brittle, being based on the layout
of an object created by another package (ncdf4).

richardsc commented 5 years ago

I've been using this for a couple of days, and it seems to work.

I'm a bit concerned about why there would be length=0 items in the netCDF file, but that's probably a question for the Argo data management team.

dankelley commented 4 years ago

I am reopening this based on our vf2f meeting today.

dankelley commented 4 years ago

I'm closing this again, because I now see that #1703 is a separate thing (also, the file I mention there has the HISTORY_ things non-blank).