richardsc closed this 4 years ago.
This problem of reading e.g. HISTORY_INSTITUTION relates to #1118. I'm going to do some searching on the interwebs to see if there is a way to read these things.
(Q for @richardsc at end)
I am starting to wonder whether these HISTORY_* things are simply blank. I did
ncdump ~/git/oce/create_data/argo/6900388_prof.nc > ~/Dropbox/6900388_prof.txt
in a unix shell, and the resultant file (26310 lines) lists data for e.g. LATITUDE and so forth, but
$ grep -n HISTORY ~/Dropbox/6900388_prof.txt | tail -1
322: HISTORY_QCTEST:_FillValue = " " ;
which is in the header listing at the top of the output. (By contrast, the final LATITUDE is on line 3404, the final PSAL is on line 24256, etc.)
So, as far as I can tell, these HISTORY_* items are simply blank. That may be why ncdf4::ncvar_get() is spewing an error message.
@richardsc do you know of another way (other than R/ncdf4 and unix/ncdump) of reading this file? If so, can you tell whether the HISTORY_* items are blank?
Action: If we find that the entries are blank, I will be inclined to stop spewing a warning on these items. I don't want users seeing 12 warnings every time they read an argo file -- it might make them wonder if they are doing something wrong, or if the file is broken. (Well, maybe this file is broken, listing things in the header that are not filled with data.)
Good sleuthing. I'll look into other ways to read it, but I like the suggestion of just suppressing the warnings for now.
I looked with panoply (https://www.giss.nasa.gov/tools/panoply/download/). I clicked on the HISTORY_* items and selected the 'export' operation, but got only a header. I did get content for other things, like PSAL. This is a second indication that there are no data in the HISTORY_* items.
Here's another indication:
$ ncdump -h /data/glider/delayedData/SEA019/Data/M54/testSOCIB/netcdf/GLI2018_SEA019_054DM_L1.nc | grep UNLIMITED
time = UNLIMITED ; // (528878 currently)
$ ncdump -h ~/git/oce/create_data/argo/6900388_prof.nc | grep UNLIMITED
N_HISTORY = UNLIMITED ; // (0 currently)
So now I think what I'm going to do is try to find out how to get the "currently" length with an ncdf4 call.
I see
> nc <- nc_open("~/git/oce/create_data/argo/6900388_prof.nc")
> nc$dim$N_HISTORY$len
[1] 0
which I think may be the clue I need. However, I am a bit reluctant to dig into aspects of the data structure defined by ncdf4, because that is probably subject to change.
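A minimal sketch of that check, wrapped in a small helper so that the reliance on ncdf4's object layout is confined to one place. The name `dimIsEmpty` is hypothetical, and the code assumes (as the console output above shows) that the object returned by `ncdf4::nc_open()` stores dimensions in `nc$dim`, each with a `len` element. A plain list stands in for the real object so the sketch runs without the Argo file.

```r
# Hypothetical helper: TRUE if the named dimension exists and has length zero.
# Assumes the ncdf4 layout seen above, i.e. nc$dim[[name]]$len.
dimIsEmpty <- function(nc, name) {
  d <- nc$dim[[name]]
  if (is.null(d)) {
    return(FALSE)  # dimension not present in this file
  }
  isTRUE(d$len == 0)
}

# Stand-in for an object returned by ncdf4::nc_open(), holding only the
# fields this helper reads (lengths copied from the output above).
fakeNc <- list(dim = list(N_PROF = list(name = "N_PROF", len = 223L),
                          N_HISTORY = list(name = "N_HISTORY", len = 0L)))
dimIsEmpty(fakeNc, "N_HISTORY")  # TRUE
dimIsEmpty(fakeNc, "N_PROF")     # FALSE
```

With a real file, the same call would be made on the result of `nc_open()`. Keeping the `$dim[[name]]$len` access inside one function means only one line needs changing if ncdf4 ever reorganizes its objects.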
Another clue:
> dput(nc$var$HISTORY_INSTITUTION)
structure(list(id = structure(list(id = 52, group_index = -1,
group_id = 131072L, list_index = 53, isdimvar = FALSE), class = "ncid4"),
name = "HISTORY_INSTITUTION", ndims = 3L, natts = 3L, size = c(4L,
223L, 0L), dimids = c(6L, 8L, 12L), prec = "char", units = "",
longname = "Institution which performed action", group_index = 1L,
chunksizes = NA, storage = 2, shuffle = FALSE, compression = NA,
dims = list(), dim = list(structure(list(name = "STRING4",
len = 4L, unlim = FALSE, group_index = 1L, group_id = 131072L,
id = 6L, dimvarid = structure(list(id = -1L, group_index = 1L,
group_id = 131072L, list_index = -1, isdimvar = TRUE), class = "ncid4"),
vals = 1:4, units = "", create_dimvar = FALSE), class = "ncdim4"),
structure(list(name = "N_PROF", len = 223L, unlim = FALSE,
group_index = 1L, group_id = 131072L, id = 8L, dimvarid = structure(list(
id = -1L, group_index = 1L, group_id = 131072L,
list_index = -1, isdimvar = TRUE), class = "ncid4"),
vals = 1:223, units = "", create_dimvar = FALSE), class = "ncdim4"),
structure(list(name = "N_HISTORY", len = 0L, unlim = TRUE,
group_index = 1L, group_id = 131072L, id = 12L, dimvarid = structure(list(
id = -1L, group_index = 1L, group_id = 131072L,
list_index = -1, isdimvar = TRUE), class = "ncid4"),
vals = 1:0, units = "", create_dimvar = FALSE), class = "ncdim4")),
varsize = c(4L, 223L, 0L), unlim = TRUE, make_missing_value = TRUE,
missval = " ", hasAddOffset = FALSE, hasScaleFact = FALSE), class = "ncvar4")
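The `varsize` element near the end of that dump, `c(4L, 223L, 0L)`, suggests an even shorter test: a variable with any zero-length dimension holds no values at all. Here is a sketch, assuming `varsize` keeps this meaning in the installed ncdf4 version. `varIsEmpty` is a hypothetical name, `historyInstitution` is a cut-down copy of the dump rather than a real `ncvar4` object, and the sizes in `filledVar` are made up for illustration.

```r
# Hypothetical helper: TRUE if a variable has no elements, i.e. the product of
# its dimension lengths is zero. Assumes ncdf4's ncvar4 layout (v$varsize).
varIsEmpty <- function(v) {
  prod(v$varsize) == 0
}

# Cut-down stand-in for nc$var$HISTORY_INSTITUTION (sizes from the dump above).
historyInstitution <- list(name = "HISTORY_INSTITUTION", varsize = c(4L, 223L, 0L))
# Made-up sizes for a variable that does contain data.
filledVar <- list(name = "FILLED", varsize = c(100L, 223L))

varIsEmpty(historyInstitution)  # TRUE
varIsEmpty(filledVar)           # FALSE
```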
Here is the number of dimensions:
> nc$var$HISTORY_INSTITUTION$ndim
[1] 3
so it's a 3D thing. Now, the first dim is
> nc$var$HISTORY_INSTITUTION$dim[[1]]$len
[1] 4
which I think means that it has 4 characters. NOTE: you can look at
str(nc$var$HISTORY_INSTITUTION$dim[[1]])
to find that the name is "STRING4". There are other things there, but I'll focus on name and len and go through them all:
> nc$var$HISTORY_INSTITUTION$dim[[1]][c("name","len")]
$name
[1] "STRING4"
$len
[1] 4
> nc$var$HISTORY_INSTITUTION$dim[[2]][c("name","len")]
$name
[1] "N_PROF"
$len
[1] 223
> nc$var$HISTORY_INSTITUTION$dim[[3]][c("name","len")]
$name
[1] "N_HISTORY"
$len
[1] 0
So, what I am getting from this is that there are indeed no items in HISTORY_INSTITUTION.
Let's try another:
> for (i in 1:3) print(nc$var$HISTORY_DATE$dim[[i]][c("name","len")])
$name
[1] "DATE_TIME"
$len
[1] 14
$name
[1] "N_PROF"
$len
[1] 223
$name
[1] "N_HISTORY"
$len
[1] 0
And one more
> for (i in 1:3) print(nc$var$HISTORY_ACTION$dim[[i]][c("name","len")])
$name
[1] "STRING4"
$len
[1] 4
$name
[1] "N_PROF"
$len
[1] 223
$name
[1] "N_HISTORY"
$len
[1] 0
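The three loops above can be folded into one pass over every `HISTORY_*` variable. This sketch returns a named vector of dimension lengths per variable. `dimLengths` is a hypothetical name, and the code assumes that each element of `v$dim` carries `name` and `len`, as printed above; the stand-in list copies just those fields from the output above instead of opening the file.

```r
# Hypothetical helper: named integer vector of dimension lengths for one
# variable, e.g. c(STRING4 = 4, N_PROF = 223, N_HISTORY = 0).
dimLengths <- function(v) {
  lens <- vapply(v$dim, function(d) as.integer(d$len), integer(1))
  names(lens) <- vapply(v$dim, function(d) d$name, character(1))
  lens
}

# Stand-in for nc$var, keeping only the dimension names and lengths shown above.
mkDim <- function(name, len) list(name = name, len = len)
fakeVars <- list(
  HISTORY_INSTITUTION = list(dim = list(mkDim("STRING4", 4L), mkDim("N_PROF", 223L), mkDim("N_HISTORY", 0L))),
  HISTORY_DATE = list(dim = list(mkDim("DATE_TIME", 14L), mkDim("N_PROF", 223L), mkDim("N_HISTORY", 0L))),
  HISTORY_ACTION = list(dim = list(mkDim("STRING4", 4L), mkDim("N_PROF", 223L), mkDim("N_HISTORY", 0L)))
)

# With a real file, fakeVars would be replaced by nc$var.
historyNames <- grep("^HISTORY_", names(fakeVars), value = TRUE)
print(lapply(fakeVars[historyNames], dimLengths))

# N_HISTORY length for each HISTORY_* variable.
nHistory <- vapply(fakeVars[historyNames],
                   function(v) dimLengths(v)[["N_HISTORY"]], integer(1))
print(nHistory)
```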
I know that anyone following this is going nuts with the comments, but I want this to be a sort of diary of my investigation, so it has been step-by-step.
What I am going to do is write code along the lines of what I have just above, and if I see that N_HISTORY is of zero length, then I will not spew the warning about a problem reading. I have not seen this problem come up with other data, so I will apply this extra testing only to the HISTORY_* items.
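A sketch of that plan, under stated assumptions: `readArgoItem` is a hypothetical wrapper (not necessarily what oce ends up doing), it uses `varsize` as in the dump above to detect zero-length `HISTORY_*` items, and it only calls `ncdf4::ncvar_get()` for items that are expected to hold data. Any other read failure still produces a warning.

```r
# Hypothetical wrapper: read one variable, but return NULL silently when a
# HISTORY_* item has a zero-length dimension (e.g. N_HISTORY = 0).
readArgoItem <- function(nc, name) {
  v <- nc$var[[name]]
  if (is.null(v)) {
    return(NULL)  # variable not in this file
  }
  if (grepl("^HISTORY_", name) && any(v$varsize == 0L)) {
    return(NULL)  # nothing stored, so no warning is warranted
  }
  # Only reached for items that should contain data; real failures still warn.
  tryCatch(ncdf4::ncvar_get(nc, name),
           error = function(e) {
             warning("cannot read '", name, "': ", conditionMessage(e), call. = FALSE)
             NULL
           })
}

# Stand-in object with only the fields read above (sizes from the dumps).
fakeArgo <- list(var = list(
  HISTORY_INSTITUTION = list(varsize = c(4L, 223L, 0L)),
  HISTORY_DATE = list(varsize = c(14L, 223L, 0L))
))
is.null(readArgoItem(fakeArgo, "HISTORY_INSTITUTION"))  # TRUE, with no warning
```

Restricting the zero-length test to `HISTORY_*` names mirrors the plan above; dropping the `grepl()` condition would silence empty reads for every variable, which might hide genuine problems in other files.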
Coding will start in about half an hour, after some errands.
commit 27bc3567deef36d1e53be5d2263fe332e3d4cb30
Author: dankelley <kelley.dan@gmail.com>
Date: Sat Apr 13 16:02:09 2019 -0300
Remove warning on reading HISTORY_ items
This should address issue #1522 but it is brittle, being based on the layout
of an object created by another package (ncdf4).
I've been using this for a couple of days, and it seems to work.
I'm a bit concerned about why there would be length=0 items in the netCDF file, but that's probably a question for the Argo data management team.
I am reopening this based on our vf2f meeting today.
I'm closing this again, because I now see that #1703 is a separate thing (also, the file I mention there has the HISTORY_ things non-blank).
@dankelley noticed during Issue #1520 that there are some netCDF-related errors in reading the file: