dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0
145 stars 42 forks source link

read.odf warnings(): null values and GMT #1426

Closed EOGrady21 closed 6 years ago

EOGrady21 commented 6 years ago

Short summary of problem

When reading in archived ODFs from BIO database I am consistently getting a warning about Null values and sometimes accompanied by warnings about GMT.

Details (optional)

The warning I get looks like :

48: In read.odf(files[f]) : using first of 2 unique NULL_VALUEs
49: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  ... :
  unable to interpret ODFunits[7]='GMT', for item named 'time'

@richardsc This is from the odf2adp script

What you expected to happen

Not sure if this is an 'issue' exactly but I am curious about the null value warnings. I understand the GMT warning is likely just due to the function looking for 'UTC' and not understanding GMT but that is not really a concern since they are (from my understanding) the same.

How urgent is this?

This is not urgent at all.

Output from sessionInfo()

R version 3.4.4 (2018-03-15) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] abind_1.4-5 ncdf4_1.16 oce_0.9-24 gsw_1.0-5 testthat_2.0.0

loaded via a namespace (and not attached): [1] compiler_3.4.4 magrittr_1.5 R6_2.2.2 tools_3.4.4 yaml_2.1.19 Rcpp_0.12.16 rlang_0.2.0

dankelley commented 6 years ago

Can you email me an ODF file that causes a problem? I can look this afternoon.

dankelley commented 6 years ago

Thanks for the file. Q: do you get the full msg as below? (Filename deleted for privacy.) I want to know whether this second problem is something to do with my setup.

> library(oce)
Loading required package: gsw
Loading required package: testthat
> d <- read.oce("REDACTED.ODF")
Warning messages:
1: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  unable to interpret ODFunits[7]='GMT', for item named 'time'
2: In read.odf(file, debug = debug) : using first of 2 unique NULL_VALUEs
richardsc commented 6 years ago

I could be wrong, but I think the second item (about the 2 unique NULLs) is what prompted @Echisholm21 to open the issue.

dankelley commented 6 years ago

Thanks. In "develop" commit 664fe47e2d6974906e86a851ac1ff876b6453733 the first warning won't occur anymore. (I was only permitting one time "unit" because that was the only unit I had noticed in the BIO-based document named BIO_ODF_Parameter_Code_List.pdf on ODF format.)

The second part, about two NULL values, will require some more fiddling with the code. It stems from the oce limitation of only one "missing value" per data file. That is violated in both ODF and Netcdf file formats, so it's something I ought to fix. I may have a look at that sometime before the weekend.

PS to @Echisholm21 in case it's of interest, a great way to find a problem in oce warnings is to do use options(warn=3) before running the command that gives the warning. That will change the warning into an error. Then, you can do traceback() to see where the problem lies in the R code. (Possibly you have to build the package locally to get line numbers, but that would be the case if you were trying to fix a problem in a rush.)

EOGrady21 commented 6 years ago

Thanks Dan! That's a good tip for poking around in the future. The null values warning was occurring on my end too and @richardsc is right it was the more concerning one to me.

I'm glad the GMT was an easy fix, let me know what happens with the null values, I'd be interested to see what's going on there.

dankelley commented 6 years ago

I feel that this is working now in "develop". Below is what I get for your test file (see near the bottom). This is not actually a deep test, because your data file has no bad 'time' values, etc. I think the only way to really test this is to mess with a file in the editor, making up some individualized NULL values for individual data types, and then altering some of the data to be of those values. I think that's worth doing, actually, and I could do it in private build tests quite easily. Heck, I could even do it by direct manipulation of the file oce/inst/extdata/CTD_BCD2014666_008_1_DN.ODF in the source. I may just do that, tomorrow sometime, and therefore I will leave this issue open for a few days.

> library(oce)
> d<-read.oce("REDACTED.ODF",debug=3)
  read.oce("REDACTED.ODF", ...) inferred type="madcp/odf"
  read.odf("REDACTED.ODF", ...) {
    ODFNames2oceNames() {
      STAGE 1 names:  EWCT_01 NSCT_01 VCSP_01 ERRV_01 BEAM_01 UNKN_01 SYTM_01 
      STAGE 2 names:  u v w error a unknown time 
      STAGE 3 names:  u v w error a unknown time 
      STAGE 4 names:  u v w error a unknown time 
    ODFnames[1]='EWCT_01', names[1]='u', ODFunits[1]='M/s'
    ODFnames[2]='NSCT_01', names[2]='v', ODFunits[2]='M/s'
    ODFnames[3]='VCSP_01', names[3]='w', ODFunits[3]='M/s'
    ODFnames[4]='ERRV_01', names[4]='error', ODFunits[4]='M/s'
    ODFnames[5]='BEAM_01', names[5]='a', ODFunits[5]='counts'
    ODFnames[6]='UNKN_01', names[6]='unknown', ODFunits[6]='counts'
    ODFnames[7]='SYTM_01', names[7]='time', ODFunits[7]='GMT'
    } # ODFNames2oceNames()
    oce names: u v w error a unknown time 
    NAvalueList= c(u = -1e+06, v = -1e+06, w = -1e+06, error = -1e+06, a = -1e+06, unknown = -1e+06, time = -99) 
    set 3 values in 'u' to NA, because they matched the NULL_VALUE (-1e+06)
    set 3 values in 'v' to NA, because they matched the NULL_VALUE (-1e+06)
    set 3 values in 'w' to NA, because they matched the NULL_VALUE (-1e+06)
    set 2 values in 'error' to NA, because they matched the NULL_VALUE (-1e+06)
    set 0 values in 'a' to NA, because they matched the NULL_VALUE (-1e+06)
    set 0 values in 'unknown' to NA, because they matched the NULL_VALUE (-1e+06)
    set 0 values in 'time' to NA, because they matched the NULL_VALUE (-99)
    } # read.odf()
dankelley commented 6 years ago

PS. (note to self) -- I've not yet done this on odf2oce, but rather only in read.odf().

richardsc commented 6 years ago

Good idea on changing some NULL values in the built-in ODF file to do tests -- I was thinking the same thing.

It is weird that there is a different NULL value for just one of the fields, but since they are setup like that I think it makes sense to have the code be robust in case we encounter it in other files.

FTR, for the time unit warning, I've never seen the unit field used for timezone before. I suppose it makes sense (because there likely isn't another TZ parameter), but strictly speaking GMT is not a "unit" ...

dankelley commented 6 years ago

@Echisholm21 I think things are working now in the "develop" branch commit d682d2ca410fea52e83fef552ac140127696a88a. This is quite thoroughly tested in tests/testthat/test_odf.R, which deals with a modified ODF file. (I won't go into a lot of detail here because the code speaks for itself.) Also, I compressed that file and the original version. Please have a look, and make a decision on closing the issue.

@richardsc I'm calling it a "unit" just because the ODF file names it as such. I don't think this will matter in any situation, but I could of course remove the unit if desired. Just let me know, by email or a new issue ... obviously it would just take a few seconds to do (and then 20 minutes for the automatic build/test sequence I like to undertake before committing to GH).

richardsc commented 6 years ago

I agree with you calling it a "unit" based on what's in the file. My point was more that I don't think BIO should be specifying the unit for a "time" field as the timezone. Not related to oce -- so build, test, and push away!

EOGrady21 commented 6 years ago

Looks great! Thanks so much!