dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0

read.netcdf() misses quite a lot of units #2233

Closed dankelley closed 2 months ago

dankelley commented 2 months ago

I'm looking at a file from cioosatlantic.ca/erddap, and read.netcdf() gets the unit for pressure but nothing else. Yet when I look in the NetCDF file, I see that units are listed. I assume the units are written differently from what read.netcdf() expects. Since I think reading NetCDF files is an important task (perhaps becoming more important over time), I'll look into this.
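
One way to confirm which unit strings a file actually stores is to read the `units` attribute of each variable with the ncdf4 package. The sketch below is illustrative only. It assumes ncdf4 is installed, and to stay self-contained it first writes a tiny temporary file whose variable names and unit strings mimic the ones discussed in this issue, rather than reading the CIOOS file itself.

```R
library(ncdf4)

# Build a tiny NetCDF file so the example is self-contained. The variable
# names and unit strings imitate the file discussed in this issue; the
# numbers are placeholders.
tmp <- tempfile(fileext = ".nc")
dimScan <- ncdim_def("scan", units = "", vals = 1:3)
varT <- ncvar_def("TEMPS901", units = "degrees Celsius", dim = dimScan, missval = -999)
varP <- ncvar_def("PRESPR01", units = "dbar", dim = dimScan, missval = -999)
nc <- nc_create(tmp, list(varT, varP))
ncvar_put(nc, varT, c(3.0, 7.2, 19.4))
ncvar_put(nc, varP, c(0.5, 74.5, 148.5))
nc_close(nc)

# Reopen the file and collect the 'units' attribute of every variable,
# exactly as it is stored (no decoding).
nc <- nc_open(tmp)
units <- sapply(names(nc$var), function(v) ncatt_get(nc, v, "units")$value)
nc_close(nc)
print(units)
```

Pointing `nc_open()` at a real file instead of `tmp` gives the raw strings that any unit decoder would have to recognize.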

CHECKLIST (for one particular file):

PS. In the code and output below, you'll see some untranslated variable names. That's because I've not seen such names before. They are not the common names that come up in SBE datasets (even though this is a CTD dataset I'm working with here). But decoding those names is a separate issue from the present one (I'll open it in a moment).

```R
R version 4.4.1 (2024-06-14) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(oce)
> # https://cioosatlantic.ca/erddap/files/bio_maritimes_region_ecosystem_survey_ctd/Maritimes%20Region%20Ecosystem%20Survey%20Summer/2023/CTD_CAR2023011_242_497961_UP.ODF.nc
> f <- "CTD_CAR2023011_242_497961_UP.ODF.nc"
> d <- read.netcdf(f)
> summary(d)
* Data Overview

                                       Min.    Mean     Max.    Dim. NAs
    ScanNumber                         3967    5114.3   6614    297  0
    QCNTR_01                           1       1        1       297  0
    PRESPR01 [dbar]                    0.5     74.5     148.5   297  0
    QPRES_01                           0       0        0       297  0
    TEMPS901                           2.9981  7.2321   19.442  297  0
    TEMPP901                           2.9981  7.2321   19.442  297  0
    TEMPPR01                           2.9981  7.2321   19.442  297  0
    QTEMP_01                           1       1        1       297  0
    CNDCST01                           2.9469  3.3464   4.2256  297  0
    QCNDC_01                           1       1        1       297  0
    OXYOCPVL01                         1.8417  2.2822   2.98    297  0
    QOXYV_01                           1       1        1       297  0
    IRRDUV01                           0       0.014305 0.63693 297  0
    QPSAR_01                           1       1        1       297  0
    CPHLPR01                           0.5206  2.0574   9.3864  297  1
    QCPHLPR01                          1       1.0269   9       297  0
    AHSFZZ01                           2.78    57.749   102.47  297  0
    QALTB_01                           1       1.0404   4       297  0
    preciseLat                         44.27   44.271   44.272  297  0
    QLATD_01                           0       0        0       297  0
    preciseLon                         -63.319 -63.318  -63.318 297  0
    QLOND_01                           0       0        0       297  0
    PSALST01                           30.773  32.723   34.2    297  0
    PSLTZZ01                           30.773  32.723   34.2    297  0
    QPSAL_01                           1       1        1       297  0
    POTMCV01                           2.9953  7.2259   19.44   297  0
    QPOTM_01                           1       1        1       297  0
    SIGTEQ01                           21.71   25.48    26.577  297  0
    QSIGP_01                           1       1        1       297  0
    DOXYZZ01                           4.2994  5.9365   7.524   297  0
    QDOXY_01                           1       1        1       297  0
    RecPerBin                          2       5.5522   18      297  0
    QCNTR_02                           1       1        1       297  0
    QCFF_01                            0       0.16162  12      297  0
    latitude                           44.269  44.269   44.269  1    0
    longitude                          -63.319 -63.319  -63.319 1    0
    TemperatureSensor                  NA      NA       NA      1    0
    ConductivitySensor                 NA      NA       NA      1    0
    PressureSensor                     NA      NA       NA      1    0
    OxygenSensor                       NA      NA       NA      1    0
    PAR_BiosphericalLicorChelseaSensor NA      NA       NA      1    0
    FluoroWetlabECO_AFL_FL_Sensor      NA      NA       NA      1    0
    AltimeterSensor                    NA      NA       NA      1    0

* Processing Log

    - 2024-08-22 11:11:40.492 UTC: `Create oce object`
    - 2024-08-22 11:11:40.527 UTC: `read.netcdf("CTD_CAR2023011_242_497961_UP.ODF.nc")`
>
```

dankelley commented 2 months ago

I will take a look at this now. Right off the bat, I see a typo in one of the units: ml/L, which uses lower-case in the numerator and upper-case in the denominator. But the code seems not to be picking up many units at all, so finding the reason for that is my first priority.

dankelley commented 2 months ago

The problem is that as.unit() does not recognize the units in this NetCDF file. Its documentation, quoted below in Roxygen format, lists the strings it recognizes.

And, for example, this NetCDF file lists a temperature unit as "degrees Celsius".

Question for @richardsc or @clayton33: do you have any insights on the units in these files, or who I ought to contact to get a list? Obviously, I can make some guesses, but if these (CIOOS-provided) data are provisional, it would make sense to wait until things are finalized. Also, if I had a contact name I could ask whether the choice to write "ml/L" is final (in which case oce should decode it) or not.

PS. I can, of course, make as.unit() return the string if it doesn't recognize it. That way, users would at least see it. But it would not be a proper unit in the oce sense. For example, a temperature unit in oce holds not just an expression that renders as deg-C, but also the scale (the 1968 convention, the 1990 convention, etc.), and that scale can be used in internal conversions within oce.

#' @param u A character string indicating a unit. Case is ignored, so that e.g.
#' `"dbar"` and `"DBAR"` yield equal results.  The following are recognized:
#' c(`"m-1"`, `"dbar"`, `"decibar"`, `"degree"`, `"degree_Celcius"`,
#' `"degree_north"`, `"degree_east"`, `"ipts-68"`,
#' `"its-90"`, `"m/s^1"`, `"m/s^2"`, `"pss-78"`,
#' `"umol/kg"`, `"micromole/kg"`)
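
To make the distinction between a proper unit and a retained string concrete, here is a minimal sketch of the idea. It is not the actual as.unit() code: `decodeUnitSketch()` is a hypothetical name, the lookup table is invented for illustration, and pairing "degrees Celsius" with the ITS-90 scale is an assumption. The only thing taken from oce is the general shape of a unit, a list holding an expression plus a scale string.

```R
# Hypothetical helper, NOT part of oce: map a unit string from a NetCDF file
# to an oce-style unit, i.e. a list with an expression and a scale.
decodeUnitSketch <- function(u) {
    # Ignore case and surrounding blanks, so "ml/L" and "ml/l" match alike.
    key <- tolower(trimws(u))
    # Illustrative table only; a real decoder would need many more entries.
    known <- list(
        "dbar" = list(unit = expression(dbar), scale = ""),
        # Assumption: this temperature is on the ITS-90 scale.
        "degrees celsius" = list(unit = expression(degree * C), scale = "ITS-90"),
        "ml/l" = list(unit = expression(ml / l), scale = "")
    )
    if (key %in% names(known)) {
        known[[key]]
    } else {
        # Fallback: keep the raw string, so users at least see it, even
        # though it carries no scale and is not a proper unit.
        list(unit = u, scale = "")
    }
}

str(decodeUnitSketch("degrees Celsius")) # recognized: expression plus scale
str(decodeUnitSketch("furlongs"))        # unrecognized: raw string retained
```

The fallback branch shows the trade-off described above: the string is visible in summaries, but nothing downstream can rely on it for scale-aware conversions.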
dankelley commented 2 months ago

I've made read.netcdf() retain the unknown unit (as a string ... so not a real unit in the oce sense) and so now I see e.g. this snippet in the summary.

This is not yet pushed to GH. I don't do that until I rebuild and retest (so, not for half an hour).

Still, as an amusing point, these three items really seem like they mean the same thing. If you look at #2232 you can see code that indicates that there are only very small differences in the metadata of these entries. I have no idea what is going on, and worry that these CIOOS files are still in a provisional form, which would mean that there is no sense in altering read.netcdf() to try to decode the units as stated in the files. Comments, @clayton33 or @richardsc?

    TEMPS901 [° Celsius]                                          2.9981    7.2321    19.442  297   0
    TEMPP901 [° Celsius]                                          2.9981    7.2321    19.442  297   0
    TEMPPR01 [° Celsius]                                          2.9981    7.2321    19.442  297   0
clayton33 commented 2 months ago

For each variable, the units are probably pulled from the associated NERC units vocabulary, http://vocab.nerc.ac.uk/collection/P06/current/ (see my comment on issue #2234); each variable has a related unit.

dankelley commented 2 months ago

I'm making some progress on units. Below is what I get with a sample file. (This code is not pushed to GH, though.)


[Previously saved workspace restored]

> library(oce)
Loading required package: gsw
> source("~/git/oce/R/AllClass.R");source("~/git/oce/R/netcdf.R");source("~/git/oce/R/units.R")
> # https://cioosatlantic.ca/erddap/files/bio_maritimes_region_ecosystem_survey_ctd/Maritimes%20Region%20Ecosystem%20Survey%20Summer/2023/CTD_CAR2023011_242_497961_UP.ODF.nc
> f <- "~/CTD_CAR2023011_242_497961_UP.ODF.nc"
> options(warn = 1) # so we see warnings as they occur
> d <- read.netcdf(f)
> summary(d)
* Time: 2023-08-14 04:31:09
* Data Overview

                                       Min.                Mean                    Max.                Dim. NAs
    measurement_time                   2023-08-14 05:04:11 2023-08-14 05:13:45.144 2023-08-14 05:26:15 297  0  
    ScanNumber                         3967                5114.283                6614                297  0  
    QCNTR_01                           1                   1                       1                   297  0  
    PRESPR01 [dbar]                    0.5                 74.5                    148.5               297  0  
    QPRES_01                           0                   0                       0                   297  0  
    TEMPS901 [°C, ITS-90]              2.9981              7.232082                19.4415             297  0  
    TEMPP901 [°C, ITS-90]              2.9981              7.232082                19.4415             297  0  
    TEMPPR01 [°C, ITS-90]              2.9981              7.232082                19.4415             297  0  
    QTEMP_01                           1                   1                       1                   297  0  
    CNDCST01 [S/m]                     2.9469              3.346433                4.225588            297  0  
    QCNDC_01                           1                   1                       1                   297  0  
    OXYOCPVL01 [V]                     1.8417              2.282159                2.98                297  0  
    QOXYV_01                           1                   1                       1                   297  0  
    IRRDUV01 [μEinstein/s/kg]          0                   0.01430488              0.63693             297  0  
    QPSAR_01                           1                   1                       1                   297  0  
    CPHLPR01 [mg/m³]                   0.5206              2.057365                9.3864              297  1  
    QCPHLPR01                          1                   1.026936                9                   297  0  
    AHSFZZ01 [m]                       2.78                57.74862                102.47              297  0  
    QALTB_01                           1                   1.040404                4                   297  0  
    preciseLat [°N]                    44.27               44.27085                44.2719             297  0  
    QLATD_01                           0                   0                       0                   297  0  
    preciseLon [°E]                    -63.319             -63.3185                -63.31844           297  0  
    QLOND_01                           0                   0                       0                   297  0  
    PSALST01 [PSS-78]                  30.773              32.72341                34.1996             297  0  
    PSLTZZ01 [PSS-78]                  30.773              32.72341                34.1996             297  0  
    QPSAL_01                           1                   1                       1                   297  0  
    POTMCV01 [°C, ITS-90]              2.9953              7.225871                19.4404             297  0  
    QPOTM_01                           1                   1                       1                   297  0  
    SIGTEQ01 [kg/m³]                   21.71               25.48022                26.5774             297  0  
    QSIGP_01                           1                   1                       1                   297  0  
    DOXYZZ01 [ml/l]                    4.2994              5.936503                7.524               297  0  
    QDOXY_01                           1                   1                       1                   297  0  
    RecPerBin                          2                   5.552189                18                  297  0  
    QCNTR_02                           1                   1                       1                   297  0  
    QCFF_01                            0                   0.1616162               12                  297  0  
    time                               2023-08-14 04:31:09 2023-08-14 04:31:09     2023-08-14 04:31:09 1    0  
    latitude [°N]                      44.269              44.2685                 44.2685             1    0  
    longitude [°E]                     -63.319             -63.3195                -63.3195            1    0  
    TemperatureSensor                  NA                  NA                      NA                  1    0  
    ConductivitySensor                 NA                  NA                      NA                  1    0  
    PressureSensor                     NA                  NA                      NA                  1    0  
    OxygenSensor                       NA                  NA                      NA                  1    0  
    PAR_BiosphericalLicorChelseaSensor NA                  NA                      NA                  1    0  
    FluoroWetlabECO_AFL_FL_Sensor      NA                  NA                      NA                  1    0  
    AltimeterSensor                    NA                  NA                      NA                  1    0  

* Processing Log

    - 2024-08-22 18:56:07.552 UTC: `Create oce object`
    - 2024-08-22 18:56:07.627 UTC: `read.netcdf("~/CTD_CAR2023011_242_497961_UP.ODF.nc")`
> 
dankelley commented 2 months ago

Commit 4246e4b8bcb22c516d8e5e7f3ec1ea743b0252f3 of the "develop" branch does better with this dataset, so I'm closing the issue. (More units can get added as they are needed, of course, but if we wait until all possible units are inserted, the issue will be open forever, because you just never know what sort of unit you'll run across.)


> library(oce)
> # https://cioosatlantic.ca/erddap/files/bio_maritimes_region_ecosystem_survey_ctd/Maritimes%20Region%20Ecosystem%20Survey%20Summer/2023/CTD_CAR2023011_242_497961_UP.ODF.nc
> f <- "~/CTD_CAR2023011_242_497961_UP.ODF.nc"
> d <- read.netcdf(f)
> summary(d)
* Time: 2023-08-14 04:31:09
* Data Overview

                                       Min.                Mean                    Max.                Dim. NAs OriginalName                        
    measurement_time                   2023-08-14 05:04:11 2023-08-14 05:13:45.144 2023-08-14 05:26:15 297  0   "measurement_time"                  
    ScanNumber                         3967                5114.283                6614                297  0   "ScanNumber"                        
    QCNTR_01                           1                   1                       1                   297  0   "QCNTR_01"                          
    PRESPR01 [dbar]                    0.5                 74.5                    148.5               297  0   "PRESPR01"                          
    QPRES_01                           0                   0                       0                   297  0   "QPRES_01"                          
    TEMPS901 [°C, ITS-90]              2.9981              7.232082                19.4415             297  0   "TEMPS901"                          
    TEMPP901 [°C, ITS-90]              2.9981              7.232082                19.4415             297  0   "TEMPP901"                          
    TEMPPR01 [°C, ITS-90]              2.9981              7.232082                19.4415             297  0   "TEMPPR01"                          
    QTEMP_01                           1                   1                       1                   297  0   "QTEMP_01"                          
    CNDCST01 [S/m]                     2.9469              3.346433                4.225588            297  0   "CNDCST01"                          
    QCNDC_01                           1                   1                       1                   297  0   "QCNDC_01"                          
    OXYOCPVL01 [V]                     1.8417              2.282159                2.98                297  0   "OXYOCPVL01"                        
    QOXYV_01                           1                   1                       1                   297  0   "QOXYV_01"                          
    IRRDUV01 [μEinstein/s/kg]          0                   0.01430488              0.63693             297  0   "IRRDUV01"                          
    QPSAR_01                           1                   1                       1                   297  0   "QPSAR_01"                          
    CPHLPR01 [mg/m³]                   0.5206              2.057365                9.3864              297  1   "CPHLPR01"                          
    QCPHLPR01                          1                   1.026936                9                   297  0   "QCPHLPR01"                         
    AHSFZZ01 [m]                       2.78                57.74862                102.47              297  0   "AHSFZZ01"                          
    QALTB_01                           1                   1.040404                4                   297  0   "QALTB_01"                          
    preciseLat [°N]                    44.27               44.27085                44.2719             297  0   "preciseLat"                        
    QLATD_01                           0                   0                       0                   297  0   "QLATD_01"                          
    preciseLon [°E]                    -63.319             -63.3185                -63.31844           297  0   "preciseLon"                        
    QLOND_01                           0                   0                       0                   297  0   "QLOND_01"                          
    PSALST01 [PSS-78]                  30.773              32.72341                34.1996             297  0   "PSALST01"                          
    PSLTZZ01 [PSS-78]                  30.773              32.72341                34.1996             297  0   "PSLTZZ01"                          
    QPSAL_01                           1                   1                       1                   297  0   "QPSAL_01"                          
    POTMCV01 [°C, ITS-90]              2.9953              7.225871                19.4404             297  0   "POTMCV01"                          
    QPOTM_01                           1                   1                       1                   297  0   "QPOTM_01"                          
    SIGTEQ01 [kg/m³]                   21.71               25.48022                26.5774             297  0   "SIGTEQ01"                          
    QSIGP_01                           1                   1                       1                   297  0   "QSIGP_01"                          
    DOXYZZ01 [ml/l]                    4.2994              5.936503                7.524               297  0   "DOXYZZ01"                          
    QDOXY_01                           1                   1                       1                   297  0   "QDOXY_01"                          
    RecPerBin                          2                   5.552189                18                  297  0   "RecPerBin"                         
    QCNTR_02                           1                   1                       1                   297  0   "QCNTR_02"                          
    QCFF_01                            0                   0.1616162               12                  297  0   "QCFF_01"                           
    time                               2023-08-14 04:31:09 2023-08-14 04:31:09     2023-08-14 04:31:09 1    0   "time"                              
    latitude [°N]                      44.269              44.2685                 44.2685             1    0   "latitude"                          
    longitude [°E]                     -63.319             -63.3195                -63.3195            1    0   "longitude"                         
    TemperatureSensor                  NA                  NA                      NA                  1    0   "TemperatureSensor"                 
    ConductivitySensor                 NA                  NA                      NA                  1    0   "ConductivitySensor"                
    PressureSensor                     NA                  NA                      NA                  1    0   "PressureSensor"                    
    OxygenSensor                       NA                  NA                      NA                  1    0   "OxygenSensor"                      
    PAR_BiosphericalLicorChelseaSensor NA                  NA                      NA                  1    0   "PAR_BiosphericalLicorChelseaSensor"
    FluoroWetlabECO_AFL_FL_Sensor      NA                  NA                      NA                  1    0   "FluoroWetlabECO_AFL_FL_Sensor"     
    AltimeterSensor                    NA                  NA                      NA                  1    0   "AltimeterSensor"                   

* Processing Log

    - 2024-08-23 17:20:20.974 UTC: `Create oce object`
    - 2024-08-23 17:20:21.066 UTC: `read.netcdf("~/CTD_CAR2023011_242_497961_UP.ODF.nc")`
>