Closed dankelley closed 2 months ago
I will take a look at this now. Right off the bat, I see a typo in one of the units: ml/L, which uses lower-case in the numerator and upper-case in the denominator. But the code seems not to be picking up many units at all, so finding the reason for that is my first priority.
The problem is that as.unit() does not recognize the units in this NetCDF file. Its documentation lists the recognized units as below (Roxygen format).
And, for example, this NetCDF file lists a temperature unit as "degrees Celsius".
Question for @richardsc or @clayton33: do you have any insights on the units in these files, or who I ought to contact to get a list? Obviously, I can make some guesses, but if these (CIOOS-provided) data are provisional, it would make sense to wait until things are finalized. Also, if I had a contact name I could ask whether the choice to write "ml/L" is final (in which case oce should decode it) or not.
PS. I can, of course, make as.unit() return the string if it doesn't recognize it. That way, users would at least see it. But it would not be a proper unit, in the sense of oce. For example, temperature units in oce carry not just an expression for the symbols (to get deg-C), but also the scale (the 1968 convention, the 1990 convention, etc.), and that scale can be used in internal conversions within oce.
#' @param u A character string indicating a unit. Case is ignored, so that e.g.
#' `"dbar"` and `"DBAR"` yield equal results. The following are recognized:
#' c(`"m-1"`, `"dbar"`, `"decibar"`, `"degree"`, `"degree_Celcius"`,
#' `"degree_north"`, `"degree_east"`, `"ipts-68"`,
#' `"its-90"`, `"m/s^1"`, `"m/s^2"`, `"pss-78"`,
#' `"umol/kg"`, `"micromole/kg"`)
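The fallback described in the PS above can be sketched roughly as follows. This is only an illustration, not the code in oce: the function name decodeUnitSketch and its small lookup table are hypothetical, and the ITS-90 scale attached to "degrees Celsius" is an assumption made for the example. The only thing taken from oce is the general shape of a unit, i.e. a list holding an expression and a scale string.

```r
# Hypothetical helper (NOT part of oce): map a unit string to an oce-style
# list(unit=<expression>, scale=<character>), or retain the raw string if
# the unit is not recognized.
decodeUnitSketch <- function(u) {
    key <- tolower(trimws(u)) # ignore case and surrounding blanks
    known <- list(
        "dbar" = list(unit = expression(dbar), scale = ""),
        # The ITS-90 scale here is an assumption for illustration only.
        "degrees celsius" = list(unit = expression(degree * C), scale = "ITS-90"),
        "ml/l" = list(unit = expression(ml / l), scale = "")
    )
    if (key %in% names(known)) {
        known[[key]]
    } else {
        # Fallback: keep the string, so users at least see it. This is not
        # a proper unit, because it has no expression and no scale.
        list(unit = u, scale = "")
    }
}

recognized <- decodeUnitSketch("degrees Celsius")
unrecognized <- decodeUnitSketch("furlong/fortnight")
```

In this sketch, a recognized unit can be told apart from a retained string by testing is.expression() on the unit element.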
I've made read.netcdf() retain the unknown unit (as a string ... so not a real unit in the oce sense) and so now I see e.g. this snippet in the summary.
This is not yet pushed to GH. I don't do that until I rebuild and retest (so, not for half an hour).
Still, as an amusing point, these three items really seem like they mean the same thing. If you look at #2232 you can see code that indicates that there are only very small differences in the metadata of these entries. I have no idea what is going on, and worry that these CIOOS files are still in a provisional form, which would mean that there is no sense in altering read.netcdf() to try to decode the units as stated in the files. Comments, @clayton33 or @richardsc?
TEMPS901 [° Celsius] 2.9981 7.2321 19.442 297 0
TEMPP901 [° Celsius] 2.9981 7.2321 19.442 297 0
TEMPPR01 [° Celsius] 2.9981 7.2321 19.442 297 0
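One way to check whether entries like these three really hold the same numbers is sketched below. The helper name duplicatedColumns is hypothetical (it is not an oce function), and the mock data are made up for the example. It works on any named list of vectors, so it could in principle be applied to the data slot of an oce object.

```r
# Hypothetical helper (NOT part of oce): report groups of names in a list
# whose contents are identical when formatted to 15 significant digits.
duplicatedColumns <- function(data) {
    keys <- vapply(
        data,
        function(x) paste(format(x, digits = 15), collapse = ","),
        character(1)
    )
    groups <- split(names(data), keys)
    unname(groups[lengths(groups) > 1])
}

# Mock data standing in for three temperature columns and one other column
mock <- list(
    TEMPS901 = c(2.9981, 7.2321, 19.4415),
    TEMPP901 = c(2.9981, 7.2321, 19.4415),
    TEMPPR01 = c(2.9981, 7.2321, 19.4415),
    PRESPR01 = c(0.5, 74.5, 148.5)
)
dups <- duplicatedColumns(mock)

# With a real oce object d, something like duplicatedColumns(d@data)
# might be used instead (untested here).
```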
For each variable, the units are probably pulled from the associated NERC units vocabulary, http://vocab.nerc.ac.uk/collection/P06/current/ (see my comment on issue #2234); each variable has a related unit.
I'm making some progress on units. Below is what I get with a sample file. (This code is not pushed to GH, though.)
R version 4.4.1 (2024-06-14) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> library(oce)
Loading required package: gsw
> source("~/git/oce/R/AllClass.R");source("~/git/oce/R/netcdf.R");source("~/git/oce/R/units.R")
> # https://cioosatlantic.ca/erddag/files/bio_maritimes_region_ecosystem_survey_ctd/Maritimes%20Region%20Ecosystem%20Survey%20Summer/2023/CTD_CAR2023011_242_497961_UP.ODF.nc
> f <- "~/CTD_CAR2023011_242_497961_UP.ODF.nc"
> options(warn = 1) # so we see warnings as they occur
> d <- read.netcdf(f)
> summary(d)
* Time: 2023-08-14 04:31:09
* Data Overview
Min. Mean Max. Dim. NAs
measurement_time 2023-08-14 05:04:11 2023-08-14 05:13:45.144 2023-08-14 05:26:15 297 0
ScanNumber 3967 5114.283 6614 297 0
QCNTR_01 1 1 1 297 0
PRESPR01 [dbar] 0.5 74.5 148.5 297 0
QPRES_01 0 0 0 297 0
TEMPS901 [°C, ITS-90] 2.9981 7.232082 19.4415 297 0
TEMPP901 [°C, ITS-90] 2.9981 7.232082 19.4415 297 0
TEMPPR01 [°C, ITS-90] 2.9981 7.232082 19.4415 297 0
QTEMP_01 1 1 1 297 0
CNDCST01 [S/m] 2.9469 3.346433 4.225588 297 0
QCNDC_01 1 1 1 297 0
OXYOCPVL01 [V] 1.8417 2.282159 2.98 297 0
QOXYV_01 1 1 1 297 0
IRRDUV01 [μEinstein/s/kg] 0 0.01430488 0.63693 297 0
QPSAR_01 1 1 1 297 0
CPHLPR01 [mg/m³] 0.5206 2.057365 9.3864 297 1
QCPHLPR01 1 1.026936 9 297 0
AHSFZZ01 [m] 2.78 57.74862 102.47 297 0
QALTB_01 1 1.040404 4 297 0
preciseLat [°N] 44.27 44.27085 44.2719 297 0
QLATD_01 0 0 0 297 0
preciseLon [°E] -63.319 -63.3185 -63.31844 297 0
QLOND_01 0 0 0 297 0
PSALST01 [PSS-78] 30.773 32.72341 34.1996 297 0
PSLTZZ01 [PSS-78] 30.773 32.72341 34.1996 297 0
QPSAL_01 1 1 1 297 0
POTMCV01 [°C, ITS-90] 2.9953 7.225871 19.4404 297 0
QPOTM_01 1 1 1 297 0
SIGTEQ01 [kg/m³] 21.71 25.48022 26.5774 297 0
QSIGP_01 1 1 1 297 0
DOXYZZ01 [ml/l] 4.2994 5.936503 7.524 297 0
QDOXY_01 1 1 1 297 0
RecPerBin 2 5.552189 18 297 0
QCNTR_02 1 1 1 297 0
QCFF_01 0 0.1616162 12 297 0
time 2023-08-14 04:31:09 2023-08-14 04:31:09 2023-08-14 04:31:09 1 0
latitude [°N] 44.269 44.2685 44.2685 1 0
longitude [°E] -63.319 -63.3195 -63.3195 1 0
TemperatureSensor NA NA NA 1 0
ConductivitySensor NA NA NA 1 0
PressureSensor NA NA NA 1 0
OxygenSensor NA NA NA 1 0
PAR_BiosphericalLicorChelseaSensor NA NA NA 1 0
FluoroWetlabECO_AFL_FL_Sensor NA NA NA 1 0
AltimeterSensor NA NA NA 1 0
* Processing Log
- 2024-08-22 18:56:07.552 UTC: `Create oce object`
- 2024-08-22 18:56:07.627 UTC: `read.netcdf("~/CTD_CAR2023011_242_497961_UP.ODF.nc")`
>
Commit 4246e4b8bcb22c516d8e5e7f3ec1ea743b0252f3 of the "develop" branch does better with this dataset, so I'm closing the issue. (More units can get added as they are needed, of course, but if we wait until all possible units are inserted, the issue will be open forever, because you just never know what sort of unit you'll run across.)
> library(oce)
> # https://cioosatlantic.ca/erddag/files/bio_maritimes_region_ecosystem_survey_ctd/Maritimes%20Region%20Ecosystem%20Survey%20Summer/2023/CTD_CAR2023011_242_497961_UP.ODF.nc
> f <- "~/CTD_CAR2023011_242_497961_UP.ODF.nc"
> d <- read.netcdf(f)
> summary(d)
* Time: 2023-08-14 04:31:09
* Data Overview
Min. Mean Max. Dim. NAs OriginalName
measurement_time 2023-08-14 05:04:11 2023-08-14 05:13:45.144 2023-08-14 05:26:15 297 0 "measurement_time"
ScanNumber 3967 5114.283 6614 297 0 "ScanNumber"
QCNTR_01 1 1 1 297 0 "QCNTR_01"
PRESPR01 [dbar] 0.5 74.5 148.5 297 0 "PRESPR01"
QPRES_01 0 0 0 297 0 "QPRES_01"
TEMPS901 [°C, ITS-90] 2.9981 7.232082 19.4415 297 0 "TEMPS901"
TEMPP901 [°C, ITS-90] 2.9981 7.232082 19.4415 297 0 "TEMPP901"
TEMPPR01 [°C, ITS-90] 2.9981 7.232082 19.4415 297 0 "TEMPPR01"
QTEMP_01 1 1 1 297 0 "QTEMP_01"
CNDCST01 [S/m] 2.9469 3.346433 4.225588 297 0 "CNDCST01"
QCNDC_01 1 1 1 297 0 "QCNDC_01"
OXYOCPVL01 [V] 1.8417 2.282159 2.98 297 0 "OXYOCPVL01"
QOXYV_01 1 1 1 297 0 "QOXYV_01"
IRRDUV01 [μEinstein/s/kg] 0 0.01430488 0.63693 297 0 "IRRDUV01"
QPSAR_01 1 1 1 297 0 "QPSAR_01"
CPHLPR01 [mg/m³] 0.5206 2.057365 9.3864 297 1 "CPHLPR01"
QCPHLPR01 1 1.026936 9 297 0 "QCPHLPR01"
AHSFZZ01 [m] 2.78 57.74862 102.47 297 0 "AHSFZZ01"
QALTB_01 1 1.040404 4 297 0 "QALTB_01"
preciseLat [°N] 44.27 44.27085 44.2719 297 0 "preciseLat"
QLATD_01 0 0 0 297 0 "QLATD_01"
preciseLon [°E] -63.319 -63.3185 -63.31844 297 0 "preciseLon"
QLOND_01 0 0 0 297 0 "QLOND_01"
PSALST01 [PSS-78] 30.773 32.72341 34.1996 297 0 "PSALST01"
PSLTZZ01 [PSS-78] 30.773 32.72341 34.1996 297 0 "PSLTZZ01"
QPSAL_01 1 1 1 297 0 "QPSAL_01"
POTMCV01 [°C, ITS-90] 2.9953 7.225871 19.4404 297 0 "POTMCV01"
QPOTM_01 1 1 1 297 0 "QPOTM_01"
SIGTEQ01 [kg/m³] 21.71 25.48022 26.5774 297 0 "SIGTEQ01"
QSIGP_01 1 1 1 297 0 "QSIGP_01"
DOXYZZ01 [ml/l] 4.2994 5.936503 7.524 297 0 "DOXYZZ01"
QDOXY_01 1 1 1 297 0 "QDOXY_01"
RecPerBin 2 5.552189 18 297 0 "RecPerBin"
QCNTR_02 1 1 1 297 0 "QCNTR_02"
QCFF_01 0 0.1616162 12 297 0 "QCFF_01"
time 2023-08-14 04:31:09 2023-08-14 04:31:09 2023-08-14 04:31:09 1 0 "time"
latitude [°N] 44.269 44.2685 44.2685 1 0 "latitude"
longitude [°E] -63.319 -63.3195 -63.3195 1 0 "longitude"
TemperatureSensor NA NA NA 1 0 "TemperatureSensor"
ConductivitySensor NA NA NA 1 0 "ConductivitySensor"
PressureSensor NA NA NA 1 0 "PressureSensor"
OxygenSensor NA NA NA 1 0 "OxygenSensor"
PAR_BiosphericalLicorChelseaSensor NA NA NA 1 0 "PAR_BiosphericalLicorChelseaSensor"
FluoroWetlabECO_AFL_FL_Sensor NA NA NA 1 0 "FluoroWetlabECO_AFL_FL_Sensor"
AltimeterSensor NA NA NA 1 0 "AltimeterSensor"
* Processing Log
- 2024-08-23 17:20:20.974 UTC: `Create oce object`
- 2024-08-23 17:20:21.066 UTC: `read.netcdf("~/CTD_CAR2023011_242_497961_UP.ODF.nc")`
>
I'm looking at a file from cioosatlantic.ca/erddap and read.netcdf() gets the unit for pressure, but nothing else. But when I look in the netcdf file, I see that units are listed. I am assuming that the units are being written differently than what is expected by read.netcdf(). Since I think reading netcdf files is an important task (perhaps becoming more important over time), I'll look into this.
CHECKLIST (for one particular file):
PS. In the code and output below, you'll see some untranslated variable names. That's because I've not seen such names before. They are not the common names that come up in SBE datasets (even though this is a CTD dataset I'm working with here). But decoding those names is a separate issue from the present one (which I'll make in a moment).
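To see which unit strings a file actually carries before deciding what read.netcdf() ought to decode, a small inventory can help. The sketch below is an illustration under assumptions: rawUnits is a hypothetical helper (not part of oce), and it assumes a list of variable descriptions that each have a units element, as I understand the var component of an object returned by ncdf4::nc_open() to provide. A mock list stands in for a real file here.

```r
# Hypothetical helper (NOT part of oce): extract the raw 'units' string of
# each variable description, giving NA where it is missing or empty.
rawUnits <- function(vars) {
    vapply(
        vars,
        function(v) {
            if (is.null(v$units) || !nzchar(v$units)) {
                NA_character_
            } else {
                as.character(v$units)
            }
        },
        character(1)
    )
}

# Mock input standing in for nc$var (the values are illustrative only)
vars <- list(
    PRESPR01 = list(units = "dbar"),
    TEMPS901 = list(units = "degrees Celsius"),
    TEMPP901 = list(units = "degrees Celsius"),
    QTEMP_01 = list(units = "")
)
u <- rawUnits(vars)
counts <- table(u, useNA = "ifany") # how often each unit string occurs

# With a real file, assuming the ncdf4 package is installed:
# nc <- ncdf4::nc_open("~/CTD_CAR2023011_242_497961_UP.ODF.nc")
# table(rawUnits(nc$var), useNA = "ifany")
# ncdf4::nc_close(nc)
```

Any string in such a table that the unit decoder does not handle is a candidate for either a new decoding rule or a question to the data provider.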