Closed Rdescoteaux closed 1 year ago
This looks quite straightforward. Maybe you can clarify a few things?
I am doing something else right now but I'll code this in the afternoon. (I am at UTC minus 3 hours.)
In the meantime, a question:
I've coded a trial version. Please try building oce from source using
remotes::install_github("dankelley/oce", ref="develop")
The documentation, available with ?oce::read.ctd.saiv, is a bit sketchy. I list some assumptions that I made. I am not decoding much in the header, except the column names. (And I see an error in how the files are named -- see the docs.)
There are more details below (click the word Details to see them). Please have a look at this test, and maybe test with some of your data. And please do look at the list of assumptions and questions I have in the docs, in case you can answer some of them.
PS. One worry I have is with file encoding. This is a tricky thing for me to diagnose, and has led to problems on CRAN because one of the test machines (but not one I can access myself) has a European encoding that can cause problems. I can fix the problem by removing some characters from the file, e.g. I check for the "mu" letter directly, and that might cause a problem; that can be solved if I just do a grep() operation for the other letters.
Oh, that's sweet. Local checks now reveal that it will fail on CRAN (see below) so I'll alter the file and try again. I won't push to github until it checks OK locally. (This won't affect your tests.)
W checking R files for non-ASCII characters ...
Found the following file with non-ASCII characters:
ctd.saiv.R
Portable packages must use only ASCII characters in their R code,
except perhaps in comments.
Use \uxxxx escapes for other characters.
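The two workarounds mentioned here (a \uxxxx escape, or a grep() on the surrounding ASCII letters) can be sketched as follows. This is only an illustration, not the code in ctd.saiv.R; the column names are taken from the sample header shown in this thread, and the second fluorescence spelling (with the Greek letter mu, U+03BC, rather than the micro sign, U+00B5) is an assumed variant added to show why a pattern match can be more forgiving than an exact comparison.

```r
# Column names written with \u escapes, so this source stays pure ASCII.
# "\u00b5" is the micro sign; "\u03bc" is the Greek small letter mu.
colNames <- c("Sal.", "Temp", "F (\u00b5g/l)", "F (\u03bcg/l)", "T (FTU)")

# Option 1: exact comparison against an escaped string (matches one variant only).
isMicroSign <- colNames == "F (\u00b5g/l)"

# Option 2: match only the ASCII letters around the non-ASCII character.
isFluorescence <- grepl("^F \\(.*g/l\\)$", colNames)

print(data.frame(colNames, isMicroSign, isFluorescence))
```

The pattern in option 2 never mentions the non-ASCII character, so it passes the ASCII check and accepts either spelling.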
I think I have the encoding issue fixed now in the "develop" branch, commit b3799401178f3e371d9aba5e760e3d02f8e80c95
@Rdescoteaux please test, when you get a chance. There's no rush.
My plan is to close this issue in a few days, since I think it's been done. @Rdescoteaux -- if you disagree, please feel free to add a comment stating what remains to be done. Note that comments can be made on closed issues, but if that comment includes a request to reopen it, I'll do so.
The point of closing it is simple: I use open issues as a sort of "to do" list.
Thanks again for pointing this out.
PS. If you wish to test it, please use the very latest version of oce from the "develop" branch. (Instructions for installing that are in the package README file.) I ask this because other things have also been improved in the meantime.
Hi Dan,
Thank you so much for your work on this and your patience :)
This is all new to me but in the end I was able to build oce from source. However, I do get an error message when running the function read.ctd.saiv:
Error in if (depth[i] == 0) { : argument is of length zero
In addition: Warning messages:
1: In readLines(file, n = 4) : invalid input found on input connection 'Tr1_St2_Graveneset.txt'
2: In readLines(file, n = 4) : incomplete final line found on 'Tr1_St2_Graveneset.txt'
I'll answer some of your more specific questions below.
I should also mention that the example file attached above does not contain all possible fields. As the CTD profiles have been collected with different instruments, by different people and at different times, some of the variables can change from one to the other. Here are some other possible variables that were not present in Tr1_St2_Graveneset.txt that could be present in other casts:
Can you state exactly what the columns for C, p and O will be? I can use that in renaming things. For example, I see T (FTU) in your test file, which makes me think that perhaps C would be Conductivity (mS/cm) but guessing is a poor idea.
Arguably, oce could use approximate searches, e.g. searching for "Conductivity" or "conductivity", but I worry that the unit might be different in different files.
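A minimal sketch of such an approximate search is below. The header names are hypothetical (the exact SAIV spellings were not known at this point in the thread), and this is not how oce does it. The idea is to match the name loosely and then read the unit from the parentheses instead of assuming it.

```r
# Hypothetical header names, for illustration only.
colNames <- c("Sal.", "Temp", "Cond. (mS/cm)", "Press (dbar)")

# Case-insensitive match on the start of the name.
isConductivity <- grepl("^cond", colNames, ignore.case = TRUE)

# Extract whatever unit is written in the trailing parentheses.
unit <- sub("^.*\\((.*)\\)[[:space:]]*$", "\\1", colNames[isConductivity])

print(colNames[isConductivity])
print(unit)
```

Reading the unit from the file, rather than hard-coding it, addresses the worry that different files might report different units.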
Also, I wonder if the file format says Pressure (dbar (m)) because that is self-contradictory, as 1 dbar is only approximately the same as 1 m.
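To see why dbar and m are only approximately interchangeable, a rough hydrostatic estimate helps. The sketch below assumes a constant density of 1025 kg/m^3 and gravity of 9.81 m/s^2 (illustrative values, not taken from any SAIV document), and pressureToDepth() is a hypothetical helper, not an oce function.

```r
# Hydrostatic estimate: depth = pressure / (density * gravity).
# 1 dbar = 1e4 Pa. Density and gravity are assumed constants.
pressureToDepth <- function(pressureDbar, rho = 1025, g = 9.81) {
    pressureDbar * 1e4 / (rho * g)
}

p <- c(1, 10, 100)
print(data.frame(pressure_dbar = p, depth_m = pressureToDepth(p)))
```

Under these assumptions 1 dbar corresponds to a little under 1 m (about 0.9945 m), so the two numbers drift apart by roughly half a metre per 100 dbar.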
@Rdescoteaux sorry, I didn't see your message as quoted below. (For some reason, github is not emailing me when I get mentioned. I'm looking into that.)
Your comment at https://github.com/dankelley/oce/issues/2141#issuecomment-1715885753 shows an error, so I'd like to help with that.
My guess is that there is an encoding issue, based on the fact that your comment contains the phrase "invalid input" and that is referring to the spot where I read 4 lines, to get the header and column names. Line 4 of this file contains the Greek letter "mu", and I wonder if that's a problem.
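One way to picture the problem: if the file stores the micro sign as a single Latin-1 byte (0xB5), that byte is not valid UTF-8. The sketch below builds such a file and reads it by converting explicitly. That the real file is Latin-1 is an assumption (the thread does not state the encoding), and this is not the code used in oce.

```r
# Write a one-line test file whose micro sign is the single Latin-1 byte 0xB5.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("Ser\tF ("), as.raw(0xb5), charToRaw("g/l)\tT (FTU)\n")), con)
close(con)

# Read the bytes untouched, then convert from the assumed encoding to UTF-8.
rawLine <- readLines(tmp, n = 1, warn = FALSE)
line <- iconv(rawLine, from = "latin1", to = "UTF-8")
colNames <- strsplit(line, "\t", fixed = TRUE)[[1]]
print(colNames)
```

If the assumed encoding were wrong, iconv() would return NA or garbled text instead of failing silently later, which makes the guess easy to check.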
I have altered the code about encoding. Please re-download the very-latest oce source ("develop" branch) and rebuild, and then try the R code as I have below. If that works, please post a note saying so. If not, please post the results you get. (Github has a way to format code as R, etc.; please use that system to make it easier to read things.)
If you do get it working, please be sure to tell me if you get the first non-header line in the output. As you can see below, I don't get that line. I think maybe read.delim() is mixed up on how many lines to skip. But that's clutching at straws. In any case, I'd rather have code that reads all but the first line, as opposed to no code.
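One possible cause, not confirmed anywhere in this thread, is that read.delim() defaults to header=TRUE, so after skipping the 4 header lines it would treat the first data line as column names. The sketch below uses a made-up file with the same 4-line header layout to show the difference; the data values are invented for the demonstration.

```r
# Made-up file: 4 header lines (column names on line 4), then 3 data lines.
lines <- c(
    "From file: demo\tInstrument no.:\t0",
    "Ser\tInterval (sec)",
    "1\t1",
    "Ser\tMeas\tTemp",
    "1\t101\t5.0",
    "1\t102\t5.1",
    "1\t103\t5.2"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)
colNames <- strsplit(lines[4], "\t", fixed = TRUE)[[1]]

# Default header=TRUE: the first data line is consumed as column names.
lost <- read.delim(tmp, skip = 4)

# header=FALSE with explicit names: every data line is kept.
kept <- read.delim(tmp, skip = 4, header = FALSE, col.names = colNames)

print(c(rowsWithDefault = nrow(lost), rowsWithHeaderFalse = nrow(kept)))
```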
Also, please note that my test code is at https://github.com/dankelley/oce-issues/tree/main/21xx/2141 but the data file is not there. That is because I am assuming that your file is private. If that file is not private, please let me know, so I can put it into this oce-issues repo. I like files to be in that oce-issues repo because then co-developers can test things. (Some use a different encoding than I use, although there's no way to guess your encoding unless/until you let us know in a comment.)
Test code
library(oce)
d <- read.ctd.saiv("Tr1_St2_Graveneset.txt", debug=2)
summary(d)
Output DK gets from test code
> library(oce)
Loading required package: gsw
> d <- read.ctd.saiv("Tr1_St2_Graveneset.txt", debug=2)
read.ctd.saiv(file="Tr1_St2_Graveneset.txt", ...) {
header is:
[1] "From file: Tr1_all_stations\tInstrument no.:\t595"
[2] "Ser\tInterval (sec)\tIntegration\tAir pressure\tSalinity\tChart Datum (dbar)\t"
[3] "4\t1\t\t1019.84\t"
[4] "Ser\tMeas\tSal.\tTemp\tF (µg/l)\tT (FTU)\tDensity\tS. vel.\tDepth(u)\tDate\tTime\t\t"
Original data names: c("Ser", "Meas", "Sal.", "Temp", "F (µg/l)", "T (FTU)", "Density", "S. vel.", "Depth(u)", "Date", "Time")
data names: c("Ser", "Meas", "salinity", "temperature", "fluorescence", "turbidity", "density", "soundVelocity", "depth", "Date", "Time")
First 3 lines of data:
Ser Meas salinity temperature fluorescence turbidity density soundVelocity
1 4 585 0.01 8.129 0.08 0.47 -0.149 1439.68
2 4 586 0.02 8.127 0.09 0.55 -0.141 1439.69
3 4 587 0.01 8.115 0.15 0.57 -0.148 1439.62
depth Date Time
1 0 10/06/2023 09:46:23
2 0 10/06/2023 09:46:24
3 0 10/06/2023 09:46:25
Last 3 lines of data:
Ser Meas salinity temperature fluorescence turbidity density soundVelocity
316 4 900 0.02 2.371 0.05 29.00 -0.030 1414.04
317 4 901 0.02 2.382 0.05 30.11 -0.030 1414.08
318 4 902 0.02 2.391 0.05 16.78 -0.029 1414.13
depth Date Time
316 0 10/06/2023 09:51:38
317 0 10/06/2023 09:51:39
318 0 10/06/2023 09:51:40
as.ctd(...) {
case 2: salinity, temperature, pressure (etc) supplied
assuming modern units, since none provided
} # as.ctd()
> summary(d)
CTD Summary
-----------
* Data Overview
Min. Mean Max. Dim. NAs OriginalName
scan 1 159.5 318 318 0 -
salinity [PSS-78] 0 29.486 34.91 318 0 Sal.
temperature [°C, ITS-90] 1.941 2.3482 8.152 318 0 Temp
pressure [dbar] 0 6.5709 20.931 318 0 -
fluorescence [μg/L] 0 0.3005 2.18 318 0 F (µg/l)
turbidity [FTU] 0.01 2.3065 35.76 318 0 T (FTU)
Meas 585 743.5 902 318 0 Meas
Ser 4 4 4 318 0 Ser
* Processing Log
- 2023-09-13 19:59:21.450 UTC: `create 'ctd' object`
- 2023-09-13 19:59:21.451 UTC: `as.ctd(salinity = data$salinity, temperature = data$temperature, pressure = swPressure(data$depth), debug = debug - 1L)`
- 2023-09-13 19:59:21.451 UTC: `oceSetMetadata(object = res, name = "header", value = header)`
- 2023-09-13 19:59:21.451 UTC: `oceSetMetadata(object = res, name = "filename", value = filename)`
- 2023-09-13 19:59:21.451 UTC: `oceSetMetadata(object = res, name = "dataNamesOriginal", value = dno)`
- 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "fluorescence", value = data$fluorescence, unit = list(unit = expression(mu * g/L), scale = ""))`
- 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "turbidity", value = data$turbidity, unit = list(unit = expression(FTU), scale = ""))`
- 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "Meas", value = data$Meas)`
- 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "Ser", value = data$Ser)`
>
>
Thank you for this! I am at sea (collecting more SAIV CTD data ;) ) and will take a closer look when I return.
All the best,
Raphaelle
@Rdescoteaux I hope you have good luck with your work at sea, and that you return safely. Take care. Dan.
To the original reporter: have you made any progress on this? Do you need any more help? In the oce project, we like to use open issues as a sort of "to do" list, so we'd like to know if the issue ought to remain open.
PS. this is a standardized 'saved reply'.
This issue was closed by the developer, because the following conditions were met:
Of course, if the reporter wants to reopen the issue, that's perfectly fine! The point here is just to avoid having issues in the developers' "to do" list, when they are probably done.
PS. this is a standardized reply (AKA a 'saved reply' in github notation).
Oh sorry, I just came back to the office.
But yes, it works!! Thank you so much.
And I have gotten a list of variables from the SAIV manufacturer: List of parameters SAIV AS.pdf
Thank you for the time you spent on this. It was greatly appreciated :)
Raphaelle
@Rdescoteaux I hope your fieldwork went well. Thanks for the document on the names. I'll go through that later this week, and make additions. I am reopening this issue until I complete that.
Below is a checklist of variable names. When an item is checked off, that just means that I have done the work in the code. Only when I post a new comment will I mean that I have pushed the new code to github. That will likely be a few days from now.
Ser | Series | # | Number
Meas | Measurement | # | Number
Sal. | Salinity | ppt | Parts per thousand | Calculated salinity from conductivity, temperature and depth
Cond. | Conductivity | mS/cm | Millisiemens per centimeter | Inductive conductivity measurement
Temp | Temperature | ˚C | Degrees Celsius | Thermistor measurement
Ox % | Dissolved oxygen | % | Percentage | SAIV 205 electrochemical oxygen sensor
OpOx % | Dissolved oxygen | % | Percentage | Aanderaa Instruments optode oxygen sensor
OSOx % | Dissolved oxygen | % | Percentage | Rinko III optical oxygen sensor
mg/l | Dissolved oxygen | mg/l | Milligram per liter | Calculated dissolved oxygen from oxygen sensors
ml/l | Dissolved oxygen | ml/l | Milliliter per liter | Calculated dissolved oxygen from oxygen sensors
µmol/l | Dissolved oxygen | µmol/l | Micromole per liter | Calculated dissolved oxygen from oxygen sensors
µmol/kg | Dissolved oxygen | µmol/kg | Micromole per kilogram | Calculated dissolved oxygen from oxygen sensors
T (FTU) | Turbidity | FTU | Formazin Turbidity Unit | Seapoint Sensors turbidity meter
F (µg/l) | Fluorescence | µg/l | Microgram per liter | Seapoint Sensors chlorophyll fluorometer
Density | Density | kg/m³ | Kilogram per cubic meter | Calculated density from salinity, temperature and pressure
S. vel. | Sound velocity | m/s | Meter per second | Calculated sound velocity from salinity, temperature and depth
Press | Pressure | dbar | Decibar | Keller custom-made pressure sensor
Depth(u) | Depth | m | Meter | Calculated depth by Unesco 1983 Formula
Depth(d) | Depth | m | Meter | Calculated depth. Formula: Pressure / density x gravity
Depth(p) | Depth | m | Meter | Calculated depth. Formula: Pressure / avg. density x gravity
Date | Date | dd.Mon-yyyy | Day, month and year
Time | Time | hh:mm:ss | Hour, minute and second
@Rdescoteaux is it OK if I name you in the docs for the function? I presently have as below. (This is Roxygen code that will be typeset in the documentation.) I see three choices, and I hope you can decide:
Any of these choices is okay. Lots of people want to remain private, but I like users to know that there are many people who have contributed to oce.
#' @author Dan Kelley, with help from Rdescoteaux on github,
#' who supplied the sample file and the document about variable
#' names.
I've coded this in "develop", commit 448516c7df80f5da1b5b54421a258fed4b375444. For the test file I received a while back, I get as follows, which seems OK (for the variable names etc).
Note that I am not naming all the possible oxygen columns as "oxygen". Instead, I am giving them names that evoke the instrument type, e.g. "oxygenSAIV205" and so forth. I suppose I could change this and put those details into the unit/scale, but I don't know whether a machine can have more than one oxygen sensor on it, in which case they would end up with names like "oxygen", "oxygen2" and so forth, which might be confusing to the user. This could be changed, if somebody wanted to open a new issue for it.
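The renaming discussed in this thread can be pictured as a lookup table from SAIV column names to oce-style names. The sketch below uses only the pairs visible in the summary output posted in this thread; renameSaiv() is a hypothetical helper for illustration, not the oce implementation, and names it does not recognise are passed through unchanged.

```r
# Original SAIV names and the oce-style names reported in this thread.
# The micro sign is written as \u00b5 to keep the source ASCII-only.
saivOriginal <- c("Ser", "Meas", "Sal.", "Temp", "F (\u00b5g/l)", "T (FTU)",
    "Density", "S. vel.", "Depth(u)", "Date", "Time")
saivOce <- c("series", "measurement", "salinity", "temperature", "fluorescence",
    "turbidity", "sigma", "soundVelocity", "depth", "date", "time")

# Hypothetical helper: translate known names, keep unknown ones as they are.
renameSaiv <- function(x) {
    i <- match(x, saivOriginal)
    ifelse(is.na(i), x, saivOce[i])
}

print(renameSaiv(c("Sal.", "Temp", "T (FTU)", "SomeNewColumn")))
```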
library(oce)
#> Loading required package: gsw
d <- read.ctd.saiv("~/saiv.txt")
summary(d)
#> CTD Summary
#> -----------
#>
#> * Data Overview
#>
#> Min. Mean Max. Dim. NAs OriginalName
#> series 4 4 4 318 0 Ser
#> measurement 585 743.5 902 318 0 Meas
#> salinity [PSS-78] 0 29.486 34.91 318 0 Sal.
#> temperature [°C] 1.941 2.3482 8.152 318 0 Temp
#> fluorescence [ug/l] 0 0.3005 2.18 318 0 F (µg/l)
#> turbidity [FTU] 0.01 2.3065 35.76 318 0 T (FTU)
#> sigma [kg/m³] -0.15 23.578 27.984 318 0 Density
#> soundVelocity [m/s] 1412.9 1452.5 1458.7 318 0 S. vel.
#> depth [m, unesco] 0 6.5174 20.76 318 0 Depth(u)
#> date NA NA NA 318 0 Date
#> time NA NA NA 318 0 Time
#>
#> * Processing Log
#>
#> - 2023-11-06 12:57:31 UTC: `create 'ctd' object`
plot(d, eos="unesco")
Created on 2023-11-06 with reprex v2.0.2
I decided to follow the convention of grouping names, so (if this can happen) a datafile that contained two oxygen sensors would result in an oce object with variables named "oxygen" and "oxygen2" in its data slot. The variety is stored in the "scale" part of the unit. This is not tested, since I have no file with oxygen (let alone multiple oxygens).
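The numbering convention described here ("oxygen", "oxygen2", and so on) can be sketched in base R as below. numberDuplicates() is a hypothetical helper written for this illustration; it is not the code in the commit.

```r
# Hypothetical helper: leave the first occurrence of a name alone and
# append 2, 3, ... to later occurrences of the same name.
numberDuplicates <- function(x) {
    k <- ave(seq_along(x), x, FUN = seq_along) # running count within each name
    ifelse(k == 1, x, paste0(x, k))
}

print(numberDuplicates(c("salinity", "oxygen", "temperature", "oxygen", "oxygen")))
```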
This is done in the "develop" branch, commit as below
commit 3b1bbcf63fed5ac6739f9d7567e361ab0944521c
Author: dankelley kelley.dan@gmail.com
Date: Mon Nov 6 16:07:28 2023 -0400
unduplicate varnames for ctd.saiv
Wonderful! Thank you :)
You are welcome to use my Github username.
Thanks. I think I'm done with my checklist on this data type, but please be on the lookout for problems, and open new issues if you find any.
Will do!
Tr1_St2_Graveneset.txt
Hi. I have a few hundred CTD profiles originating from a SAIV SD204 CTD. The files produced by the software while uploading the data from the CTD are .SD2, .OX2, .OP2 and .RF2 formats and the data can be exported as a text file (tab delimited, see example text file attached). The user manual (https://saiv.no/manuals/sd/SD204_manual_total.pdf) doesn't really describe the file types.
I would love to process the CTD profiles in a reproducible and more efficient way than using the SAIV software to manually process each CTD individually. Is there a way that we can make these text files readable with the oce package?
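A batch workflow of the kind asked for here might look like the sketch below. readAllCasts() is a hypothetical helper, not part of oce; it takes the reader as an argument, so with oce installed one could pass the read.ctd.saiv function discussed in this thread. Files that fail to read are skipped with a warning, so one bad export does not stop the whole run.

```r
# Hypothetical helper: apply a reader function to every matching file in a
# directory, skipping (with a warning) any file the reader cannot handle.
readAllCasts <- function(dir, reader, pattern = "\\.txt$") {
    files <- list.files(dir, pattern = pattern, full.names = TRUE)
    casts <- lapply(files, function(f) {
        tryCatch(reader(f), error = function(e) {
            warning("skipping ", basename(f), ": ", conditionMessage(e))
            NULL
        })
    })
    names(casts) <- basename(files)
    casts[!vapply(casts, is.null, logical(1))]
}

# Demonstration with two tiny stand-in files and readLines as the reader.
demoDir <- tempfile()
dir.create(demoDir)
writeLines("a", file.path(demoDir, "one.txt"))
writeLines("b", file.path(demoDir, "two.txt"))
demo <- readAllCasts(demoDir, readLines)
print(names(demo))
```

With oce installed, a call such as readAllCasts("path/to/exports", oce::read.ctd.saiv) would return a named list of ctd objects; the directory path there is a placeholder.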
Thank you for your help :)
Raphaelle