dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0

CTD data from SAIV instrument #2141

Closed Rdescoteaux closed 1 year ago

Rdescoteaux commented 1 year ago

Tr1_St2_Graveneset.txt

Hi. I have a few hundred CTD profiles originating from a SAIV SD204 CTD. The files produced by the software while uploading the data from the CTD are in .SD2, .OX2, .OP2 and .RF2 formats, and the data can be exported as a tab-delimited text file (see the example text file attached). The user manual (https://saiv.no/manuals/sd/SD204_manual_total.pdf) doesn't really describe the file types.

I would love to process the CTD profiles in a reproducible and more efficient way than using the SAIV software to manually process each CTD individually. Is there a way that we can make these text files readable with the oce package?
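A minimal sketch of what such batch processing could look like, assuming a reader function for the exported text files is available. The helper name `read_profiles()` and the directory path are hypothetical; the reader is passed in as an argument so that one unreadable file does not stop the whole run.

```r
# Hypothetical helper: apply a reader function to every exported text file in
# a directory and keep going if an individual file fails.
read_profiles <- function(dir, reader, pattern = "\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  results <- lapply(files, function(f) {
    # Store the error object instead of aborting the whole batch.
    tryCatch(reader(f), error = function(e) e)
  })
  names(results) <- basename(files)
  results
}

# Possible usage (not run; assumes a reader such as oce::read.ctd.saiv and a
# hypothetical directory of exports):
# profiles <- read_profiles("~/ctd_exports", oce::read.ctd.saiv)
# failed <- vapply(profiles, inherits, logical(1), what = "error")
# names(profiles)[failed]  # files that need a closer look
```

Keeping the failures in the result list makes it easy to see which casts differ from the others, which matters when files come from different instruments.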

Thank you for your help :)

Raphaelle

dankelley commented 1 year ago

This looks quite straightforward. Maybe you can clarify a few things?

  1. Am I right to assume the temperature is in-situ, and degree C?
  2. I'm guessing F is fluorescence and T is turbidity. Correct?
  3. I don't understand the "Depth(u)" field. Is depth in metres? Maybe the "u" means "underwater"?

I am doing something else right now but I'll code this in the afternoon. (I am at UTC minus 3 hours.)

In the meantime, a question:

  1. Are you set up to build packages from source? For oce, you'll need both a fortran compiler and a C++ compiler.
dankelley commented 1 year ago

I've coded a trial version. Please try building oce from source using

remotes::install_github("dankelley/oce", ref="develop")

The documentation, available with ?oce::read.ctd.saiv, is a bit sketchy. I list some assumptions that I made. I am not decoding much in the header, except the column names. (And I see an error in how the files are named -- see the docs.)

There are more details below (click the word Details to see them). Please have a look at this test, and maybe test with some of your data. And please do look at the list of assumptions and questions I have in the docs, in case you can answer some of them.

As a test,

```R
library(oce)
d <- read.ctd.saiv("~/Tr1_St2_Graveneset.txt")
summary(d)
png("saiv.png")
plot(d, eos="unesco")
```

produces

```
CTD Summary
-----------

* File: "/Users/kelley/Tr1_St2_Graveneset.txt"
* Data Overview

                              Min.   Mean   Max. Dim. NAs OriginalName
    scan                         1  159.5    318  318   0            -
    salinity [PSS-78]            0 29.486  34.91  318   0         Sal.
    temperature [°C, ITS-90] 1.941 2.3482  8.152  318   0         Temp
    pressure [dbar]              0 6.5709 20.931  318   0            -
    fluorescence [μg/L]          0 0.3005   2.18  318   0     F (µg/l)
    turbidity [FTU]           0.01 2.3065  35.76  318   0      T (FTU)
    Meas                       585  743.5    902  318   0         Meas
    Ser                          4      4      4  318   0          Ser

* Processing Log

    - 2023-08-30 17:34:06.473 UTC: `create 'ctd' object`
    - 2023-08-30 17:34:06.475 UTC: `as.ctd(salinity = data$salinity, temperature = data$temperature, pressure = swPressure(data$depth), debug = debug - 1L)`
    - 2023-08-30 17:34:06.482 UTC: `oceSetMetadata(object = res, name = "header", value = header)`
    - 2023-08-30 17:34:06.482 UTC: `oceSetMetadata(object = res, name = "filename", value = filename)`
    - 2023-08-30 17:34:06.482 UTC: `oceSetMetadata(object = res, name = "dataNamesOriginal", value = dno)`
    - 2023-08-30 17:34:06.483 UTC: `oceSetData(object = res, name = "fluorescence", value = data$fluorescence, unit = list(unit = expression(mu * g/L), scale = ""))`
    - 2023-08-30 17:34:06.483 UTC: `oceSetData(object = res, name = "turbidity", value = data$turbidity, unit = list(unit = expression(FTU), scale = ""))`
    - 2023-08-30 17:34:06.483 UTC: `oceSetData(object = res, name = "Meas", value = data$Meas)`
    - 2023-08-30 17:34:06.483 UTC: `oceSetData(object = res, name = "Ser", value = data$Ser)`
```

and the plot shown below.

![saiv](https://github.com/dankelley/oce/assets/99469/4338bdfd-04cd-4b4c-8e56-b2e859cdd900)
dankelley commented 1 year ago

PS. one worry I have is with file encoding. This is a tricky thing for me to diagnose, and has led to problems on CRAN because one of the test machines (but not one I can access myself) has a European encoding that can cause problems. I can fix the problem by removing some characters from the file, e.g. I check for the "mu" letter directly, and that might cause a problem; that can be solved if I just do a grep() operation for the other letters.
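To illustrate the idea of matching on "the other letters", here is a rough sketch that finds the fluorescence column without ever comparing against the "mu" character itself. The pattern and variable names are illustrative assumptions, not the code used in oce.

```r
# Column names as they appear in the sample file; the micro sign is written
# as an escape so that this snippet itself stays ASCII-only.
original_names <- c("Sal.", "Temp", "F (\u00b5g/l)", "T (FTU)")

# Match everything except the troublesome character: "F (", then anything,
# then "g/l)". The ".*" absorbs the micro sign whatever its encoding.
is_fluorescence <- grepl("^F \\(.*g/l\\)$", original_names)

which(is_fluorescence)
```

Because the pattern contains only ASCII characters, the result should not depend on how the micro sign happens to be encoded in a particular file.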

dankelley commented 1 year ago

Oh, that's sweet. Local checks now reveal that it will fail on CRAN (see below) so I'll alter the file and try again. I won't push to github until it checks OK locally. (This won't affect your tests.)

W  checking R files for non-ASCII characters ...
   Found the following file with non-ASCII characters:
     ctd.saiv.R
   Portable packages must use only ASCII characters in their R code,
   except perhaps in comments.
   Use \uxxxx escapes for other characters.
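As a small sketch of what the check is asking for, the micro sign can be written with a `\u` escape so that the source code contains only ASCII characters. The variable names here are illustrative and are not taken from ctd.saiv.R.

```r
# The micro sign (U+00B5) written as an escape instead of a literal character.
mu <- "\u00b5"
unit_label <- paste0("F (", mu, "g/l)")

# A line of source code that uses the escape consists of ASCII characters only,
# which is what the CRAN check looks for.
source_line <- 'mu <- "\\u00b5"'
is_ascii <- all(utf8ToInt(source_line) < 128L)
is_ascii
```

For whole files, `tools::showNonASCIIfile()` can be used to list the offending lines before running the package check.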
dankelley commented 1 year ago

I think I have the encoding issue fixed now in the "develop" branch, commit b3799401178f3e371d9aba5e760e3d02f8e80c95

@Rdescoteaux please test, when you get a chance. There's no rush.

dankelley commented 1 year ago

My plan is to close this issue in a few days, since I think it's been done. @Rdescoteaux -- if you disagree, please feel free to add a comment stating what remains to be done. Note that comments can be made on closed issues, but if that comment includes a request to reopen it, I'll do so.

The point of closing it is simple: I use open issues as a sort of "to do" list.

Thanks again for pointing this out.

PS. if you wish to test it, please use the very latest version of oce from the "develop" branch. (Instructions for installing that are in the package README file.) I ask this because other things have also been improved in the meantime.

Rdescoteaux commented 1 year ago

Hi Dan,

Thank you so much for your work on this and your patience :)

This is all new to me, but in the end I was able to build oce from source. However, I do get an error message when running the function read.ctd.saiv:

Error in if (depth[i] == 0) { : argument is of length zero
In addition: Warning messages:
1: In readLines(file, n = 4) :
  invalid input found on input connection 'Tr1_St2_Graveneset.txt'
2: In readLines(file, n = 4) :
  incomplete final line found on 'Tr1_St2_Graveneset.txt'

I'll answer some of your more specific questions below.

Rdescoteaux commented 1 year ago

I should also mention that the example file attached above does not contain all possible fields. As the CTD profiles have been collected with different instruments, by different people and at different times, some of the variables can change from one file to the next. Here are some other possible variables that were not present in Tr1_St2_Graveneset.txt but could be present in other casts:

dankelley commented 1 year ago

Can you state exactly what the column names for C, p and O will be? I can use that in renaming things. For example, I see T (FTU) in your test file, which makes me think that perhaps C would be Conductivity (mS/cm), but guessing is a poor idea.

Arguably, oce could use approximate searches, e.g. searching for "Conductivity" or "conductivity", but I worry that the unit might be different in different files.
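One conservative alternative to approximate searching is an exact lookup table that renames only the column names that are known for certain and leaves everything else untouched. The sketch below uses names seen in the sample file; the "C (mS/cm)" entry is a hypothetical, unconfirmed column name included only to show what happens to an unknown column.

```r
# Names from the sample file, plus one hypothetical column that is not in
# the lookup table.
original <- c("Ser", "Meas", "Sal.", "Temp", "T (FTU)", "Density",
              "S. vel.", "Depth(u)", "C (mS/cm)")

# Exact matches only, so nothing is renamed on the basis of a guess.
lookup <- c("Sal." = "salinity", "Temp" = "temperature",
            "T (FTU)" = "turbidity", "Density" = "density",
            "S. vel." = "soundVelocity", "Depth(u)" = "depth")

known <- original %in% names(lookup)
renamed <- original
renamed[known] <- unname(lookup[original[known]])

# Unknown columns keep their original names, so they are easy to spot.
data.frame(original = original, renamed = renamed)
```

With this approach, a file with a different unit string simply produces an unrenamed column rather than a wrongly labelled one.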

Also, I wonder if the file format says Pressure (dbar (m)), because that is self-contradictory: 1 dbar is only approximately the same as 1 m.
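To give a sense of how approximate that equivalence is, here is a base-R sketch of the UNESCO depth formula (Fofonoff and Millard, 1983), written from memory of the published coefficients. The function name `depth_from_pressure()` is hypothetical, and the default latitude of 45 degrees is an assumption; in practice the oce functions `swDepth()` and `swPressure()` would be the ones to use.

```r
# Hypothetical helper: depth in metres from sea pressure in dbar, following
# the UNESCO formula as recalled here (check against oce::swDepth before
# relying on it).
depth_from_pressure <- function(pressure, latitude = 45) {
  x <- sin(latitude * pi / 180)^2
  # Gravity, varying with latitude and (slightly) with pressure.
  gr <- 9.780318 * (1 + (5.2788e-3 + 2.36e-5 * x) * x) + 1.092e-6 * pressure
  (((-1.82e-15 * pressure + 2.279e-10) * pressure - 2.2512e-5) * pressure +
     9.72659) * pressure / gr
}

p <- c(1, 10, 20.931, 100)
data.frame(pressure_dbar = p, depth_m = depth_from_pressure(p))
```

At the shallow depths in the sample file the two numbers differ by a little under 1 percent, which is small but not zero, so it matters which of the two quantities a column actually holds.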

dankelley commented 1 year ago

@Rdescoteaux sorry, I didn't see your message as quoted below. (For some reason, github is not emailing me when I get mentioned. I'm looking into that.)

Your comment at https://github.com/dankelley/oce/issues/2141#issuecomment-1715885753 shows an error, so I'd like to help with that.

My guess is that there is an encoding issue, based on the fact that your comment contains the phrase "invalid input" and that is referring to the spot where I read 4 lines, to get the header and column names. Line 4 of this file contains the Greek letter "mu", and I wonder if that's a problem.

I have altered the code about encoding. Please re-download the very-latest oce source ("develop" branch) and rebuild, and then try the R code as I have below. If that works, please post a note saying so. If not, please post the results you get. (Github has a way to format code as R, etc.; please use that system to make it easier to read things.)

If you do get it working, please be sure to tell me if you get the first non-header line in the output. As you can see below, I don't get that line. I think maybe read.delim() is mixed up on how many lines to skip. But that's clutching at straws. In any case, I'd rather have code that reads all but the first line, as opposed to no code.
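For anyone checking whether a data line is being dropped, the following base-R sketch shows one common way an off-by-one can arise with `read.delim()`. It is not the code in `read.ctd.saiv()`. The file content is synthetic, modelled loosely on the four-line header of the sample file, and uses ASCII-only column names.

```r
# Build a small synthetic file: four header lines followed by three data lines.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "From file: demo\tInstrument no.:\t000",
  "Ser\tInterval (sec)\tIntegration\tAir pressure\tSalinity\tChart Datum (dbar)\t",
  "4\t1\t\t1019.84\t",
  "Ser\tMeas\tSal.\tTemp\tDepth(u)",
  "4\t585\t0.01\t8.129\t0.00",
  "4\t586\t0.02\t8.127\t0.10",
  "4\t587\t0.01\t8.115\t0.25"
), path)

# Column names come from the fourth header line.
header <- readLines(path, n = 4)
col_names <- strsplit(header[4], "\t", fixed = TRUE)[[1]]

# read.delim() defaults to header = TRUE, so skipping four lines and keeping
# the default would consume the first data line as a header.
lossy <- read.delim(path, skip = 4)
complete <- read.delim(path, skip = 4, header = FALSE,
                       col.names = col_names, check.names = FALSE)

# Compare against the number of lines that follow the header.
n_expected <- length(readLines(path)) - 4
c(expected = n_expected, lossy = nrow(lossy), complete = nrow(complete))
```

Comparing the row count with the number of lines after the header, as in the last step, is a quick way to confirm whether the first data line survives.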

Also, please note that my test code is at https://github.com/dankelley/oce-issues/tree/main/21xx/2141 but the data file is not there. That is because I am assuming that your file is private. If that file is not private, please let me know, so I can put it into this oce-issues repo. I like files to be in that oce-issues repo because then co-developers can test things. (Some use a different encoding than I use, although there's no way to guess your encoding unless/until you let us know in a comment.)

Test code

library(oce)
d <- read.ctd.saiv("Tr1_St2_Graveneset.txt", debug=2)
summary(d)

Output DK gets from test code

> library(oce)
Loading required package: gsw
> d <- read.ctd.saiv("Tr1_St2_Graveneset.txt", debug=2)
    read.ctd.saiv(file="Tr1_St2_Graveneset.txt", ...) {
header is:
[1] "From file: Tr1_all_stations\tInstrument no.:\t595"                                   
[2] "Ser\tInterval (sec)\tIntegration\tAir pressure\tSalinity\tChart Datum (dbar)\t"      
[3] "4\t1\t\t1019.84\t"                                                                   
[4] "Ser\tMeas\tSal.\tTemp\tF (µg/l)\tT (FTU)\tDensity\tS. vel.\tDepth(u)\tDate\tTime\t\t"
      Original data names: c("Ser", "Meas", "Sal.", "Temp", "F (µg/l)", "T (FTU)", "Density", "S. vel.", "Depth(u)", "Date", "Time")
      data names: c("Ser", "Meas", "salinity", "temperature", "fluorescence", "turbidity", "density", "soundVelocity", "depth", "Date", "Time")
First 3 lines of data:
  Ser Meas salinity temperature fluorescence turbidity density soundVelocity
1   4  585     0.01       8.129         0.08      0.47  -0.149       1439.68
2   4  586     0.02       8.127         0.09      0.55  -0.141       1439.69
3   4  587     0.01       8.115         0.15      0.57  -0.148       1439.62
  depth       Date     Time
1     0 10/06/2023 09:46:23
2     0 10/06/2023 09:46:24
3     0 10/06/2023 09:46:25
Last 3 lines of data:
    Ser Meas salinity temperature fluorescence turbidity density soundVelocity
316   4  900     0.02       2.371         0.05     29.00  -0.030       1414.04
317   4  901     0.02       2.382         0.05     30.11  -0.030       1414.08
318   4  902     0.02       2.391         0.05     16.78  -0.029       1414.13
    depth       Date     Time
316     0 10/06/2023 09:51:38
317     0 10/06/2023 09:51:39
318     0 10/06/2023 09:51:40
      as.ctd(...) {
        case 2: salinity, temperature, pressure (etc) supplied
        assuming modern units, since none provided
      } # as.ctd()
> summary(d)
CTD Summary
-----------

* Data Overview

                              Min.   Mean   Max. Dim. NAs OriginalName
    scan                         1  159.5    318  318   0            -
    salinity [PSS-78]            0 29.486  34.91  318   0         Sal.
    temperature [°C, ITS-90] 1.941 2.3482  8.152  318   0         Temp
    pressure [dbar]              0 6.5709 20.931  318   0            -
    fluorescence [μg/L]          0 0.3005   2.18  318   0     F (µg/l)
    turbidity [FTU]           0.01 2.3065  35.76  318   0      T (FTU)
    Meas                       585  743.5    902  318   0         Meas
    Ser                          4      4      4  318   0          Ser

* Processing Log

    - 2023-09-13 19:59:21.450 UTC: `create 'ctd' object`
    - 2023-09-13 19:59:21.451 UTC: `as.ctd(salinity = data$salinity, temperature = data$temperature,     pressure = swPressure(data$depth), debug = debug - 1L)`
    - 2023-09-13 19:59:21.451 UTC: `oceSetMetadata(object = res, name = "header", value = header)`
    - 2023-09-13 19:59:21.451 UTC: `oceSetMetadata(object = res, name = "filename", value = filename)`
    - 2023-09-13 19:59:21.451 UTC: `oceSetMetadata(object = res, name = "dataNamesOriginal", value = dno)`
    - 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "fluorescence", value = data$fluorescence,     unit = list(unit = expression(mu * g/L), scale = ""))`
    - 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "turbidity", value = data$turbidity,     unit = list(unit = expression(FTU), scale = ""))`
    - 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "Meas", value = data$Meas)`
    - 2023-09-13 19:59:21.452 UTC: `oceSetData(object = res, name = "Ser", value = data$Ser)`
> 
> 
Rdescoteaux commented 1 year ago

Thank you for this! I am at sea (collecting more SAIV CTD data ;) ) and will take a closer look when I return.

All the best,

Raphaelle

dankelley commented 1 year ago

@Rdescoteaux I hope you have good luck with your work at sea, and that you return safely. Take care. Dan.

dankelley commented 1 year ago

To the original reporter: have you made any progress on this? Do you need any more help? In the oce project, we like to use open issues as a sort of "to do" list, so we'd like to know if the issue ought to remain open.

PS. this is a standardized 'saved reply'.

dankelley commented 1 year ago

This issue was closed by the developer, because the following conditions were met:

  1. a solution has been implemented
  2. new tests were added to check the solution
  3. issue discussion seems to have ceased

Of course, if the reporter wants to reopen the issue, that's perfectly fine! The point here is just to avoid having issues in the developers' "to do" list, when they are probably done.

PS. this is a standardized reply (AKA a 'saved reply' in github notation).

Rdescoteaux commented 1 year ago

Oh sorry, I just came back to the office.

But yes, it works!! Thank you so much.

And I have gotten a list of variables from the SAIV manufacturer: List of parameters SAIV AS.pdf

Thank you for the time you spent on this. It was greatly appreciated :)

Raphaelle

dankelley commented 1 year ago

@Rdescoteaux I hope your fieldwork went well. Thanks for the document on the names. I'll go through that later this week, and make additions. I am reopening this issue until I complete that.

dankelley commented 1 year ago

Below is a checklist of variable names. When an item is checked off, that just means that I have done the work in the code. Only when I post a new comment will it mean that I have pushed the new code to github. That will likely be a few days from now.

dankelley commented 1 year ago

@Rdescoteaux is it OK if I name you in the docs for the function? I presently have as below. (This is Roxygen code that will be typeset in the documentation.) I see three choices, and I hope you can decide:

  1. not to name you at all
  2. to name you just with your github username, as below
  3. to use your actual name

Any of these choices is okay. Lots of people want to remain private, but I like users to know that there are many people who have contributed to oce.

#' @author Dan Kelley, with help from Rdescoteaux on github,
#' who supplied the sample file and the document about variable
#' names.
dankelley commented 1 year ago

I've coded this in "develop", commit 448516c7df80f5da1b5b54421a258fed4b375444. For the test file I received a while back, I get as follows, which seems OK (for the variable names etc).

Note that I am not naming all the possible oxygen columns as "oxygen". Instead, I am giving them names that evoke the instrument type, e.g. "oxygenSAIV205" and so forth. I suppose I could change this and put those details into the unit/scale, but I don't know whether a machine can have more than one oxygen sensor on it, in which case they would end up with names like "oxygen", "oxygen2" and so forth, which might be confusing to the user. This could be changed, if somebody wanted to open a new issue for it.

library(oce)
#> Loading required package: gsw
d <- read.ctd.saiv("~/saiv.txt")
summary(d)
#> CTD Summary
#> -----------
#> 
#> * Data Overview
#> 
#>                           Min.   Mean   Max. Dim. NAs OriginalName
#>     series                   4      4      4  318   0          Ser
#>     measurement            585  743.5    902  318   0         Meas
#>     salinity [PSS-78]        0 29.486  34.91  318   0         Sal.
#>     temperature [°C]     1.941 2.3482  8.152  318   0         Temp
#>     fluorescence [ug/l]      0 0.3005   2.18  318   0     F (µg/l)
#>     turbidity [FTU]       0.01 2.3065  35.76  318   0      T (FTU)
#>     sigma [kg/m³]        -0.15 23.578 27.984  318   0      Density
#>     soundVelocity [m/s] 1412.9 1452.5 1458.7  318   0      S. vel.
#>     depth [m, unesco]        0 6.5174  20.76  318   0     Depth(u)
#>     date                    NA     NA     NA  318   0         Date
#>     time                    NA     NA     NA  318   0         Time
#> 
#> * Processing Log
#> 
#>     - 2023-11-06 12:57:31 UTC: `create 'ctd' object`
plot(d, eos="unesco")

Created on 2023-11-06 with reprex v2.0.2

dankelley commented 1 year ago

I decided to follow the convention of grouping names, so (if this can happen) a datafile that contained two oxygen sensors would result in an oce object with variables named "oxygen" and "oxygen2" in its data slot. The variety is stored in the "scale" part of the unit. This is not tested, since I have no file with oxygen (let alone multiple oxygens).
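The naming convention described above can be sketched in base R as follows. The helper name `number_duplicates()` is hypothetical, and this is only an illustration of the "oxygen", "oxygen2" scheme, not the code in the commit.

```r
# Hypothetical helper: leave the first occurrence of a name alone and append
# 2, 3, ... to later occurrences of the same name.
number_duplicates <- function(x) {
  # For each element, count which occurrence of its value it is.
  occurrence <- ave(seq_along(x), x, FUN = seq_along)
  ifelse(occurrence == 1L, x, paste0(x, occurrence))
}

number_duplicates(c("salinity", "oxygen", "temperature", "oxygen"))
```

Under this scheme the names stay short and predictable, and the information about which sensor produced each column has to live elsewhere, for example in the "scale" part of the unit as described above.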

This is done in the "develop" branch, commit as below

commit 3b1bbcf63fed5ac6739f9d7567e361ab0944521c
Author: dankelley kelley.dan@gmail.com
Date: Mon Nov 6 16:07:28 2023 -0400

unduplicate varnames for ctd.saiv
Rdescoteaux commented 1 year ago

Wonderful! Thank you :)

You are welcome to use my Github username.

dankelley commented 1 year ago

Thanks. I think I'm done with my checklist on this data type, but please be on the lookout for problems, and open new issues if you find any.

Rdescoteaux commented 1 year ago

Will do!