dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0

Unable to import CTD files from an AML Base.X2 downloaded with Sailfish 1.4.8.0 #2247

Closed: m4rkus84 closed this issue 1 month ago

m4rkus84 commented 1 month ago

I have CTD profile files from an AML Base.X2 probe, downloaded to my computer with the Sailfish 1.4.8.0 software, in both CSV and CNV formats. I'm trying to import them into oce with read.oce(), read.ctd() and read.ctd.aml(), but I get the following errors.

With CSV files:

Error in read.oce(file = "C:/Users/my_name/Desktop/Export/2024-09-04_11-02-33.csv") : unknown file type "unknown"

Error in read.ctd(file = "C:/Users/my_name/Desktop/Export/2024-09-04_11-02-33.csv") : Cannot discover type in line '[Header]' bird

Error in read.ctd.aml(file = "C:/Users/my_name/Desktop/Export/2024-09-04_11-02-33.csv") : cannot determine file 'format' by examining first line (shown below) [Header]

With CNV files:

Error in read.oce(file = "C:/Users/my_name/Desktop/Export/2024-09-04_11-02-33.cnv") : unknown file type "unknown"

Error in read.ctd(file = "C:/Users/my_name/Desktop/Export/2024-09-04_11-02-33.cnv") : Cannot discover type in line 'Sailfish SFE19plus Data File:' bird

Error in read.ctd.aml(file = "C:/Users/my_name/Desktop/Export/2024-09-04_11-02-33.cnv") : cannot determine file 'format' by examining first line (shown below) Sailfish SFE19plus Data File:

I would like to understand why direct import does not work, without having to modify the files or convert them with other libraries. I am attaching two sample files. Thanks to anyone who can offer suggestions.

Attachments: 2024-09-04_11-02-33.csv, 2024-09-04_11-02-33.zip

dankelley commented 1 month ago

The CNV file is not in the CNV format used by SBE (Sea-Bird) instruments, which is why reading it fails.

The other file is not in a format understood by read.ctd.aml(). This function was based on files I had available to me, sent by an oce user. Here is a quote from the documentation:

The handled formats match files available to the author, both of which diverge slightly from the format described in the AML documentation (see References).

I think AML must have changed the format. In the screen snapshot below, the left panel is your file, and the right panel is a file provided with the oce package (in the format that existed when read.ctd.aml() was written). As you can see, they are quite different. Frankly, the format of your file looks more sensible, without those blank lines etc. But clearly code written for the right-hand panel is not going to be able to parse the left-hand panel. I am going to post this comment, and make another in a moment, asking you some questions.

[Screenshot, 2024-09-20: the user's file (left) beside a file provided with oce (right)]
dankelley commented 1 month ago

Some questions for @m4rkus84 --

  1. Are you set up to build packages from source? If so, some changes could be made to oce to read your file. (They won't get into the CRAN version for maybe 6 months, though, because CRAN does not permit frequent updates.)
  2. Do you have documentation for your instrument that includes a specification of the format? If so, please email it to me at dan.kelley@dal.ca (I will also supply it to @richardsc via private email).

The second point is important. It is very time-consuming, and usually somewhat error-prone, to guess formats based on a file.

dankelley commented 1 month ago

Oh, and @m4rkus84, we can get something that will do most of what you want with a few lines of R. This could become the kernel of support for a new format in oce. But we don't put things into oce unless we are quite confident that they will help users, as opposed to providing a stopgap.

It will only take me 20 minutes to write some code for you. I have to change locations first, and might get caught up with students before I lecture for 2 hours this morning.

dankelley commented 1 month ago

Here's a very crude test code. It skips a lot of the data (and almost all of the metadata) and it doesn't do checks (like whether there is one, and only one, longitude line). But it at least is a start. I'll be offline for a while now.

# VERY minimal reading of new (?) AML ctd format
library(oce)

testing <- function(file) {
    lines <- readLines(file)
    # Station position, from the "Longitude=" and "Latitude=" header lines
    w <- grep("^Longitude=", lines)
    longitude <- as.numeric(gsub(".*=", "", lines[w]))
    w <- grep("^Latitude=", lines)
    latitude <- as.numeric(gsub(".*=", "", lines[w]))
    # Column names, from the comma-separated "Columns=" header line
    w <- grep("^Columns=", lines)
    col.names <- strsplit(gsub(".*=", "", lines[w]), ",")[[1]]
    # Read the table that follows the "[MeasurementData]" marker line
    w <- grep("\\[MeasurementData\\]", lines)
    data <- read.csv(file, skip = w + 1, col.names = col.names)
    as.ctd(
        salinity = data$Salinity, temperature = data$Temperature,
        pressure = data$Pressure, latitude = latitude, longitude = longitude
    )
}

ctd <- testing("~/Downloads/2024-09-04_11-02-33.csv")
plot(ctd)

The graph it makes is below. There is a problem with a few of the initial salinity values, as is made obvious by the

[Screenshot, 2024-09-20: the graph made by plot(ctd)]
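The crude function above skips the "one, and only one, longitude line" check. A minimal sketch of that check follows, using a hypothetical helper `headerValue()` (not part of oce) and made-up header lines that assume only the `Key=value` layout the function already relies on.

```r
# Hypothetical helper (not part of oce): return the value of a "Key=value"
# header line, stopping unless exactly one such line exists.
headerValue <- function(lines, key, numeric = TRUE) {
    w <- grep(paste0("^", key, "="), lines)
    if (length(w) != 1L) {
        stop("expected exactly one '", key, "=' line, but found ", length(w))
    }
    value <- sub("^[^=]*=", "", lines[w]) # drop everything up to the first "="
    if (numeric) as.numeric(value) else value
}

# Made-up header lines, for illustration only
hdr <- c("[Header]", "Latitude=43.818", "Longitude=15.247",
         "Columns=Salinity,Temperature,Pressure")
headerValue(hdr, "Longitude")                                   # 15.247
strsplit(headerValue(hdr, "Columns", numeric = FALSE), ",")[[1]]
# "Salinity" "Temperature" "Pressure"
try(headerValue(c(hdr, "Longitude=15.3"), "Longitude"))
# error: expected exactly one 'Longitude=' line, but found 2
```

Inside testing(), calls such as `longitude <- headerValue(lines, "Longitude")` could replace the unchecked grep/gsub pairs.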
m4rkus84 commented 1 month ago

  1. No, I'm not set up for that. I'm just an ordinary R user; I installed oce with install.packages(), attracted by its very clear and attractive plots.
  2. Unfortunately, I do NOT have any documentation at the moment. Probably, as you suggested, Sailfish has changed the export format compared to Seacast, the legacy software used years ago. If possible, could you add a function to oce to load this new format into R? I can try to contact AML, but I'm not at all sure they will provide the output format specification.
m4rkus84 commented 1 month ago

That seems like a great improvement; thank you for your help! For now, I'll try to use this function to handle my data. If you need additional sample files to improve the import of the 'new' AML files into oce, let me know. Thank you!

dankelley commented 1 month ago

Maybe you can email me some sample files (email address in a previous comment). Be aware that anything you post here can be seen by anybody. I don't want someone grabbing your data and publishing it before you can do so.

I am going to improve the function a little at a time. You will be able to find it in another repo, https://github.com/dankelley/oce-development/tree/main/aml, and that's where to look for an update I just made.

Now, the code has a better (still temporary) name, and it reads all the data (but only lon and lat from the metadata). It does not read units but that should not be too hard and you'll know the units anyway, since you know the dataset (or can just look in a file).

Below are the graphs made by my sample program. Note that the first has a problem with low S values. Then I make a histogram to get an idea of that, and subset to just the meaningful data. The final result might be reasonable. Perhaps you can look and comment here? (Don't comment on that other repo ... we like to keep issues here on this repo.)

PS. it's interesting to see those high salinities. I guess it makes sense, for that location! All the best from Nova Scotia.

[Figures: three graphs from the sample program: the initial plot, the salinity histogram, and the subsetted data]
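The histogram-then-subset step described above can be sketched in base R. The helper `dropLowSalinity()` and its cutoff of 35 are assumptions for illustration, not what the sample program uses; the first two salinity values are made up to mimic bad initial readings, and the rest are values from the 5-line data snippet posted later in this thread.

```r
# Hypothetical sketch: keep only salinities at or above a cutoff that would,
# in practice, be chosen by looking at a histogram of the values.
dropLowSalinity <- function(salinity, cutoff = 35) {
    keep <- !is.na(salinity) & salinity >= cutoff
    list(keep = keep, salinity = salinity[keep])
}

S <- c(2.1, 15.4, 37.797, 37.907, 37.920, 37.932) # first two values made up
# hist(S)  # inspect the distribution to choose a cutoff
res <- dropLowSalinity(S)
res$salinity # 37.797 37.907 37.920 37.932
```

For an oce ctd object, a similar condition could presumably be passed to oce's subset() method, e.g. subset(ctd, salinity > 35), though that call is untested here.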

dankelley commented 1 month ago

@m4rkus84 I've pushed a new version (from now on, I won't keep copying and pasting the URL for where the code lives -- I'll let you bookmark it).

It reads units now.

HOWEVER the units are not trustworthy. The file has the density unit as kg/cm^3 but it is clearly kg/m^3. And the sound speed (which is what svcalculated is, I think, from the value of about 1500) is given as mS/cm but it should be m/s.

So, I advise being careful with this file.
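The reasoning behind the unit complaint is simple arithmetic, sketched here in R using the first density and svcalculated values from the data snippet in this thread. The ranges are rough, approximate values for seawater, not limits taken from the file or from oce.

```r
# 1 m^3 = (100 cm)^3 = 1e6 cm^3, so a value in kg/m^3 divided by 1e6 gives kg/cm^3.
density <- 1025.2  # first density value in the file
density / 1e6      # 0.0010252, what a true kg/cm^3 value would look like

# Rough seawater ranges (approximate): density 1000-1100 kg/m^3,
# sound speed 1400-1600 m/s.
svcalculated <- 1539.5                        # first svcalculated value in the file
density >= 1000 && density <= 1100            # TRUE: consistent with kg/m^3
svcalculated >= 1400 && svcalculated <= 1600  # TRUE: consistent with m/s, not mS/cm
```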

Question for @m4rkus84 -- did you edit this .csv file, or is it straight from the machine? If the latter, and if my new test function goes into oce, then the documentation will need to point out the error in the format. I've been around for a long time, and when a manufacturer makes an error in a data format, it makes me think that the format is not well thought-out, and therefore might change rapidly over time. This suggests not supporting the format in oce because code will have to be repeatedly rewritten. (@richardsc may have some comments to add on this topic.)

m4rkus84 commented 1 month ago

Thank you, @dankelley. The CSVs come directly from the probe; I haven't modified them, precisely to avoid creating further messes. You're right: those unit errors are very strange coming from a company that should know the field well!

dankelley commented 1 month ago

I've updated read.ctd.aml() in the "develop" branch (commit 8cae7162a7cf1fed7afa2da7c1cfee919021e26e) to auto-detect this file type, and to read it. An example is below, using a data file that I created using the first 5 observations in the file posted at the start of this thread by @m4rkus84.
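Auto-detection of this kind can work by inspecting the first line of a file, which is also what the original error messages report (`[Header]` for the CSV, `Sailfish SFE19plus Data File:` for the CNV). The sketch below is a hypothetical classifier, `guessFormat()`, and not the code in read.ctd.aml() or read.oce(); the `* Sea-Bird` prefix for SBE .cnv files is an assumption about that convention.

```r
# Hypothetical first-line classifier: a sketch of the auto-detection idea only.
guessFormat <- function(file) {
    first <- readLines(file, n = 1, warn = FALSE)
    if (length(first) == 0L) return("unknown") # empty file
    first <- trimws(first)
    if (first == "[Header]") return("aml-sailfish-csv")
    if (grepl("^\\* Sea-Bird", first)) return("sbe-cnv") # assumed SBE convention
    "unknown"
}

# Throw-away files holding the first lines quoted in this issue
csv <- tempfile(fileext = ".csv")
writeLines("[Header]", csv)
cnv <- tempfile(fileext = ".cnv")
writeLines("Sailfish SFE19plus Data File:", cnv)
guessFormat(csv) # "aml-sailfish-csv"
guessFormat(cnv) # "unknown"
```

The Sailfish CNV falls through to "unknown" under this rule, which is consistent with why the CNV reader rejected it.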

I added that file so that the test suite could check for consistency. This is done a lot in oce, as in other packages, to catch problems that might come up with code changes.

FIRST NOTE to @m4rkus84 -- if you object to this data snippet being put into oce, please let me know and I'll remove it. (I inserted it to get it into the test suite, and to document it also. Only 5 lines are shown -- and your file is already public since you put it in an issue -- so I am hoping this will not violate the privacy of your data.)

SECOND NOTE to @m4rkus84 -- if you think that other code (in the other repo, mentioned previously) works, please comment and close this issue. We use open issues as a "to do" list in the oce project.

library(oce)
#> Loading required package: gsw
f <- system.file("extdata", "ctd_aml_type3.csv.gz", package = "oce")
d <- read.ctd.aml(f)
d
#> ctd object, from file "/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/oce/extdata/ctd_aml_type3.csv.gz", with data slot containing:
#>    scan[1:5]: 1, 2, ..., 4, 5
#>    salinity[1:5]: 37.797, 37.907, ..., 37.920, 37.932
#>    temperature[1:5]: 25.871, 25.890, ..., 25.862, 25.838
#>    pressure[1:5]: 0.00, 0.19, ..., 0.68, 0.90
#>    conductivity[1:5]: 57.791, 57.961, ..., 57.947, 57.937
#>    battery[1:5]: 8.07, 8.07, ..., 8.07, 8.07
#>    density[1:5]: 1025.2, 1025.3, ..., 1025.3, 1025.3
#>    svcalculated[1:5]: 1539.5, 1539.6, ..., 1539.6, 1539.6
#>    depth[1:5]: 0.00, 0.19, ..., 0.67, 0.89
#>    time[1:5]: 2024-09-04 11:02:35.02, 2024-09-04 11:02:35.52, ..., 2024-09-04 11:02:36.52, 2024-09-04 11:02:37.02
summary(d)
#> CTD Summary
#> -----------
#> 
#> * File:                "/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/oce/extdata/ctd_aml_type3.csv.gz"
#> * Mean Location:       43.818N 15.247E
#> * Time: 2024-09-04 11:02:35 to 2024-09-04 11:02:37 (5 samples, mean increment 0.5 s)
#> * Data Overview
#> 
#>                              Min.                Mean                Max.                Dim. NAs
#>     scan                     1                   3                   5                   5    0  
#>     salinity [PSS-78]        37.797              37.894              37.932              5    0  
#>     temperature [°C, ITS-90] 25.838              25.864              25.89               5    0  
#>     pressure [dbar]          0                   0.442               0.9                 5    0  
#>     conductivity [mS/cm]     57.791              57.915              57.961              5    0  
#>     battery [V]              8.07                8.07                8.07                5    0  
#>     density [kg/cm³]         1025.2              1025.3              1025.3              5    0  
#>     svcalculated [mS/cm]     1539.5              1539.6              1539.6              5    0  
#>     depth [m]                0                   0.438               0.89                5    0  
#>     time                     2024-09-04 11:02:35 2024-09-04 11:02:36 2024-09-04 11:02:37 5    0  
#> 
#> * Processing Log
#> 
#>     - 2024-09-21 14:57:06 UTC: `create 'ctd' object`
#>     - 2024-09-21 14:57:06 UTC: `as.ctd(salinity = data$Salinity, temperature = data$Temperature,     pressure = data$Pressure, longitude = longitude, latitude = latitude)`
#>     - 2024-09-21 14:57:06 UTC: `Add conductivity`
#>     - 2024-09-21 14:57:06 UTC: `Add battery`
#>     - 2024-09-21 14:57:06 UTC: `Add density`
#>     - 2024-09-21 14:57:06 UTC: `Add svcalculated`
#>     - 2024-09-21 14:57:06 UTC: `Add depth`
#>     - 2024-09-21 14:57:06 UTC: `Add time`

Created on 2024-09-21 with reprex v2.1.1

m4rkus84 commented 1 month ago

Hi Dan, I'm testing the code in the other repository, and everything seems to be working as expected. We can go ahead and close this thread. There's no issue with the data displayed here. Thank you.

dankelley commented 1 month ago

Thanks, @m4rkus84, for your patience and help. Dan.