Reading file from BaseX2 CTD

ashleystanek commented 2 years ago

Hello Dr. Kelley and Dr. Richards, I am trying to load data from my ctd into oce but am running into some issues getting through the first step.

I have a BaseX2 from AML Oceanographic with temperature, pressure, and conductivity sensors. Using the software that comes with the instrument (SeaCast), I can export the data in several formats, but when I try to import them using read.oce or read.ctd, I receive an error saying the filetype is "unknown", and I can't find any mention of these filetypes in the oce documentation. I can export to the following formats: 1) a csv that includes the same header as in the attached file but with the data columns in any order, 2) PDS2000 (.txt), 3) Kongsberg (.asvp), 4) CARIS (.svp), 5) HYPAK (.vel), 6) HYPAK 2015 (.vel), 7) HiPap (.usr), 8) Sonardyne (.pro), 9) QINSy (.csv).

I have attached two file types of the same dataset (I've removed a chunk of the rows so it doesn't contain the whole cast), but run into the same issue with both.
Custom export 026043_2021-07-21_17-36-45.txt Exported format 026043_2021-07-21_17-36-45.csv

library(oce)

filefolder <- ""  # Add filepath here

# These first two lines read the file that was on the instrument (Original 
# format):
# When I call read.oce I receive an error saying the filetype is "unknown"
read.oce(paste0(filefolder, "Original format 026043_2021-07-21_17-36-45.csv"))

# When trying read.ctd it says it cannot determine the file type in the first row
read.ctd(paste0(filefolder, "Original format 026043_2021-07-21_17-36-45.csv"))

# These next lines read the file after the data was read by the software used
# by the instrument, SeaCast, and then exported as a .txt (Exported format):
read.oce(paste0(filefolder, "Custom export 026043_2021-07-21_17-36-45.txt"))
read.ctd(paste0(filefolder, "Custom export 026043_2021-07-21_17-36-45.txt"))

Output from sessionInfo(): R version 4.1.2 (2021-11-01) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252
system code page: 65001

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] oce_1.5-0 gsw_1.0-6

loaded via a namespace (and not attached): [1] compiler_4.1.2 tools_4.1.2 Rcpp_1.0.8

Thank you for the help, Ashley

richardsc commented 2 years ago

Haha, I saw those lon/lat in the files and wondered where they came from. Must be some kind of a default? I don't think that thing has ever been turned on outside, as we mostly use it in an indoor tank.

ashleystanek commented 2 years ago

I went through the documentation for the sample dataset and for read.ctd.aml() and most everything looks good to me.

In the reference for the SeaCast user manual, it should read "May" not "Mahy".
In the notes about read.ctd.aml() here https://github.com/dankelley/oce/commit/58331bc1ed86a1e6805f7e01f754c2a1c40e85ec , it says that Latitude and Longitude are required (I'm assuming in the header). I tried to read in a file that didn't have coordinates in the [data.x] section, expecting an error, but it read in fine and instead used the coordinates listed in the first header section (which happen to be on Victoria Island; I think that is where the device was built).
If coordinates are required, is it possible to supply them to read.ctd.aml() when reading in the data? In other code I had initially written to read in this data (separate from oce), I could assign the coordinates I had written down at the time of deployment, based on the filename. I think you said other ctds don't automatically record coordinates; do they also require coordinates when reading in the data? I suspect I'll be able to find out how to assign coordinates manually once I dig further into oce.

I'll try to find somebody else who will also be using our ctd, and see if they can follow the documentation too.

dankelley commented 2 years ago

Hi @ashleystanek (and, for a question at the end, @richardsc).

I'll number things, for cross-reference.

1. You are referring to an old version of the docs, which became out of date quite a while ago. (All these timezones make it hard.) I suggest that you always do tests against the very latest version of oce. If you're just interested in reading the docs, you can look at https://github.com/dankelley/oce/blob/develop/man/read.ctd.aml.Rd . This is in the R documentation format, but it's pretty easy to read, e.g. \code{something} means to use typewriter font for that word, etc. You can just skip markup like that. I think it will be easy for you to understand. The core text is shorter than what I'm likely to type in this comment, so you shouldn't find it too hard to suggest changes, in a way like "In line 17: change 'specifying' to 'that specifies'" or whatever. If you suggest changes like that, please write each on a separate line (with a blank line after) so I can keep track.
2. I am not sure of the best scheme for reading longitude and latitude. Can you help? I am looking for a suggestion as to what to save (first or second value), in the following cases: (a) first value is a valid number, second no-lock, (b) first no-lock, second valid, and (c) both valid. (I can code NA, missing value, if both are no-lock.)
3. Yes, I can add arguments to the function to provide the longitude and latitude. You can also do that later, if you wish. This is what people normally do for CTD data. One way to do that is shown below.

d <- read.ctd(...)
d <- oceSetMetadata(d, "latitude", 45)
d <- oceSetMetadata(d, "longitude", -60)

and that is basically what I could add at the bottom of read.ctd.aml(). I didn't code this in because other read.ctd* functions do not have longitude and latitude as arguments ... and I am a bit divided on the merit of making this function be different, to be honest ... any thoughts, @richardsc?

richardsc commented 2 years ago

The "dual position" thing for the header is a bit of a pain, to be honest. Did we ever find good documentation about what the first means vs the second, and which should be preferred?

Honestly, my advice is to do as @dankelley suggests, and if you have reason to believe that the positions recorded by the instrument are no good (or if it didn't record any), then do just follow the read.ctd.aml() call with a paired oceSetMetadata() call to set them yourself.

I'm not sure how many casts you'll be doing at a time, but I have found a good way to process a whole bunch of casts at once is to write the extra metadata that you want/need into a separate CSV file, with one row for each file, and then you can loop through them all, read in the files, add the metadata, and resave as necessary.

ashleystanek commented 2 years ago

I asked AML about this issue this summer and was told that the second set of coordinates are the ones that should be used, the ones in the [data.x] section. The person who helped me could not confirm what the coordinates that are listed first reflect, and I haven't been able to find a pattern yet.

Assigning the coordinates or other metadata manually as above sounds very manageable. I agree that read.ctd.aml() shouldn't have specific lat/long arguments if that isn't how your other functions work.

richardsc commented 2 years ago

As a further example, if you knew you were going to have to read in a bunch of files, and you wanted to be able to pass the lon/lat at read time, you could just write your own read function, something like:

my_read.ctd.aml <- function(file, lon, lat) {
    require(oce)
    d <- read.ctd.aml(file)
    d <- oceSetMetadata(d, "latitude", lat)
    d <- oceSetMetadata(d, "longitude", lon)
    return(d)
}

and then call it like:

ctd_with_position <- my_read.ctd.aml('ctdfilename.txt', lon=-60, lat=45)
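To confirm that the position was stored, the metadata can be read back with the `[[` accessor. This is a minimal sketch, assuming oce is installed; the synthetic cast built with as.ctd() and its made-up values only stand in for a real read.ctd.aml() result, so that no data file is needed:

```r
library(oce)

# Stand-in for d <- read.ctd.aml(file): a tiny synthetic cast (made-up values).
d <- as.ctd(salinity = c(30, 31, 32),
            temperature = c(10, 9, 8),
            pressure = c(1, 5, 10))

# The same two calls as in my_read.ctd.aml() above.
d <- oceSetMetadata(d, "latitude", 45)
d <- oceSetMetadata(d, "longitude", -60)

# Read the values back to check them.
lat <- d[["latitude"]]
lon <- d[["longitude"]]
print(c(lon = lon, lat = lat))
```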

dankelley commented 2 years ago

Thanks, @ashleystanek, for the note about locations.

I will update the code to use the second location, since previously I was taking the first. I think maybe I had seen a file with nearly-equal first and second locations, and then another file where the second was no-lock and so I guessed that they would be equivalent but that the second might fail for some reason (maybe not out of the water long enough, for example).
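For anyone handling the two header positions in their own scripts, one possible rule for the cases discussed above can be sketched in base R. This is not the oce implementation: `choose_position` and its `fallback` argument are hypothetical, and the sketch assumes that a missing fix shows up as text (such as "no-lock") that does not parse as a number.

```r
# Hypothetical helper (not part of oce): pick one value from the two header
# entries, preferring the second (the one in the [data.x] section).
# Assumption: a missing GPS fix appears as text that is not a number.
choose_position <- function(first, second, fallback = TRUE) {
    to_number <- function(x) suppressWarnings(as.numeric(x))
    a <- to_number(first)
    b <- to_number(second)
    if (!is.na(b)) {
        return(b)          # second value is usable, so prefer it
    }
    if (fallback) a else NA_real_  # a is NA if the first is also unusable
}

case_a <- choose_position("48.65", "no-lock")   # only the first is valid
case_b <- choose_position("no-lock", "45.00")   # only the second is valid
case_c <- choose_position("48.65", "45.00")     # both valid
case_d <- choose_position("no-lock", "no-lock") # neither valid
```

Setting `fallback = FALSE` gives NA whenever the second value is unusable, which may be the safer choice while the meaning of the first position remains unclear.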

dankelley commented 2 years ago

While I wait the half hour it takes my machine to test the package (not yet pushed to GH), an extension of Clark's approach is to make a csv file that has columns for filename, lon and lat. Then do e.g.

info <- read.csv("info.csv")
n <- length(info$filename)
d <- vector("list", n)
for (i in seq_len(n)) {
  d[[i]] <- my_read.ctd.aml(info$filename[i], info$lon[i], info$lat[i])
}

after which you have e.g. d[[1]] holding the first CTD object, d[[2]] holding the second, and so forth.
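Since the csv file is typed by hand, it may be worth checking it before running the loop. The following is a minimal base-R sketch, assuming the column names used above and longitudes in the -180 to 180 convention; `check_info` is a hypothetical helper, not part of oce, and the example values are made up.

```r
# Hypothetical helper (not part of oce): report problems in the metadata
# table before looping over the files.
check_info <- function(info) {
    required <- c("filename", "lon", "lat")
    missing_cols <- setdiff(required, names(info))
    if (length(missing_cols) > 0) {
        stop("info is missing column(s): ", paste(missing_cols, collapse = ", "))
    }
    filename <- as.character(info$filename)
    data.frame(
        filename = filename,
        file_found = file.exists(filename),
        lon_ok = !is.na(info$lon) & info$lon >= -180 & info$lon <= 180,
        lat_ok = !is.na(info$lat) & info$lat >= -90 & info$lat <= 90
    )
}

# Made-up example; the second row has an impossible latitude.
info <- data.frame(
    filename = c("cast1.txt", "cast2.txt"),
    lon = c(-60, -61),
    lat = c(45, 145)
)
report <- check_info(info)
print(report)
```

Rows with FALSE in any column can then be fixed in the csv file before any CTD files are read.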

dankelley commented 2 years ago

I've pushed the new source to GH. Ready for any more testing anybody wants. @ashleystanek if you think this has solved the issue as stated in the title, please consider closing the issue. If you get more issues, just open a new issue with a narrower title.

I'll be offline for a day.

ashleystanek commented 2 years ago

Thank you both very much for your time to make this work! I'll start another issue once I learn enough to have more questions.

dankelley / oce

Reading file from BaseX2 CTD #1924