Thanks for using oce @ashleystanek!
We can definitely take a look. Briefly, there are typically two kinds of manufacturer files that we encounter: ones that are predictable, contain suitable metadata, and are described well enough that we don't need to make (dangerous) guesses about what things are; and files that are just crude exports of the data-only portion of whatever was recorded. Generally I advise against trying to adapt oce to read the latter, because the code can be difficult to maintain and challenging to get right. What I usually recommend for the latter is a "custom read function" that the user can use to read the parts of the file into an oce object, without actually importing such a function into the package.
That being said, from a quick look at the file exports you sent, it looks like the AML files would likely fall into the former category. There is quite a bit of metadata, and it appears as though the data columns are described, with names and units.
It appears that the csv file is almost identical to the txt file, but only the txt file actually identifies the column names (in the very first line), e.g.
Date,Time,Conductivity (mS/cm),Temperature (C),Depth (m),Battery (V)
DISPLAY OPTIONS
[Instrument]
Type=Base.X2
EmulationMode=disabled
UseCustomHeader=yes
...
So, I think it's worth some discussion about whether we include a read.ctd.aml()
function in oce. If I were voting, I'd probably vote +0.75: the file format looks reasonable (though not completely trivial, owing to the need to grep for lots of data/metadata items), and I actually have an AML instrument that I use from time to time. However, I'm not 100% confident that the txt export format won't change in the future.
It's too late for me now to try coding this up, but since I know @dankelley will be up early I've put your two data files into the oce-issues
repo at https://github.com/dankelley/oce-issues/tree/main/19xx/1924 😂
I'll be a copycat and vote +0.75 as well. (@ashleystanek, we use a voting scheme in which votes are numbers between -1 and +1, so that a decision is found simply by adding them.) I wrote most of the existing oce code, so it makes sense that I look at this. I'll do so today, perhaps before the Bedford Institute of Oceanography seminar.
I've had a look, and put a test file into the github directory that Clark mentioned. @ashleystanek you can clone that directory from git@github.com:dankelley/oce-issues.git, if you want to keep abreast of things. There's no strong need to, because I'm putting the gist below (click the Details word to see it all).
Please note that I am not yet reading any of the metadata beyond the first line (from which I infer column names). Also, I am not parsing those column names fully, e.g. not allowing for the (small, I hope) possibility that temperature might be measured in degF or something wacky like that, or that depth might not be in metres. A proper function would have checks on such things, e.g. maybe the user asked for pressure to be saved, not depth.
Ashley, if this seems promising, I can expand on this (e.g. looking into the metadata more) and rework this into a proper oce function, named (as Clark suggests) read.ctd.aml().
Considering that @ashleystanek is in Alaska, and you've inferred the location to be the western North Atlantic, I think you missed the lines in the file that give lon/lat:
Latitude=70.3184
Longitude=-147.7644
😄 . Or, maybe you just didn't bother trying to read that yet?
I like that the AML ctds have an integrated GPS, actually.
There's new code in the repo now, called 02dk.R (below). It reads some metadata. I think we need some advice from Ashley as to which metadata are worth reading.
I noticed that longitude and latitude appear twice, each. I guess that's the start and stop of a profile, or something. Anyway, for now, I just use the first instance of each.
I also decode the time of observation, which I guess must be the start time.
I don't see any real problem in inserting this into oce this morning, and I'll do so, unless Clark objects. (We like to vote on including things.) Of course I'd write up a bit of documentation, etc.
NOTE: I am not trying to handle the cases of different conductivity units. Nor am I trying to decode the blocks that relate to the "Slot"s. There is a whole lot of stuff in there.
I'm putting below the details, as before. It's actually almost as much work formatting this stuff as coding, so I think from now on I'll assume that Ashley and Clark will just look in the oce-issues repo for things, and that they will run 'make' there to reproduce results.
Update: 03dk.R now has a rough draft of documentation. It is ready for inclusion into oce, and I plan to do that, unless I get a "hold off" message from @richardsc before noon today, Halifax time.
Update: 04dk.R now stores the whole header in the metadata, so a user can access any of that information for themselves.
A couple quick notes while I pull the above to check:
T <- data[["Temperature..C."]]
C <- data[["Conductivity..mS.cm."]]
p <- swPressure(data[["Depth..m."]])
The header does actually specify units, so a "smart" approach would be to parse them and enter them in the [['units']]
field accordingly. But I'm not sure whether the software allows export in different units; if it doesn't, there would be no point.
The other point is that depth =/= pressure, so we should be sure that we are storing the correct field in the ctd object (which prefers to have pressure, but will create it from depth if necessary).
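For reference, a minimal sketch of that conversion with oce's swPressure() (the depths here are made up, and the latitude is just taken from the file header for illustration):

library(oce)
depth <- c(1, 10, 100)                    # made-up depths, in metres
p <- swPressure(depth, latitude=70.3184)  # pressure in dbar; latitude affects the conversion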
I plan to do the following and then put the new function into oce before 1400h (run and lunch in the meantime).
Oh, you're right about computing pressure -- I looked too quickly and didn't see the swPressure()
call. Ooops! Time for another coffee ...
I'm just out to finish that run that I couldn't complete in the snow. After that I'll get lunch and then do those changes ... but if you say "hold on" I won't incorporate it into oce.
No, fine to incorporate. We'll make changes after it's in there anyway.
Have a good run!
Quick Q: is there any pattern in the filename that I could use within oceMagic()? I could do a combination:
1. the filename ends in .txt
2. the first line starts with Date,Time

I suppose, but this might conflict with other data files. I'm wondering whether the exporter perhaps always makes a filename starting with Custom.export or something. Hm, maybe I ought to insist that the first line be as at 2 above, but also that the third line contains DISPLAY OPTIONS.
NOTE: I think either Clark or Ashley might be able to answer this. Clark will know why I like to have oceMagic()
work, and also why I am quite cautious about adding new possibilities to it, for fear of breaking the recognition of existing file patterns.
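For concreteness, here is a rough sketch of the sort of combined check I have in mind (this is not what oceMagic() actually does, and looksLikeAmlTxt() is just a made-up name):

looksLikeAmlTxt <- function(file) {
    # demand a ".txt" suffix, a first line starting with "Date,Time",
    # and "DISPLAY OPTIONS" somewhere in the third line
    if (!grepl("\\.txt$", file, ignore.case=TRUE))
        return(FALSE)
    first3 <- readLines(file, n=3)
    if (length(first3) < 3)
        return(FALSE)
    grepl("^Date,Time", first3[1]) && grepl("DISPLAY OPTIONS", first3[3])
}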
PS. I would prefer to fix this stuff before incorporating because then testing is much faster.
Also ... is read.ctd.aml()
the right name, or maybe read.ctd.seacast()
or something? I know I could look through emails and do web searches to find such things, but likely Clark can answer quickly. Offline for an hour now.
I'm not sure about the filename/file patterns, but this is something we could look into more. I have to confess that when we use our AML ctd, it isn't used as a "proper CTD" where we download the data to look at profiles and other details. It's more of a spot-check on water density related to our glider ballasting.
As for the name, I think that read.ctd.aml()
is the right approach, because it's consistent with other read.ctd*()
formats, like:
read.ctd.itp read.ctd.odv read.ctd.woce
read.ctd.odf read.ctd.sbe read.ctd.woce.other
where the suffix is either a manufacturer or organization or instrument type.
Wow! Thank you for jumping in to make this work! I glanced through the conversation but will have to take time this afternoon to try to understand it.
Regarding the text file - it is from a custom export format where you can specify the column order, and I believe you can have different units for the variables as well. I'll dig into this this afternoon and clarify some of the options. I don't have any understanding of other CTDs so maybe this is normal, but with the AML device, you can have a host of other instruments attached, so the data section is likely to be very different depending on each setup. I like the custom text format because it specifies the data columns in the first row.
I'll send along another export option, a "Seacast" csv file, which to me seems like the most obvious export option from their list. However, I didn't include it initially because it doesn't include the data column names or units. When I started bringing the data into R, I had to add them manually, which seems very odd and prone to error.
GPS - the second set of coordinates under the DataX header are the ones that should be used. I don't know what is different between them, but AML confirmed to use the second set. I had issues this summer where the second set of coordinates were blank because the device couldn't get a lock so if any function were to read them, it should be possible to add them in later manually. Yep, I'm in Alaska and this data is from our work up in the Beaufort Sea!
Thank you! Cheers, Ashley
However, I didn't include it initially because it doesn't include the data column names or units.
I think it's best to stick with formats that have those things. I'm going to add what I have to oce. To access it, you'll need to be set up to build R packages from source. The README on the oce website has some hints on that. There are web resources, too (including some written by @richardsc) ... basically, your system needs to have compilers for Fortran, C and C++. Stay tuned. It will be in oce within about half an hour.
@ashleystanek can I ask you to read through the attached (trial docs for the new function) to see if it's clear? As you can see, I've coded something that is very demanding on file formats. This can be relaxed later, but first I wanted to get something that you can try out for your actual case. Dealing with a large possibility of data types is challenging because we need to have lots of tests on things like
and so forth. Coding for such things is labour-intensive, and all the more so if there is no documentation to tell us which things are possible.
If you are ever in a pinch, you can copy the gist of what I have in read.ctd.aml()
, viz. find the "Comments:" line, and skip one more line, then read the columns. Use names as given by the first line in the file. Then do whatever you want, with as.ctd()
or some other function.
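For example, a minimal sketch of that gist might be as follows (this is not the read.ctd.aml() code; it assumes comma-separated data and the column names from the sample .txt file):

library(oce)
lines <- readLines("Custom.export.026043_2021-07-21_17-36-45.txt")
skip <- grep("^Comments:", lines)[1] + 1     # find "Comments:", then skip one more line
data <- read.csv(text=lines[-seq_len(skip)], header=FALSE,
    col.names=strsplit(lines[1], ",")[[1]])  # column names come from the first line
pressure <- swPressure(data[["Depth..m."]])
salinity <- swSCTp(data[["Conductivity..mS.cm."]], data[["Temperature..C."]], pressure,
    conductivityUnit="mS/cm")
ctd <- as.ctd(salinity=salinity, temperature=data[["Temperature..C."]], pressure=pressure)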
Attached is the trial documentation. It's pretty rough, but perhaps you can have a look and make edits on it. Then I can just edit the source.
Fantastic! I'll give it a try once I learn to compile the package.
Update: 06dk.R uses the "develop" branch of oce as of commit ffe5c1c4db96b41b759022e542e320586eaa42da.
@ashleystanek: you'll need to build oce from source to run this code. If you do, you'll see that it now stores times as well as temperature, etc.
A couple of other points:
I have not made oceMagic() recognize this file, because I don't know what patterns I can use to do that. Therefore, you have to call read.ctd.aml() directly. (If oceMagic() knows a file pattern then read.oce() can be used. This is set up for dozens of file types. But, to set this up, I'd really need some documentation about the file format ... otherwise I'm just guessing, and the problem with that is the possibility of false positives, breaking code for other users.)

@dankelley I was able to get the develop version of the package loaded from github and can read in data files with read.ctd.aml()
! I spent a bit of time today looking more through SeaCast to find some of the options that may complicate reading files, such as unit preferences and automatically converting dBar to depth. I've attached a screenshot of the main program settings and also the settings for the custom export options. As I was exporting files again the default CSV format now includes the data column headers and units. If I can figure out why it didn't work before, I would think it would be better to use a format that doesn't have the variability that comes with the custom .txt file.
I'll work through comments on the documentation and the specific formatting questions next week, and dig into some of the oce functions.
Thank you! Ashley
It would help to get precise statements here about how things change if you change those settings. The files are textual (either .txt or .csv) and so it will be easy for you to do.
Making the function handle both .txt and .csv seems like a hassle, to be honest. For example, I have a sub-function that acquires metadata, and it works by reading a line of text, finding a "=" character, taking the text to the left of that as the metadata name, and the text to the right as the metadata value. So, that function would need a new argument telling it whether the file is csv or txt, and it would have to act differently in the two cases. This is not difficult, but I just point it out as an example that handling different file formats imposes a coding burden.
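To illustrate the idea (this is only a sketch, not the actual sub-function, and getMetadataItem() is a made-up name):

getMetadataItem <- function(lines, name) {
    # find e.g. "Latitude=70.3184" and return the text to the right of the "="
    line <- grep(paste0("^", name, "="), lines, value=TRUE)[1]
    if (is.na(line))
        return(NA)
    trimws(sub("^[^=]*=", "", line))
}
# e.g. getMetadataItem(readLines("Custom.export.026043_2021-07-21_17-36-45.txt"), "Latitude")
# might yield "70.3184"

For a csv export, the same sort of helper would need to split on commas (or whatever that format uses) instead.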
The same goes for the unit possibilities. Consider the depth/pressure entry. Presumably there is something that tells whether the user selected "freshwater" or "seawater" formula. So that must be (or should be!) buried somewhere in the metadata, which is another thing to consider and have to code for. My advice: always export pressure, never depth. The instrument measures pressure, and that is always what's used in oceanographic research. I don't think we support depth in any other read.ctd
functions, for example, and no formula I'm aware of uses depth. Depth only comes up in making plots, and even then, I would not use it except in making a graph for the layperson.
The good thing is that I don't see anything in the interface about switching units of conductivity, or using [
versus (
around units. This reduces the programming burden quite a lot. Also, it doesn't look like you can make it call something "Temp" or "T" or whatever, instead of "Temperature", and that helps a lot. (For the more popular seabird instruments, a lot of oce code is dedicated to interpreting column names, which have hundreds of possibilities.)
Here are some tasks I'd like you to do, to help in this. I am hoping to cover a fair bit of ground with these, and to do so in an organized way that will not require a lot of comments back-and-forth, which are hard given our 5 timezones separation. Please create the following files, named exactly as shown, and stored in a zip file. Github lets you upload zip files to comments.
- a.txt: the sample from before. Please tell me in the comment which button you clicked on depth.
- b.txt: as a.txt, but with the other depth style. This will let me see if there's something in the metadata that tells us.
- c.txt: as a.txt, but click for pressure, not depth. This will tell me what the system uses for pressure. (I am assuming (dbar) but I need to know the string precisely, to match it. For example, if they put spaces around dbar I will need to code to recognize that.)
- a.csv: as a.txt, but csv format
- b.csv: as b.txt, but csv format
- c.csv: as c.txt, but csv format

I think if I get these 6 files, I'll be able to code two variants, likely named read.ctd.aml.txt()
and read.ctd.aml.csv()
. I don't think there is any point in my trying to make oceMagic()
recognize these files because I fear false positives, meaning that I try to infer these aml files from some text contained within them, but that will break other user's code because there is an already-recognized type that also happens to have that string. Many manufacturers are more clever than aml, and they insert some text at the start of files that quite clearly shows what the data are. For example, if I were coding the file format, I'd start with a line like AML BaseX2 CTD
or something like that. For example, a very-popular SeaBird data format has a particular filename (ending in .cnv
) and the first line starts with the string * Sea-Bird
and so so it's pretty easy for oceMagic()
to detect that filetype, with little chance of an accidental match. These aml files do have something that I think I could use (line 5 of the .txt file you sent and line 3 of the csv file you sent) and so if I had some confidence that these would always be in these lines, I could try to make oceMagic()
work. Why is this of interest? Because, for dozens of file types for dozens of instruments, the user just has to write
d <- read.oce(FILENAME)
and it will figure out what the data are (maybe CTD, maybe ADCP,....) and then do the right thing. Then a useful plot can be created with
plot(d)
and a useful summary with
summary(d)
and so forth. Notice that these three lines do something useful regardless of the filetype. This use of "generic" functions is a real power of R.
Followup to @ashleystanek -- it would also help if you could give us permission to include those test files into the oce test suite.
If you agree, then I will trim the data to maybe 3 data points, and then the test suite can check that we can read the data. I'll also blank out any private-looking things, like IP addresses and (if any) contact info for the operator, or what-not.
The advantage of having test files is that users will be protected against accidental errors brought about by what seems like simple code changes. We need such things for code that deals with various special cases to handle 2 different file types, 3 different vertical coordinates, and so forth.
The more gets added to read.ctd.aml()
, the more we need a test suite to be sure things continue to work. To give you an idea of the scope of things, oce has about 7,000 lines of test code, which is about 10% of the code-base of about 70,000 lines of R, plus 8,000 lines of fortran/C/C++. (Test suites are one of the reasons why R packages are trusted.)
I've put in some code to auto-recognize txt files. And, on the assumption that @ashleystanek will let us insert some (trimmed) files into our test suite, I've started writing tests. Below is an example that uses just 3 lines. My proposal would be that I insert into the test suite this file (which is already semi-public by virtue of having been uploaded to github), but with only 3 data lines. Three is enough to be sure we start reading at the right spot, that we decode columns correctly, etc.
Just to repeat the point (since I think @ashleystanek might be new to this sort of thing), the idea is that if in future the code reads any of the numbers incorrectly, then R won't pass build-test suites. That means that (at least on the tested files) oce cannot "slide back" from correct functioning. It is a sort of safeguard against introducing errors by recoding.
I don't have a test for voltage, because I'm not including voltage in the dataset. I might do that, and if I do, then I'll add a test. But the first concern is to get some test files, as requested in https://github.com/dankelley/oce/issues/1924#issuecomment-1059744336 above. I want those so I can fiddle with the code to do things like decide what to do about 3 possible choices for vertical coordinate, etc.
# Preparation for tests within oce tests/testthat/test_ctd.R
# Requires local sources *or* an up-to-date oce from "develop".
library(oce)
library(testthat)
if (file.exists("~/git/oce/R/ctd.aml.R")) {
source("~/git/oce/R/ctd.aml.R")
source("~/git/oce/R/oce.R")
}
file <- "Custom.export.026043_2021-07-21_17-36-45.txt"
expect_equal("aml/txt", oceMagic(file))
ctd <- read.oce(file)
expect_equal(head(ctd[["temperature"]], 3),
c(5.867, 5.986, 6.058))
expect_equal(head(ctd[["salinity"]], 3),
c(2.61918633678843, 5.27124897467692, 8.10531077140948))
expect_equal(head(ctd[["pressure"]], 3),
c(0.21171839076346, 0.252045727972423, 0.252045727972423))
expect_equal(head(ctd[["conductivity"]], 3),
c(3.107, 6.01, 8.992))
expect_equal(head(ctd[["time"]], 3),
as.POSIXct(c("2021-07-21 17:36:46.17", "2021-07-21 17:36:46.21", "2021-07-21 17:36:46.25"),
tz="UTC"))
I would need some convincing to support the csv format.
Below is a snapshot of the two test files I have so far. The txt format looks superior to me, as I think it did to @richardsc. Here are some reasons:
1. The txt file has a Comments: line, but the csv one does not. That seems like it could be a problem in some cases.
2. PS. the ^I characters are tabs. I think the GUI permits using tabs or other schemes, so that's yet another aspect that the code will have to handle (which I think it does already, because I use trimws to clean up metadata).
After discussion with @richardsc (who is amassing some sample files) I've decided that I cannot find the start of data by looking for a
Comments:
line. My new plan is to count the number of commas in the first line and then scan down until I find another line with that number of commas. Whether this will be robust is anybody's guess. But we are not seeking robustness here, but rather something that will work under restricted conditions that will be detailed in the docs.
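In rough outline, the idea is something like the following sketch (using the sample file name from earlier; the real function does more checking than this):

lines <- readLines("Custom.export.026043_2021-07-21_17-36-45.txt")
ncommas <- function(x) lengths(regmatches(x, gregexpr(",", x)))
n <- ncommas(lines[1])                               # commas in the first (column-name) line
dataStart <- 1 + which(ncommas(lines[-1]) == n)[1]   # first later line with the same count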
I'm fiddling with this count-the-commas method. It works on the following test files (in https://github.com/dankelley/oce-issues/tree/main/19xx/1924)
clarks_data/Custom 025430_2022-02-23_16-18-35_export_allfields_noheader.txt
clarks_data/Custom 025430_2022-02-23_16-18-35_export_csv_allfields.txt
clarks_data/Custom 025430_2022-02-23_16-18-35_export_depth.txt
./Custom.export.026043_2021-07-21_17-36-45.txt
based on the test code 10dk.R
in that directory. (This code finds files that meet the demands of the present-moment read.ctd.aml()
function.)
For fun, below is a snapshot for the first file in the list given in the previous comment. I'm graphing the difference between salinity in the file and salinity I compute from conductivity etc in the file. I'm showing it as a function of pressure.
I am inclined to retain the salinity from the file under the name "Salinity (PSU)", but also to compute salinity ourselves and store it under the name "salinity". I'm not sure if I like that scheme, though. I think we usually just discard salinities from files, on the assumption that they might be computed incorrectly, and that it's best to compute from the things actually measured, which will yield self-consistent results.
Anyway, what you see is that mostly the differences are less than 0.001 units, which makes sense because the reported salinity has 3 digits after the decimal place. (We have no way of knowing whether a manufacturer would round up or down, or just truncate.)
There are some higher differences over on the left of the graph. I'm not too sure what to make of those. My snapshot shows the data and the graph, as an extra clue. Maybe this is a result of the fact that conductivities are so low in the upper waters, so that last-digit-rounding issues are causing the scatter. I'm not going to bother with that for now, though, since I do prefer to use our own formulae for S=S(C,T,p) and so forth, given that we have many-digit tests for them in our check suite.
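For the record, the comparison amounts to something like the following sketch, assuming data holds the file's columns as read with read.csv() and that the (mangled) column names are as in the other files discussed here:

p <- data[["Pressure..dBar."]]
Sfile <- data[["Salinity..PSU."]]                 # salinity as stored in the file
Soce <- swSCTp(data[["Conductivity..mS.cm."]],    # salinity recomputed by oce
    data[["Temperature..C."]], p, conductivityUnit="mS/cm")
plot(Soce - Sfile, p, ylim=rev(range(p)))         # difference, plotted against pressure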
Below is the density difference (computed with oce, vs stored in file). The file is named at the top of the plot. Again, I don't see any reason for further action, since the (trusted) oce values are not much different from the values in the file. I might try adding 1 to the final digits of some base quantities, to see what changes that makes.
These tests show that altering the last-reported (i.e. 3rd) digit of C can alter the 3rd digit of S, depending on rounding. I am not seeing how to get a change of 0.005 in these tests, though.
> swSCTp(43.414,21.233,0.84,conductivityUnit="mS/cm")
[1] 30.44289
> swSCTp(43.414+0.001,21.233,0.84,conductivityUnit="mS/cm")
[1] 30.44367
> swSCTp(0.985,21.231,0.80,conductivityUnit="mS/cm")
[1] 0.5269657
> swSCTp(0.985+0.001,21.231,0.80,conductivityUnit="mS/cm")
[1] 0.5275216
Hm, this does seem a bit odd. 0.005 is pretty large in the grand scheme of things ... I'll take a look (but should maybe do it in a new issue)
@richardsc I agree that this 0.005 disagreement ought to be in a new issue. I got a bit lost in the files, to be honest.
@ashleystanek I'm not too sure of the advantage of csv over txt. @richardsc might have some ideas, since I know he has produced a bunch of files with different settings. I notice that none of his csv test files has header information. (Pull the github/dankelley/oce-issues repo to see his files.)
I'm waiting to hear back on sharing the data file (I'm sure it won't be an issue) and will upload the corresponding files for these notes (and the comparison of text and csv files you requested earlier) at that time, but I wanted to send out this information in the meantime.
There are three files that we should consider as options for oce to read. Given what I've learned after working through this software and your questions, I'd vote for option 2, the exported csv file, but you guys are the experts so I'll leave the decision to you.
1. Original CSV - The original cast file that is downloaded upon connecting the ctd to the computer and viewing the cast in Seacast.
The main reason I would not suggest this file is that I cannot open it in Seacast later. I have to connect the ctd to the computer and load the file from there directly if I want it open in Seacast (I can view it fine in excel or a text editor). Consequently, if I ever want to view it in Seacast later, I have to save the file as another format regardless.
Whether there is a depth or pressure column depends on the main Seacast settings - whether Convert dBar to depth is checked or not. Given your recommendations, I'll leave this turned off in my program settings so that Pressure (dbar) is reported.
This was sort of what I sent you initially, except that I had opened the csv in excel and removed a chunk of the data rows. That turned out to be a mistake because it Americanized the date format and added all those commas. If I re-download this file from the ctd to my computer, the date is formatted as yyyy-mm-dd.
2. Exported CSV - the file that is generated when selecting Export As... Seacast (.csv).
I can open this file in Seacast later, if needed.
Given that there are fewer options to change, it seems harder to make errors with the formatting of this file.
The data table has a row with [data] followed by the data column names and units, and then the data. To me, this seems much easier to parse and find the data for oce to use.
Pressure vs depth column is the same as above: it depends on the main settings.
Date format remains in yyyy-mm-dd so long as it isn't saved in excel with a different format.
The files I was working with from our data collection this summer were exported this way, except that they were missing the column names and units. I still don't know why, but I have not been able to replicate this problem now. This was the reason I didn't send this file in my initial inquiry.
The labels in the header are in all lowercase.
3. Exported .txt - This is generated when selecting Export As... Custom
Cannot be opened in Seacast later.
There are options for the delimiter, order of columns, whether or not the header is present, and if the column names are in the first row.
There is no row with [data] and the column names and units are in the first row of the file (if turned on in the export settings), not the row above the dataset.
All the labels in the header are in CamelCase
Other notes:
It looks like I can add columns for salinity and density by changing the instrument settings. If they were present in the dataset, would they override the ones you calculate in making the data file a ctd object? I just saw your conversation about the test file that calculated salinity and density and getting different answers from oce.
Regarding the unit formatting, the headers seem to always be in the same format regardless of the filetype and settings, from what I've seen so far: Temperature is spelled out with a capital "T", and units are given in parentheses "()". None of the units I have seen use any accents; there could be a French version of the software, given that it is a Canadian company, but I haven't seen that choice. The units for conductivity are "(mS/cm)", and in data files without a depth column there is a pressure column with units "(dBar)". I haven't come across a file yet that has both pressure and depth included.
Thanks @ashleystanek. Q: for your preferred option (number 2), could you maybe send me a private email with a sample file? (I ask because we have so many hours between our time zones that tests are slow.) I won't be able to look at this on Thu and likely not on Fri either.
The reason I want to see the file is to see the format in the [data]
block. The sample files I have so far, from you and from @richardsc, don't have the column names in the [data]
block. Regardless of any format's other merits, if we cannot infer the column names, we are lost.
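Assuming the layout Ashley described (a [data] marker line, then a line of column names and units, then the data), a rough sketch of inferring the names might look like this (the file name here is hypothetical):

lines <- readLines("seacast_export.csv")               # hypothetical file name
i <- grep("^\\[data\\]", lines, ignore.case=TRUE)[1]   # locate the [data] marker
columnNames <- strsplit(lines[i + 1], ",")[[1]]        # names/units are on the next line
data <- read.csv(text=lines[-seq_len(i + 1)], header=FALSE,
    col.names=make.names(columnNames))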
Re your question "If they were present in the dataset, would they override the ones you calculate in making the data file a ctd object?": the answer is "no". But the user will be able to obtain salinity as in the file, if they want. For example
library(oce)
d <- read.ctd.aml("some file name")
plot(d[["salinity"]], d[["Salinity..PSU."]])
What's happening in that plot is that the x axis will be salinity as calculated by oce and the y axis will be salinity as stored in the data file. When naming columns, R changes odd characters (in this case, the space and parentheses) into "." characters.
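For example, the name-mangling can be checked directly in R:

make.names("Salinity (PSU)")
# [1] "Salinity..PSU."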
In the above, you are also seeing the use of [[
as defined by oce, which has the approx. meaning "try to find the named item in the object". The cool thing is that [[
is not a direct lookup, for it will also compute things, if they can be inferred from the object's contents. For example, you could do d[["sound speed"]]
and it would use a formula to compute that. If you load up an object named d
, try d[["?"]]
to see what you can access, and then try names(d@data)
to see the smaller list of what's actually stored in the object.
I'm not sure if you already know all this stuff, so I won't get too deep into the weeds.
PS. on file sharing permission, I'm pretty sure I can get @richardsc to volunteer a file snippet. For testing -- which is why I want this -- we only need maybe 10 data points; more just wastes space in the package.
I can give permission to go ahead and use the files I sent, for posting here and in documentation, as would be useful. Feel free to trim them as necessary.
Thanks @ashleystanek. I have 2 more favours to ask:
I'm pressing things a bit because you seem to be online at this moment, and the faster I can settle the code, the faster you (and others) will have an oce function that is helpful.
Thanks! I haven't actually had a chance to learn how to use oce since we started this discussion. Now that I can get some data read in, I'm looking forward to digging into it.
My name can be there but no need for a special note about it. But you are correct, this cast/dataset isn't meaningful as is, and hasn't been checked for accuracy (it is just being used to check for the formatting, not the content).
Below are screenshots of the instrument and program settings within Seacast. If I eventually find other options that change how the seacast csv is formatted I'll be sure to let you know. I will note that there is another csv format in the export options, as a QINSy format. It isn't the same and shouldn't be used for importing to oce. The manual for seacast is available within the program and at https://www.subseatechnologies.com/files/800/ and it describes some of the settings and formats available.
My last note for the moment is regarding the coordinates - the second set of coordinates under the [Data.x] header are the ones that should be used. This summer we had a bunch that didn't get a lock on the location, but it looks like it should be straight forward to assign coordinates to a ctd object manually. If coordinates weren't recorded, the field says "no-lock".
Q for @ashleystanek: you said that in your "option 2" file, the column names are in lower-case. I don't see that in the file you sent; what I see is as in the screenshot below. Can you clarify?
PS. I know this seems like a detail, but it's definitely not, because oce is set up for precise checks on things, e.g. at the code linked to below, I'm checking for "Date"
as a precise match. Precise matches are very helpful in complex data files (although this data file is not complex, of course).
https://github.com/dankelley/oce/blob/d4f0e2782c574e400a977783ef1e39ade7020b8d/R/ctd.aml.R#L121
Re "no-lock" on coordinates: I will code so that if there are no sensible coordinates, a NA
is saved. In R, that means a missing value. With that set, some computations will not work (because the new equation of state requires location to compute density, etc... it's too complicated to explain here though).
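For what it's worth, a minimal sketch of setting the coordinates by hand afterwards (the values here are just for illustration):

library(oce)
d <- read.ctd.aml("some file name")
d <- oceSetMetadata(d, "longitude", -147.7644)
d <- oceSetMetadata(d, "latitude", 70.3184)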
That was my mistake: all the labels for the header content are in lowercase, as opposed to CamelCase in the text file. The header appears to remain consistent across the different file formats, as you've shown. Thank you for checking.
I've written code to read either Ashley's "type 2" format or one of the .txt formats from Clark.
I have also constructed a sample file (after trimming Ashley's file, and also zeroing out the IP address, serial numbers, and the WEP code).
My next step will be to add tests for this sample file. This is a very important step because it "freezes" functionality. That means that further tweaks to the code will be required to be backwards-compatible with this file format.
These things will get pushed to GH early this afternoon.
You'll be able to learn about the dataset with
?ctd_aml.csv
@ashleystanek please click 'Details' below to see a snapshot of the ?ctd_aml.csv
docs, to see if what I've written seems OK. Note that I have zeroed out the IP address, the WEP code, and the serial numbers. Users don't need to know those things, and I don't want anybody trying to hack into your instrument.
I have pushed to github, in "develop" commit 58331bc1ed86a1e6805f7e01f754c2a1c40e85ec. I've started some test builds but I won't know the results for a while since I have a meeting coming up.
I ask that @richardsc and @ashleystanek take a look at the docs for ?read.ctd.aml
to see whether they seem to describe the format correctly. (Actually, the whole doc is only a page or so, and it would be great if you could read the whole thing.) What I want is that users will see how to set up their AML/Seacast software to generate the right sort of data.
PS. I just clicked on https://www.subseatechnologies.com/media/files/page/032e50ac/seacast-4-2-user-manual-sti.pdf and I see that it's called SeaCast
and not Seacast
, so I'll modify that throughout. I also plan to at least skim that manual this afternoon. I want to see whether my assumptions on things like the case of "Longitude" vs "latitude" are proper, relative to the format
parameter. Right now, the code is demanding that things be as I've seen in the sample files I have available, but that's a poor plan in general.
@ashleystanek and @richardsc I see (click Details for a screen snapshot of the manual) that the AML docs say the words are lat
and lon
, not the full-word-form that I'm seeing in our data files.
I plan to make the code accept either short or long, and either all lower-case or title-case.
@ashleystanek and @richardsc -- I think I have this working now, in the "develop" branch. My new test code at https://github.com/dankelley/oce-issues/blob/main/19xx/1924/12dk.R runs some files from both of you.
I'd be interested to hear whether this version works for practical applications. And, of course, I am keen to know whether my docs make sense with respect to the settings to use in the AML software.
By the way, Clark, one of your data files is stating location as just off the coast of Florida. On spring break, buddy?
PS. I will not see emails tomorrow but should be back online over the weekend.
Hello Dr. Kelley and Dr. Richards, I am trying to load data from my ctd into oce but am running into some issues getting through the first step.
I have a BaseX2 from AML Oceanographic with temperature, pressure, and conductivity sensors. Using the software that comes with the instrument (SeaCast) I can export the data in several formats, but when trying to import them using read.oce or read.ctd, I receive an error saying the filetype is "unknown", and I can't find any mention of the filetypes I can create in the documentation for oce. I can export to the following formats: 1) a csv that includes the same header as in the attached file but with the data columns in any order, 2) PDS2000 (.txt), 3) Kongsberg (.asvp), 4) CARIS (.svp), 5) HYPAK (.vel), 6) HYPAK 2015 (.vel), 7) HiPap (.usr), 8) Sonardyne (.pro), 9) QINSy (.csv).
I have attached two file types of the same dataset (I've removed a chunk of the rows so it doesn't contain the whole cast), but run into the same issue with both.
Custom export 026043_2021-07-21_17-36-45.txt Exported format 026043_2021-07-21_17-36-45.csv
Output from sessionInfo(): R version 4.1.2 (2021-11-01) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252
system code page: 65001
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] oce_1.5-0 gsw_1.0-6
loaded via a namespace (and not attached): [1] compiler_4.1.2 tools_4.1.2 Rcpp_1.0.8
Thank you for the help, Ashley