dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0

read.oce() should handle ruskin (RBR) data files #562

Closed dankelley closed 9 years ago

dankelley commented 9 years ago

I downloaded the ruskin matlab toolbox from http://www.rbr-global.com/images/software/RSKtools/RSKtools.zip and ran the following

library(oce)
read.oce("~/Downloads/RSKtools/sample.rsk")

but it yielded the error

Error in `[.data.frame`(d, , pressureColumn) : undefined columns selected
In addition: Warning messages:
1: In read.tdr(file, processingLog = processingLog, type = "rsk") :
  making some (untested) assumptions, since the ruskin Version (1.7.17) is outside the range for which tests have been done
2: In read.tdr(file, processingLog = processingLog, type = "rsk") :
  cannot locate pressure column in 'channels' table of database; assuming column 3
dankelley commented 9 years ago

This is working a bit better now, in the gsw branch (sorry, wrong branch, I know, but that's where I'm "living" lately).

The code now decides to call this a tdr data type. But I don't really know if that's the right object class in which to shove this. @richardsc -- any ideas on that? I don't feel a need to invent a new "temperature recorder" class, if the "pressure-temperature" class can handle it.

dankelley commented 9 years ago

An update with my thoughts as of now (after private emails and phone calls).

The existing name tdr (for a data type, and in various generic functions) seems too tied to a very particular instrument type, and it's not the right name for closely related instruments (the temperature-depth-recorder from RBR).

It is not completely clear what the best name for a larger type (i.e. one including the "TDR" sort of data, plus similar data by RBR and other manufacturers) should be. I do think logger is a pretty good name. That's what people tend to say, and it seems to be the general name used by RBR nowadays.

Since Oce is major version 0, and is getting tightened up for the release of OAR, I feel that it's OK to alter the name of the data type. Doing so should not affect old code except if it uses as.tdr, and a conversion to the new type could be done by the single line of code as.tdr<-as.logger in a source-code file or in the global Rprofile file. Therefore I judge the cost as payable, given the benefit.
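A minimal sketch of that one-line shim follows. The body of as.logger here is a hypothetical stand-in (the real function and its arguments may differ); the only point is that the old name can be kept alive as an alias.

```r
# Hypothetical stand-in for the renamed constructor; the real
# as.logger() in oce may take different arguments.
as.logger <- function(time, temperature, pressure) {
    list(time = time, temperature = temperature, pressure = pressure)
}

# One-line shim: older scripts that call as.tdr() keep working.
as.tdr <- as.logger

# Old-style call, now routed to the new function.
old <- as.tdr(time = 1:3,
              temperature = c(16.5, 16.5, 16.4),
              pressure = c(10.1, 10.2, 10.3))
names(old)
```

Because the alias is an ordinary assignment, it could live at the top of an analysis script or in an Rprofile file, as suggested above.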

Specialization can be achieved by subtypes, as we already have for ctd, adp, etc.

As a practical matter, the first step would seem to be acquiring data files from various instruments manufactured by RBR, because they seem to be really very nice and I would like to be prepared to analyze their data in case we get some for upcoming fieldwork.

dankelley commented 9 years ago

Jeez, I wish these comments could be threaded. Anyway, I'm putting some notes below. I may add to this comment later, to keep the important one just above it from getting hidden several screenfuls from where a reader might be looking. At present

library(oce)
d<-read.oce("~/Downloads/RSKtools/sample.rsk")
str(d)

yields as follows:

Formal class 'tdr' [package "oce"] with 3 slots
  ..@ metadata     :List of 5
  .. ..$ filename           : chr "~/Downloads/RSKtools/sample.rsk"
  .. ..$ instrumentType     : chr "rbr"
  .. ..$ model              : chr "RBRsolo"
  .. ..$ serialNumber       : chr "75766"
  .. ..$ pressureAtmospheric: num 10.1
  ..@ data         :List of 3
  .. ..$ time       : POSIXct[1:4225], format: "2013-02-26 12:00:00" "2013-02-26 12:00:10" "2013-02-26 12:00:20" "2013-02-26 12:00:30" ...
  .. ..$ pressure   : logi [1:4225] NA NA NA NA NA NA ...
  .. ..$ temperature: num [1:4225] 16.5 16.5 16.5 16.5 16.5 ...
  ..@ processingLog:List of 2
  .. ..$ time : POSIXct[1:2], format: "2015-01-10 15:57:32.106" "2015-01-10 15:57:32.106"
  .. ..$ value: chr [1:2] "create 'tdr' object" "read.tdr(file = file, type = \"rsk\", processingLog = processingLog)"
richardsc commented 9 years ago

I like the idea of switching to logger as a name. This means that similar instruments from other manufacturers could also use such a class.

Is there a good way of specifying what kind of logger they are? Perhaps as a sub-class or something? Like:

d <- read.oce('file.rsk')
class(d)
[1] "logger" "tdr"
attr(,"package")
[1] "oce"

Does that make sense?
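To make the question concrete, here is a rough sketch of how a sub-class could look with S4 classes. The class names baseLogger and tdrLogger and the generic describe are made up for illustration; they are not part of oce.

```r
library(methods)

# Hypothetical class names, used only for this illustration.
setClass("baseLogger", representation(metadata = "list", data = "list"))
setClass("tdrLogger", contains = "baseLogger")

# A toy generic standing in for plot(), summary(), etc.
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "baseLogger", function(x) "generic logger")
setMethod("describe", "tdrLogger", function(x) "temperature-depth recorder")

d <- new("tdrLogger")
is(d, "baseLogger")           # TRUE: a tdrLogger is also a baseLogger
describe(d)                   # uses the tdrLogger method
describe(new("baseLogger"))   # uses the baseLogger method
```

With this layout, a method written for the parent class would apply to every kind of logger unless a sub-class supplies its own.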

dankelley commented 9 years ago

Yes, there can be multiple classes, like times in R. However, what I’ve been doing instead is using “type”, e.g.

> data(ctd)
> ctd[["type"]]
[1] "SBE"

and so that kind of thing is buried in a lot of code.

I wish I could get a hold of more ruskin files. They only provide one on their website. I imagine the others will be similar, but I’ve not found a doc stating formats, so that’s just guessing. For example, right now

$ sqlite3 050046_20111007_1413.rsk "select * from instruments"

on a file on my box produces

50046|RBRduo

i.e. first item serial number and second instrument type. Will that always be true? Not sure.
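One way to be less dependent on column order would be to ask for the fields by name. The sketch below assumes the DBI and RSQLite packages are installed, and it builds a tiny in-memory table as a stand-in for a real .rsk file. The column names serialID and model are assumptions, not something taken from RBR documentation.

```r
library(DBI)

# In-memory database standing in for an .rsk file (which is SQLite).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical layout of the 'instruments' table; the column names
# 'serialID' and 'model' are assumed for this illustration.
dbWriteTable(con, "instruments",
             data.frame(serialID = 50046L, model = "RBRduo",
                        stringsAsFactors = FALSE))

# Look the values up by column name instead of by position.
fields <- dbListFields(con, "instruments")
inst <- dbGetQuery(con, "SELECT * FROM instruments")
dbDisconnect(con)

serial <- if ("serialID" %in% fields) inst$serialID[1] else NA
model <- if ("model" %in% fields) inst$model[1] else NA
```

If a file lacked one of the expected columns, this would yield NA rather than silently picking up the wrong item.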

Looking at a few different files might give ideas on other things, too. For example, I think RBR have something that’s basically a ctd. That yields a question — should it be stored in a ctd object, or a logger one?

A block on all of this is that I’m not sure what a logger really is. Not just something that stores time — almost everything does that. Maybe it’s something that stores time plus water properties (to avoid having an ADV or ADP in the list). Hm, well it could have pressure, which is not a water property. And so now we get to the CTD. Hm. I think I need to do a survey of friends to see what the word “logger” means to them.

richardsc commented 9 years ago

I have a few different ones (CTDs, mostly, but also a pressure logger), and in a few weeks I suspect I'll easily be able to get you sample files for just about every instrument. Can you wait that long?

As for the RBR CTD, yes, they have one. In fact, there are a few different variants, mostly just distinguished by the number of extra sensors. I definitely think such instruments should have class ctd, as opposed to logger.

My question about sub-type was related more to general methods. For example, say you have two logger objects -- one from a "temperature-depth-recorder" (logger1) and one from just a plain old "pressure-recorder" (logger2). What happens when you do

plot(logger1)

vs when you do

plot(logger2)

?

dankelley commented 9 years ago

That's basically why I use type -- the (generic) plotting code just has to check on that value, to decide what to do. AFAIK, generics work just on the main class. But R has three types of classes, and I think the one called "reference class" might permit extra dispatching based on examining the whole class list.

The type scheme would be pretty simple. To answer your question directly, it would be e.g.

setMethod(f="plot",
          signature=signature("logger"),
          definition=function(x, ...) {
              # ...
              type <- x[["type"]]
              if (type == "pressure-recorder") {
                  # ... plot pressure only
              } else if (type == "temperature-pressure-recorder") {
                  # ... plot temperature and pressure
              }
          })

and so forth.
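A self-contained version of the same idea, as a sketch only: the class demoLogger and the helper panelsFor are hypothetical stand-ins, and the type is read straight from the metadata slot rather than through the oce [[ accessor.

```r
library(methods)

# Hypothetical stand-in for an oce "logger" object; not the real class.
setClass("demoLogger", representation(metadata = "list", data = "list"))

# Decide which data columns to draw, based on the 'type' metadata entry.
panelsFor <- function(type) {
    switch(type,
           "pressure-recorder" = "pressure",
           "temperature-pressure-recorder" = c("temperature", "pressure"),
           stop("unknown logger type: ", type))
}

# One plot method for the whole class; it branches on the type.
setMethod("plot", signature(x = "demoLogger"),
          function(x, y, ...) {
              panels <- panelsFor(x@metadata$type)
              oldpar <- par(mfrow = c(length(panels), 1))
              on.exit(par(oldpar))
              for (p in panels)
                  plot(x@data$time, x@data[[p]], type = "l",
                       xlab = "time", ylab = p, ...)
              invisible(panels)
          })

logger2 <- new("demoLogger",
               metadata = list(type = "pressure-recorder"),
               data = list(time = 1:5,
                           pressure = c(10.1, 10.3, 10.2, 10.4, 10.1)))
```

Calling plot(logger2) would then draw a single pressure panel, while an object whose type is "temperature-pressure-recorder" would get two panels from the same method.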

dankelley commented 9 years ago

This works on RBR files of the following types, so I'm closing the issue.