dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0
142 stars 42 forks source link

Feature request: Process data from wave height pressure loggers #784

Closed jebyrnes closed 8 years ago

jebyrnes commented 8 years ago

In talking with @dankelley he noted that there currently are no functions for processing data from water height pressure loggers such as Seabird sensors, or even Onset water level loggers programmed for burst mode into meaningful wave data. Might be very useful for future releases, as there is matlab code, etc., out there for doing just this. N.B. I post this as someone collaborating on the Open Wave Height Logger project - http://lukemiller.org/index.php/category/open-wave-height-logger/ - which simply has a pressure logger that we pre-calibrate at known depths, and it would be quite useful to have an open source R solution to then deriving meaningful oceanographic info about waves. We're working on some scripts ourselves specific to our sensor, but it would be great to have a more general solution for us and others facing similar issues.

dankelley commented 8 years ago

There are lots of things that could be calculated, e.g. the below fakes some data and graphs wave height along with significant-wave-height.

swh <- function(z, cutoff=2/3) {
    zmin <- min(z, na.rm=TRUE) # don't need 2nd arg here, actually
    zmax <- max(z, na.rm=TRUE)
    big <- z > (zmin + cutoff*(zmax-zmin))
    mean(z[z > (zmin + cutoff*(zmax-zmin))], na.rm=TRUE)
}
n <- 500
t <- seq(0, by=0.1, length.out=n)
amplitude <- c(1/2, 1, 1/2, 1/4)
period <- c(1, 2, 5, 10)
z <- rnorm(n)
par(mar=c(3.5, 3.5, 1, 1), mgp=c(2, 0.7, 0))
for (i in 1:length(period)) {
    z <- z + amp[i] * sin(2*pi*t/period[i])
}
plot(t, z, type='l')
grid()
SWH <- swh(z)
abline(h=SWH, col='red')
mtext(sprintf("Significant Wave Height: %.2fm", SWH), side=3, line=0, adj=0)

with graph as below. a

richardsc commented 8 years ago

Wow, the OWHL project looks fantastic. I hadn't heard of it before now, so thanks for pointing it out. I've recently become interested in such things myself, both as someone who wants to better understand pre-calculated wave statistics, and to have the ability to process wave-burst instruments myself (e.g. RBR loggers).

While I think adding wave processing routines to oce would be a great addition, I wonder if it might not be a better idea to put such wave processing routines into a separate package (which could of course depend on, and therefore inherit classes and methods from oce). I think it would be a nice addition to a project like OWHL, to be able to say: "to process the data, just go install R and grab our package". Just a thought.

Do you have links for the Matlab packages you mentioned? It would be nice to know what people are actually using, as well as have a proper set of (literature) references for well defined methods.

jebyrnes commented 8 years ago

Thanks! We'll work on the separate package. Although I wonder if the attenuation coefficient work might be useful for OCe as well (see Gibbons et al. Performance of a New Submersible Tide-Wave Recorder). Seems like a sticky problem for these sorts of things that needs to be solved before the analysis presented above. And it might be useful for CTD casts as well, perhaps?

richardsc commented 8 years ago

I agree that if such wave processing is added to oce the frequency dependent pressure attenuation will either need to included or referenced (I got a kick out of the fact that the paper you mentioned was written by some people in the same building as me!). Dan's rough analysis above is correct as long as you assume that z is the real "height", whether it was inferred from pressure or another means (e.g. acoustic).

To my knowledge though, such attenuation shouldn't have an affect on CTD profiles, in that they are sensing hydrostatic pressure variations, rather than dynamic surface wave fluctuations.

In case it wasn't clear, I'd be happy to be a contributor/co-author/developer on an OWHL package. Please let me know when you've started putting things together, or if you want other input.

jebyrnes commented 8 years ago

Fantastic! Indeed, we're going to talk to some folk who have implemented this before to see if we can get the groundrules of what we need to do. I'm actually pretty new to Fourier analysis, so I'm looking forward to this as a learning challenge. If you have any suggestions as to how we can put this together to make it compatible with OCE, let me know! I'd love to build that in from the get-go!

dankelley commented 8 years ago

As regards compatibility with oce, it's pretty easy. For example, consider the following.

First, create a new oce object

> library(oce)
Loading required package: gsw
Loading required package: testthat
> d<-new("oce")

Now, let's see what's in it (below is from the develop branch as of the time of posting this; previous to an update to that branch, the following printed a less informative error msg)

> d
oce object has nothing in its data slot.

hm, nothing there.

Let's try to see the structure

> str(d)
Formal class 'oce' [package "oce"] with 3 slots
  ..@ metadata     : list()
  ..@ data         : list()
  ..@ processingLog: list()

It seems we might want to put our data in the second slot, so let's try that.

> d@data <- list(p=seq(0, 100, 1))

and now we can try printing the object again

> d
oce object, from file '(filename unknown)', has data as follows.
   p:   0,   1, ...,  99, 100 (length 101)

(This is calling a generic function called show.)

We can use double-bracket to access the data (note that this avoids having to access the slot directly, which is all that much of an advantage here, but helpful in more general code).

> d[['p']]
  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
 [36]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69
 [71]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
> 

OK, now let's use double brace to change the data.

> d[['p']][1]<- 10 # nutty
> d
oce object, from file '(filename unknown)', has data as follows.
   p:  10,   1, ...,  99, 100 (length 101)
> d[['p']]
  [1]  10   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
 [37]  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
 [73]  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100

Finally, we'll see what class this has

> class(d)
[1] "oce"
attr(,"package")
[1] "oce"

I hope the above shows that it's quite easy to create data objects that line up with oce.

richardsc commented 8 years ago

Hi @jebyrnes. Have you seen this code:

http://neumeier.perso.ch/matlab/waves.html

It was just brought to my attention, and might be a good reference for an R package. I don't know Urs personally, but both Dan and I were co-authors on a paper with him.

jebyrnes commented 8 years ago

Well this just looks perfect. @millerlp - do we have some test data we could run this through in Octave? And then potentially translate to R! Or... just use RcppOctave! https://cran.r-project.org/web/packages/RcppOctave/vignettes/RcppOctave.pdf

richardsc commented 8 years ago

At the risk of creating more work, I think I would argue that for a new package it might just be better to re-write/adapt the code in R. The advantages to doing so are:

  1. In principle the approach is fairly straightforward, so this isn't re-inventing the wheel
  2. Provides a check that the other code is doing sensible things
  3. Allows insight to be gained in the actual details of the analysis, potentially allowing improvements to be made
  4. Eliminates external dependencies (e.g. octave etc).

I'm actually spinning up on this very issue for my own work, and expect that I'll start doing real coding (and documenting!) in the near future. Since we've talked about having a separate, but oce dependent, package I'll likely start a new public repo where I'll do development. Once I do that we can move the discussion over to there.

@jebyrnes or @millerlp, do you guys have any timeline for this? Also, do you have any sample OWHL files that could be incorporated from the beginning?

millerlp commented 8 years ago

I do have about of month of field data that could be run through it, with the caveat that the pressure response on the prototype was somewhat non-linear with depth, but probably not enough to matter for initial trouble-shooting purposes. I'll let Jarrett speak to the actual timeline, but we've probably got a few months before new prototypes go out and get 'real' data. I may have time in late January to spend on this before the new semester begins.

jebyrnes commented 8 years ago

We actually have a logger in the water right now....which we'll pick up in a month. But I think we can use the current data for this to at least develop the methods. I'm a bit insane with grant proposals until mid-Feb....ugh....but am trying to carve out days here and there for this sort of thing. Perhaps, @millerlp, we can carve out a hackathon day in late Jan? This might be a good pair programming hackbeast.

richardsc commented 8 years ago

That sounds great. I've got lots of data from a (commercial) pressure sensor (RBR solos) which I'm using to start to nail down the details of the processing. One of my goals for the holidays is to get a rough package together, so we could plan to use that as a starting point for whatever I've figured out by then. Any chance of having a "remote" hackathon participant? :smile:

jebyrnes commented 8 years ago

That would be fantastic. We’d be remote as well. Perhaps a meeting on Thurs Jan 21st or Fri Jan 22nd?

millerlp commented 8 years ago

Post-Jan17 works for me (after my last grant deadline), and before Jan 28 would be preferable before the new semester spins up. I'm sure I can carve out a day to play with stuff.

jebyrnes commented 8 years ago

Let’s tentatively say the 21st then. Will email closer to organize things a bit more.

richardsc commented 8 years ago

21st works for me! I'll try and have a bit of a package skeleton before then. @dankelley, do you have any interest in participating?

dankelley commented 8 years ago

Yes, I'd love to participate. I've not done a hackathon before, and am not entirely sure how it would work.

My teaching timetable is not established yet for next year, but it is most likely Tue-Thu early mornings. But I don't think that having all of us working at the same moment is that sensible anyway for coding aspects. I imagine we just need a quorum for discussion on such basic things as

  1. deciding on conventions for function names (I would argue either camelCase or snake_style)
  2. deciding on a class structure (I would argue to inherit from oce)
  3. deciding on the priority of function writing (on this I'd defer to whoever in the group has the most immediate demand)
  4. assigning tasks to people (the basic outline could be decided in real-time and details in a wiki)

A note on timezones: I am at 1 hour east of New York, and Clark is at New York time.

richardsc commented 8 years ago

Hi all -- I kind of forgot about this until just now, and am not sure I'm able to devote an entire day tomorrow. Not sure if @jebyrnes and @millerlp are still planning on doing this, but if so I'll be on email and could be reached more directly through either Google Hangouts (clark.richards@gmail.com) or Skype (forget my username just now).

jebyrnes commented 8 years ago

A few things have come up that delaying would not be the worst thing. Perhaps Monday in two weeks?

millerlp commented 8 years ago

Two weeks should fine for me. In the mean time, I spent some time this evening trying to tackle translating the pressure attenuation correciton functions in pr_corr.m over to R. I didn't quite finish, and it's totally untested, but I think the bones of the function are there. I dumped that code in a repository along with a sample data file from our sensor and a script to help open and format the sample data file. https://github.com/millerlp/testing_OWHL

dankelley commented 8 years ago

@millerlp -- I had a quick look at the code and data. Possibly an extra pair of eyes can help, so maybe I could give my impressions?

  1. It can help to use <- for assignment. R will take the = notation, but it's not recommended. An advantage of using <- is that it looks different, so your mind sort of clicks into R mode. Since this project is in somewhat early days, now would be the natural time to make a decision on this.
  2. Is there any chance that the data header could indicate the timezone? That way, the file would be more self-contained. Another common scheme is to use UTC for all work, of course.
  3. One more thing on the header -- it can help a lot if there is some indication of its size. This can be done with a line containing END_OF_HEADER just before the column names, or something. The point of this is flexibility of any code used to read the data. I am assuming here that a convention of just one line before the line with column names will start to feel limiting eventually. Headers tend to be really quite subject to feature creep. For datasets to be self-contained, the headers tend to accumulate things otherwise put into README files, like lat and lon, investigator name, initial atmospheric pressure (probably ending pressure, too), calibration data, etc. And for humans/code to read such things, it is common to have one line per item. And now we are in a state where parsing the file can be a bit tricky. This is why it's very common to have a special line marking the end of data. Another scheme would be for the header to be written in JSON.
  4. I'd be a little inclined not to use Sys.setenv to set the timezone, but rather to set that by the tz argument in as.POSIXct() or isoDateTime(). By the way, to avoid problems later on, I'd advise using POSIXct style instead of POSIXlt style, because then length() gives the expected results. There are some good things about POSIXlt style but you can always switch into that for a few lines of code if you need to.
  5. That's pretty slick, how you call out to a URL to get the pressure.

I don't know whether the above is of any use. I made the suggestions because I have some experience both in writing R and in translating to R from other languages. That means I've discovered some ways to save time and improve robustness.

It is very exciting to see this code coming together. Oh, and the data look so neat! Below is a code I wrote to do a simple plot, and the plot it made.

## A quick look at the pressure data (Dan Kelley).
data <- read.csv("15032600_halfday.CSV",skip=1,header=TRUE)

# Prefer UTC, e.g. for looking up tides, etc
timeOffset <- 7 * 3600                 # add to Pacific/Daylight to get UTC
timeNumeric <- data$POSIXt+data$frac.seconds/100 + timeOffset
time <- as.POSIXct(timeNumeric, origin="1970-01-01 00:00:00", tz="UTC")

# Prefer pressure in dbar (oceanographic convention)
pressure <- data$Pressure.mbar / 10
temperature <- data$TempC
library(oce)

# I picked the cutoff by using locator() on a full graph of 
# pressure vs time. There are fancy ways to do such things, of 
# course. The oceanographer in me was also interested in the
# descent phase, but I'm skipping that here.
look <- timeNumeric > 1427396528

# Next is a trick to plot on screen interactively, otherwise to PNG
if (!interactive()) png("01dk.png", pointsize=14)
par(mfrow=c(3,1))

# Timeseries
oce.plot.ts(time[look], pressure[look])
oce.plot.ts(time[look], temperature[look])

# Oh, can't resist a spectrum. The `spans` is just made up. Note that
# pressure is in dbar; that will explain differences in spectral level
# for those using mbar.
s <- spectrum(ts(pressure, deltat=0.25), spans=c(51,31,11), log="dB",
              xlab="Frequency [Hz]", ylab="Pressure spectrum [dB]", main="")
# Looks like wind waves at 1.5s and a sweet 10s swell, plus tide;
# the tide could be removed with filtering, since a tidal analysis
# on such a short record is problematic.
abline(v=1/1.5, col='gray')
mtext("wind waves?",  at=1/1.5, side=3, line=0, cex=3/4)
abline(v=1/10, col='gray')
mtext("swell?",  at=1/10, side=3, line=0, cex=3/4)
if (!interactive()) dev.off()

01dk

millerlp commented 8 years ago

@dankelley Appreciate the comments.

  1. I'm lazy as hell, so I default to using = in all my code, while acknowledging the warnings about <- edge cases. Certainly as this goes further I have no problem switching to <- as the standard. 2+3. Regarding the header information, I take your point about wanting to add room for more and flexible fields. I have to re-examine the code for the datalogger to make sure there's sufficient room in flash/SRAM to deal with more text. This gets into the minutiae of the OWHL hardware limitations that are probably beyond the scope of the current discussion, but I will take a look at whether we can add header fields for: timezone, lat, long, and a END_OF_HEADER marker. The goal so far has been to keep the data files easily parsed as .csv files. I'm not sure there's any compelling reason to deviate from that goal, but adding header lines is fairly straightforward. UTC will hopefully be the standard going forward, but that's relying on the user not screwing up the setup of their instrument, which is a huge pitfall with the do-it-yourself, program-it-yourself approach of the OWHL.
  2. My dates are generally always dealt with in POSIXct, so no worries there. Using Sys.setenv() is just my default fallback because I always run into time zone issues, but we can switch to relying on the 'tz' argument.
dankelley commented 8 years ago

I have written a test code on the issues site at https://github.com/dankelley/oce-issues/blob/master/784/784a.R and am pasting it below. You will probably need to update oce to run this, because it uses some new features (like units, original data names, etc.). It makes the graph attached.

A few notes:

  1. I'm inferring time from the two time-related numerical columns.
  2. I change pressure from mbar to dbar, because the latter is more common in oceanography (and is the norm in oce).
  3. I have not put this code into oce because I am not clear that the format of the sample file is finalized.

Any comments? If the data format is fixed, I could put this into oce. Otherwise, I'm inclined to suggest that we close the issue, unless some comments appear here within a week, suggesting that further action would be useful.

Again, thanks for the patience. I didn't mean to ignore this for 8 months. The problem was partly that I was not clear on whether the ball was "in my court".

784a

dankelley commented 8 years ago

An update. I altered the test code and it now makes a 4panel plot. Top left: raw data. Top right: with the descent phase trimmed and a red line used for prewhitening (see code). Lower left: spectrum (of prewhitened data) in conventional form. Lower right: variance-conserving spectrum with log freq axis and (a crude) period axis. I will now stop commenting and wait to see if anyone else weighs in over the next week. 784a

richardsc commented 8 years ago

Yes, this issue got a little stale. As I recall (and after skimming), I think we all decided that it probably wasn't really going to be an oce issue anyway, but rather motivation for starting a new package designed to read and process data from the OWHL (open wave height logger) project.

I don't think we should worry about including any specific code into oce at this point, so probably best to close it. Development of the package and test code was moved to

https://github.com/millerlp/testing_OWHL

some time ago.

dankelley commented 8 years ago

Thanks, Clark. PS. It was fun writing the (crude) code for the energy-conserving log(f) spectrum (bottom-right ... the label is wrong though). Nice 12s waves and also some 17s oldies but goodies.

millerlp commented 8 years ago

Agreed that it's more sensible to make this separate from oce for the time being, since I'll likely be modifying the raw OWHL data spec based on the suggestions here. Eventually I'll try to get a workflow or package together for OWHL that will output data in a format that can be easily used with oce.

dankelley commented 8 years ago

If you need extra eyes on any proposed header format, please feel free to reach out for advice.