dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0
142 stars 42 forks source link

should hann and hamming filters be added? #1262

Closed dankelley closed 7 years ago

dankelley commented 7 years ago

It seems that every time I want to smooth a time series, it's back to the beginning, reading documentation (including some in oce) that seems overly complicated. And then, if the task is a professional one, I'll need to explain how it was done^[1]. I'd prefer, for simple cases, to just say I used a N-point hamming filter, or maybe a hann filter.

And I don't want to have to choose to have NAs at the ends or to have terrible results if there is a trend (here, I'm talking about the two choices for circular in filter()). Heck, I also don't like having to choose whether to use the filter that R supplies, or the same name function that the signal packages provides.

I want to be able to say, simply "the time-series was smoothed with a 5-point Hamming filter" and be done with it.

I'm starting to think that, as a user, I'd like to have a function called hamming(), plain and simple, with two arguments, x and n. I think hamming is superior to hanning, but the latter could be added also. Honestly, I half-forgot that oce has this thing called oceFilter, and I'm not sure what it does that's special, without reading the docs (maybe it handles NA better?)

The other thing is for end points. I think a sensible scheme would be to use whatever filter coefficients are left for both sides. (That is, just start reducing filter length from the sides, dividing through to get the same sum of course.) So, the first and last points of xout would be the same as x, and nearby points would be smoothed a little, and then a little more, until the interior, where the full filter would be used.

I'm thinking out loud here. This would likely get done in C because this filter-trimming is not something I've heard of in the existing R functions. That means that creating it would take more than 10 minutes. It would be a Saturday morning thing.

Does this seem like something useful, to the other developers and users?

[1]: and that e.g. the matlab Butterworth filters don't work the same way as R ones, owing to treating starting conditions differently ... matlab is better when there are trends in the data

dankelley commented 7 years ago

Since I wanted this in my own work, I've added but it is not in the namespace and it is not documented. Heck, you have to look in src/filtering.c to see how to use the thing! Therefore, I feel justified in pushing this to the develop branch. At the moment, it is just copying entries at the ends of the timeseries ... and, actually, I think that makes the most sense because it's easy to describe and therefore won't get users wound up into knots about the endpoints. Below is a sample code, cut/pasted from inside src/filtering.c

system("R CMD SHLIB filtering.c")
dyn.load("filtering.so");

#' Perform hamming filtering
#'
#' NOTE that this is not in the namespace. Also, the behaviour is not in a final
#' form, since this just copies over at the ends of x. (I'm not even sure
#' that is bad, come to think of it.)
#'
#' @param x a vector to be smoothed
#' @param n length of filter (must be an odd integer exceeding 1)
hamming <- function(x, n)
    .Call("hamming", x, n)

x <- seq(-6, 6) + rnorm(13, sd=0.5)
y <- hamming(x, 5)
print(data.frame(x, y))
## next work in my testing, so I'm doing the formula right. But note
## that we need to normalize the filters ... why the heck are they
## not normalized in signal::hamming() or on wikipedia?
## signal::hamming(3) ## [1] 0.08 1.00 0.08
## signal::hamming(5) ## [1] 0.08 0.54 1.00 0.54 0.08

par(mar=c(3, 3, 1, 1), mgp=c(2, 0.7, 0))
plot(seq_along(x), x, type='b')
points(seq_along(x), y, col=2)
lines(seq_along(x), y, col=2)
grid()
dankelley commented 7 years ago

I decided (a) not to do it in C because using the existing R stats::filter code is probably 20+ years old and so has had a lot of testing and (b) doing it in R means that users can add their own smoothing kernels. I am calling this lowpass because that name is not used yet, and I think it's the verb that people use in papers.

I really doubt that I will ever use the existing oce.filter, but it's been in oce for a long time its documentation states its limitations, so I think it should stay.

Closing this ... of course, other developers could reopen it, or comment here.

dankelley commented 7 years ago

Why I wanted this -- it's so easy to say "21-point Hamming filter was applied"

screen shot 2017-06-20 at 2 42 45 pm
richardsc commented 7 years ago

Very nice. I think this is a good addition. I have a function in my personal "misc" package (crMisc on GH) called movingAverage(). It really is just a convenience function for when I want to do some simple smoothing (usually for plotting purposes). I like this better.

richardsc commented 7 years ago

Related to my above comment, I just went into the source for lowpass() and added a filter="boxcar" method. Maybe you can take a look?

dankelley commented 7 years ago

Your new boxcar code looks good, and I pushed to gh (develop branch). Thanks!!