dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0
142 stars 42 forks source link

tidem() should handle additional constituents as Foreman 1977 and t-tide do #1352

Closed dankelley closed 6 years ago

dankelley commented 6 years ago

This came up as I was gathering test data for #1351. What tidem does at the moment for added constituents is simply to force a fit to the associated frequency. However, Foreman 1977 (and, I think, t-tide) do something else -- inferring a stated frequency from another stated frequency.

I see that there are three elements to each addition: (1) the constituent name, (2) an amplitude and (3) a phase. That means that tidem() will need to get some new arguments, or perhaps some additional nuances to existing arguments. For example, in the t_demo.m code we see

infername=['P1';'K2'];
inferfrom=['K1';'S2'];
infamp=[.33093;.27215];
infphase=[-7.07;-22.40];

which means to infer P1 based on K1, etc.

Presently, in tidem(), we have a syntax like

tidem(..., constituents=c("standard", "P1", "K2"))

and I suppose this could be extended as (option 1)

tidem(..., constituents=c("standard", "P1", "K2"),
      from=c(NA, "K1", "S2"),
      amplitude=c(NA,0.33093,0.27215),
      phase=c(NA,-7.07,-22.40))

where the NA means that we are not asking tidem() to do anything nonstandard with the "standard" constituents.

Another possible syntax might be something along the lines of (option 2)

tidem(..., constituents=c("standard", "P1 from K1 amplitude 0.33093 phase -7.07",
                                      "K2 from K2 amplitude 0.27215 phase -22.40"))

or some other syntax that more closely binds things together.

As I started typing in this comment, I was thinking I'd like something along the lines of option 2, but I think now that option 1 is clearer to the user and a lot easier to code (and detect errors in).

If other developers are looking at these things over the break, they might want to vote for option 1 or 2, but I'll have option 1 in my mind, as I now return to the task of finding out what Foreman was actually doing with this inference of one constituent from another.

dankelley commented 6 years ago

My reading pawlowicz2002cthw.pdf (in the repo for issue #1351 at https://github.com/dankelley/oce-issues/tree/master/13xx/1351) tells me that the inference method used in T-TIDE is quite simple. It's exemplified by the fact that the code given below, which constitutes https://github.com/dankelley/oce-issues/blob/master/13xx/1351/1351a.R at the moment of making this comment) runs without messages. (I prefer to use expect_equal to demonstrate what's going on, because that's less confusing than sentences.)

library(testthat)

ttide <- read.table("ttide.dat", header=TRUE, skip=8, stringsAsFactors=FALSE)
## t_demo uses
## infername=['P1';'K2'];
## inferfrom=['K1';'S2'];
## infamp=[.33093;.27215];
## infphase=[-7.07;-22.40]

## Check amplitudes (using the infamp field)
expect_equal(ttide[ttide$Tide=="P1", "Amp"], .33093*ttide[ttide$Tide=="K1", "Amp"], tol=1e-5) # amp P1 from K1
expect_equal(ttide[ttide$Tide=="K2", "Amp"], .27215*ttide[ttide$Tide=="S2", "Amp"], tol=1e-5) # amp P1 from K1
## Check phases (using the infphase field)
expect_equal(ttide[ttide$Tide=="P1", "Pha"], ttide[ttide$Tide=="K1", "Pha"] - (-7.07), tol=1e-5) # amp P1 from K1
expect_equal(ttide[ttide$Tide=="K2", "Pha"], ttide[ttide$Tide=="S2", "Pha"] - (-22.40), tol=1e-5) # amp P1 from K1
dankelley commented 6 years ago

A good syntax seems reasonable to me, because it is clear and also because it is very reminiscent of the T-TIDE setup. I am proposing to specify the things in a list, which should be clearer to users and is more R-like.

tidem(sealevel,
      infer=list(name=c("P1", "K2"),
                 from=c("K1", "S2"),
                 amp=c(0.33093, 0.27215),
                 phase=c(-7.07,-22.4))

I thought a bit about trying to use better names (e.g. amp does not tell us that it is a multiplier as opposed to a value to be added) but nothing jumps out at me, and there is merit in making this very similar to T-TIDE.

dankelley commented 6 years ago

I've coded the docs (not pushed yet) and the new part looks as below. This look ok to me, so I'll likely code the actual action sometime today.

screen shot 2017-12-29 at 9 56 57 am
dankelley commented 6 years ago

I've made some progress on this, in the "dk" branch (at commit 850110d1c3fce9e23f028bc68d7465cdf3eef215 as I write this comment). I've settled on the syntax as in the just-previous comment, and the test suite at https://github.com/dankelley/oce-issues/tree/master/13xx/1351 has several tests.

This test directory is important, and not just for the tests. It also provides test data (that are likely to make their way into the official test suite) and screenshots showing where I got those data.

Before I say more, please note that I am so far only looking at amplitude; phase can come later. (I know that I define phase differently from Foreman, but the docs explain the definition so I think we are ok. Plus, of course, the fitting works great, which it could not do if the phase were wrong. Possibly I'll make a new issue on defining phase in the Foreman way, via a new argument, most likely.)

As for the tests in that directory, the most telling is 1351c.R. This tests amplitudes between Foreman, T_TIDE, and tidem(). The output graph (pasted below) gives a lot of information, and deserves careful study. Here are some notes

  1. T_TIDE matches best to what Foreman calls "A", not to his "AL". I don't exactly know what is the difference between the two, but until I do some more reading, I will assume that Rich knew what he was doing, in publishing a T_TIDE that matches to "A".

  2. Three components seem to disagree between Foreman and T_TIDE: "K1", "S2", and "K2". The printout at the end of 1351c.R gives as follows, and I have checked in the screenshots to verify that this was not a cut/paste error in my PDF viewer. I have no idea why these differences exist, but two of them are by 1 in the final digit, and the other is by 2 in that digit, so I assume it results from a difference in rounding or truncation for the representation. Therefore, I am going to say that, basically, Foreman's "A" is the same as T_TIDE's amplitude, and I'll consider T_TIDE as a reference (since possibly its computer had higher precision, or accuracy, the publication being in year 2002 versus 1977.)

   > ## Three points differ between T_TIDE and Foreman by more than 0.1mm or more:
   > for (check in c("K1", "S2", "K2")) {
   +     print(ttide[ttide$name==check, c("name", "frequency", "amplitude")])
   +     print(foreman[foreman$name==check, c("name", "frequency", "A")])
   + }
      name frequency amplitude
   10   K1   0.04178    0.1405
      name  frequency      A
   10   K1 0.04178075 0.1406
      name frequency amplitude
   19   S2   0.08333    0.2197
      name  frequency      A
   19   S2 0.08333334 0.2195
      name frequency amplitude
   20   K2   0.08356    0.0598
      name  frequency      A
   20   K2 0.08356149 0.0597
  1. tidem mostly agrees with T_TIDE very well (median diff of 0.03mm, compared with T_TIDE stated values of 0.1mm), except for three points: K1, P1, and S2, which are in error by 6mm, 2mm and 1mm. I can discount the last of these completely as a rounding error, so we are left with a mystery for K1 and P1. Note that, in this scheme, P1 is inferred from K1, so that may be a clue.

1351c

dankelley commented 6 years ago

1351c.R now ("dk" commit a9a6b88687031a570d63172046673d5cad243b4c) also prints out a comparison of the three points of measurable disagreement (K1, P1 and S2) between tidem and the reference values. It is worth nothing that these are implicated in the inference. Results are below. I think they are big enough for some further investigation, although admittedly worrying about 6mm is evidence more of a desire to get the algorithm "correct" than it is evidence of a concern for real-world applications. (Put another way, why even permit this inference scheme, if we are going to get it wrong?)

Hypothesis. Possibly Foreman and T_TIDE are fitting for the to-be-inferred constituent, and then replacing it. This is not what tidem is doing at the moment; instead, tidem creates the to-be-inferred constituent after the fit is done, and, in this case, the Rayleigh criterion has prevented those constituents from being in the original regression. I can see how this could cause (no pun) a ripple effect. I'll look in the T_TIDE code, possibly sometime before the new year.

Three points differ between T_TIDE (first in pairs below) and tidem (second).
> df <- data.frame(name=m[["name"]], frequency=m[["frequency"]], amplitude=m[["amplitude"]])
> for (check in c("K1", "P1", "S2")) {
+     print(ttide[ttide$name==check, c("name", "frequency", "amplitude")])
+     print(df[which(m[["name"]]==check),])
+ }
   name frequency amplitude
10   K1   0.04178    0.1405
   name  frequency amplitude
10   K1 0.04178075 0.1347417
  name frequency amplitude
9   P1   0.04155    0.0465
  name  frequency  amplitude
9   P1 0.04155259 0.04459005
   name frequency amplitude
19   S2   0.08333    0.2197
   name  frequency amplitude
19   S2 0.08333333 0.2202366
dankelley commented 6 years ago

I'm marking @richardsc in case he wants to comment on my matlab-fu.

I see. In t_tide_v1/t_tide.m (now in the https://github.com/dankelley/oce-issues/tree/master/13xx/1351 repo):

  1. line 307 gets to-be-inferred constituents indexed as jref by calling constituents().
  2. line 735 is the start of the definition of constituents()
  3. line 827 adds the to-be-inferred constituents regardless (as far as I can tell) whether the Rayleigh criterion is met.

PS (on my matlab-fu): Good gawd in heav'n, what kind of a language ends a function by either the end-of-file or the start of another function? I can see how that came about, historically, when each function was a file, but it is ugly. And, as far as I can tell, it must mean that a function cannot define another (local) function within itself, which points to a serious weakness of Matlab or (admittedly) my understanding of it.

dankelley commented 6 years ago

Hm. I tried including the infer$name constituents in the fit, and the results got quite a lot worse: the mean abs diff jumped from 0.25mm to 3.16mm. (Recall that the amplitudes in the Foreman and T_TIDE files are given to 0.1mm, so 3.16mm is 30X larger than I'd expect if my calculation were the same.)

(This test is in tidem.R but now embraced in a false block controlled by a variable named TESTinfer1, but I will not likely keep this after I figure things out.)

Also, item 3 of my previous comment said something about line 827 of the t_tide.m file that I now think is not true. Clearly I cannot look at matlab code blocks in isolation anymore! (I can't believe that I used to program in that thing.)

So I'm still at a loss as to why the inference is not going well. My next step will be to read Foreman's 1977 text on this, since T_TIDE is just a matlab rewriting of it.

dankelley commented 6 years ago

The relevant formulae are developed in some (multi-page) detail in Sec 2.3.4 of Foreman (1977 revised Oct 2004; a copy of the PDF named heights2004.pdf being inserted in the oce-issues repo at https://github.com/dankelley/oce-issues/blob/master/13xx/1351/heights2004.pdf

See also http://www.dfo-mpo.gc.ca/science/data-donnees/tidal-marees/index-eng.html but note that this site, despite being only a year old, has broken links so the software cannot be downloaded (I have a copy from 2012, though, in my semi-private stash).

dankelley commented 6 years ago

This is working now in the "dk" branch, commit 98fa0f7191e242c5b2aa399271135ceb6b233a25 but I will leave this issue open until I merge that branch into "develop", likely within 3 days.

dankelley commented 6 years ago

Closing now, since this works in the "dk" branch, and that branch has been merged into "develop", with commit note as follows.

commit 7f39716f1a0f3c45d1569402e1fd7ad61e0182e4 Merge: e249296 98fa0f7 Author: dankelley kelley.dan@gmail.com Date: Tue Jan 2 08:20:21 2018 -0400

Merge branch 'dk' into develop

As a test, I ran the following with oce before and after this merge, and the
results were identical (except for the timestamp in the summary).

    library(oce)
    data("sealevel")
    m <- tidem(sealevel, constituent=c("standard", "M10"))
    summary(m)
    d <- as.data.frame(m@data[c("const", "name", "freq", "amplitude", "phase", "p")])
    print(d)