Closed dankelley closed 6 years ago
My reading pawlowicz2002cthw.pdf (in the repo for issue #1351 at https://github.com/dankelley/oce-issues/tree/master/13xx/1351) tells me that the inference method used in T-TIDE is quite simple. It's exemplified by the fact that the code given below, which constitutes https://github.com/dankelley/oce-issues/blob/master/13xx/1351/1351a.R at the moment of making this comment) runs without messages. (I prefer to use expect_equal
to demonstrate what's going on, because that's less confusing than sentences.)
library(testthat)
ttide <- read.table("ttide.dat", header=TRUE, skip=8, stringsAsFactors=FALSE)
## t_demo uses
## infername=['P1';'K2'];
## inferfrom=['K1';'S2'];
## infamp=[.33093;.27215];
## infphase=[-7.07;-22.40]
## Check amplitudes (using the infamp field)
expect_equal(ttide[ttide$Tide=="P1", "Amp"], .33093*ttide[ttide$Tide=="K1", "Amp"], tol=1e-5) # amp P1 from K1
expect_equal(ttide[ttide$Tide=="K2", "Amp"], .27215*ttide[ttide$Tide=="S2", "Amp"], tol=1e-5) # amp P1 from K1
## Check phases (using the infphase field)
expect_equal(ttide[ttide$Tide=="P1", "Pha"], ttide[ttide$Tide=="K1", "Pha"] - (-7.07), tol=1e-5) # amp P1 from K1
expect_equal(ttide[ttide$Tide=="K2", "Pha"], ttide[ttide$Tide=="S2", "Pha"] - (-22.40), tol=1e-5) # amp P1 from K1
A good syntax seems reasonable to me, because it is clear and also because it is very reminiscent of the T-TIDE setup. I am proposing to specify the things in a list, which should be clearer to users and is more R-like.
tidem(sealevel,
infer=list(name=c("P1", "K2"),
from=c("K1", "S2"),
amp=c(0.33093, 0.27215),
phase=c(-7.07,-22.4))
I thought a bit about trying to use better names (e.g. amp
does not tell us that it is a multiplier as opposed to a value to be added) but nothing jumps out at me, and there is merit in making this very similar to T-TIDE.
I've coded the docs (not pushed yet) and the new part looks as below. This look ok to me, so I'll likely code the actual action sometime today.
I've made some progress on this, in the "dk" branch (at commit 850110d1c3fce9e23f028bc68d7465cdf3eef215 as I write this comment). I've settled on the syntax as in the just-previous comment, and the test suite at https://github.com/dankelley/oce-issues/tree/master/13xx/1351 has several tests.
This test directory is important, and not just for the tests. It also provides test data (that are likely to make their way into the official test suite) and screenshots showing where I got those data.
Before I say more, please note that I am so far only looking at amplitude; phase can come later. (I know that I define phase differently from Foreman, but the docs explain the definition so I think we are ok. Plus, of course, the fitting works great, which it could not do if the phase were wrong. Possibly I'll make a new issue on defining phase in the Foreman way, via a new argument, most likely.)
As for the tests in that directory, the most telling is 1351c.R
. This tests amplitudes between Foreman, T_TIDE, and tidem(). The output graph (pasted below) gives a lot of information, and deserves careful study. Here are some notes
T_TIDE matches best to what Foreman calls "A", not to his "AL". I don't exactly know what is the difference between the two, but until I do some more reading, I will assume that Rich knew what he was doing, in publishing a T_TIDE that matches to "A".
Three components seem to disagree between Foreman and T_TIDE: "K1", "S2", and "K2". The printout at the end of 1351c.R
gives as follows, and I have checked in the screenshots to verify that this was not a cut/paste error in my PDF viewer. I have no idea why these differences exist, but two of them are by 1 in the final digit, and the other is by 2 in that digit, so I assume it results from a difference in rounding or truncation for the representation. Therefore, I am going to say that, basically, Foreman's "A" is the same as T_TIDE's amplitude, and I'll consider T_TIDE as a reference (since possibly its computer had higher precision, or accuracy, the publication being in year 2002 versus 1977.)
> ## Three points differ between T_TIDE and Foreman by more than 0.1mm or more:
> for (check in c("K1", "S2", "K2")) {
+ print(ttide[ttide$name==check, c("name", "frequency", "amplitude")])
+ print(foreman[foreman$name==check, c("name", "frequency", "A")])
+ }
name frequency amplitude
10 K1 0.04178 0.1405
name frequency A
10 K1 0.04178075 0.1406
name frequency amplitude
19 S2 0.08333 0.2197
name frequency A
19 S2 0.08333334 0.2195
name frequency amplitude
20 K2 0.08356 0.0598
name frequency A
20 K2 0.08356149 0.0597
tidem
mostly agrees with T_TIDE very well (median diff of 0.03mm, compared with T_TIDE stated values of 0.1mm), except for three points: K1, P1, and S2, which are in error by 6mm, 2mm and 1mm. I can discount the last of these completely as a rounding error, so we are left with a mystery for K1 and P1. Note that, in this scheme, P1 is inferred from K1, so that may be a clue.1351c.R
now ("dk" commit a9a6b88687031a570d63172046673d5cad243b4c) also prints out a comparison of the three points of measurable disagreement (K1, P1 and S2) between tidem
and the reference values. It is worth nothing that these are implicated in the inference. Results are below. I think they are big enough for some further investigation, although admittedly worrying about 6mm is evidence more of a desire to get the algorithm "correct" than it is evidence of a concern for real-world applications. (Put another way, why even permit this inference scheme, if we are going to get it wrong?)
Hypothesis. Possibly Foreman and T_TIDE are fitting for the to-be-inferred constituent, and then replacing it. This is not what tidem
is doing at the moment; instead, tidem
creates the to-be-inferred constituent after the fit is done, and, in this case, the Rayleigh criterion has prevented those constituents from being in the original regression. I can see how this could cause (no pun) a ripple effect. I'll look in the T_TIDE code, possibly sometime before the new year.
Three points differ between T_TIDE (first in pairs below) and tidem (second).
> df <- data.frame(name=m[["name"]], frequency=m[["frequency"]], amplitude=m[["amplitude"]])
> for (check in c("K1", "P1", "S2")) {
+ print(ttide[ttide$name==check, c("name", "frequency", "amplitude")])
+ print(df[which(m[["name"]]==check),])
+ }
name frequency amplitude
10 K1 0.04178 0.1405
name frequency amplitude
10 K1 0.04178075 0.1347417
name frequency amplitude
9 P1 0.04155 0.0465
name frequency amplitude
9 P1 0.04155259 0.04459005
name frequency amplitude
19 S2 0.08333 0.2197
name frequency amplitude
19 S2 0.08333333 0.2202366
I'm marking @richardsc in case he wants to comment on my matlab-fu.
I see. In t_tide_v1/t_tide.m
(now in the https://github.com/dankelley/oce-issues/tree/master/13xx/1351 repo):
jref
by calling constituents()
.constituents()
PS (on my matlab-fu): Good gawd in heav'n, what kind of a language ends a function by either the end-of-file or the start of another function? I can see how that came about, historically, when each function was a file, but it is ugly. And, as far as I can tell, it must mean that a function cannot define another (local) function within itself, which points to a serious weakness of Matlab or (admittedly) my understanding of it.
Hm. I tried including the infer$name
constituents in the fit, and the results got quite a lot worse: the mean abs diff jumped from 0.25mm to 3.16mm. (Recall that the amplitudes in the Foreman and T_TIDE files are given to 0.1mm, so 3.16mm is 30X larger than I'd expect if my calculation were the same.)
(This test is in tidem.R
but now embraced in a false block controlled by a variable named TESTinfer1
, but I will not likely keep this after I figure things out.)
Also, item 3 of my previous comment said something about line 827 of the t_tide.m
file that I now think is not true. Clearly I cannot look at matlab code blocks in isolation anymore! (I can't believe that I used to program in that thing.)
So I'm still at a loss as to why the inference is not going well. My next step will be to read Foreman's 1977 text on this, since T_TIDE is just a matlab rewriting of it.
The relevant formulae are developed in some (multi-page) detail in Sec 2.3.4 of Foreman (1977 revised Oct 2004; a copy of the PDF named heights2004.pdf being inserted in the oce-issues repo at https://github.com/dankelley/oce-issues/blob/master/13xx/1351/heights2004.pdf
See also http://www.dfo-mpo.gc.ca/science/data-donnees/tidal-marees/index-eng.html but note that this site, despite being only a year old, has broken links so the software cannot be downloaded (I have a copy from 2012, though, in my semi-private stash).
This is working now in the "dk" branch, commit 98fa0f7191e242c5b2aa399271135ceb6b233a25 but I will leave this issue open until I merge that branch into "develop", likely within 3 days.
Closing now, since this works in the "dk" branch, and that branch has been merged into "develop", with commit note as follows.
commit 7f39716f1a0f3c45d1569402e1fd7ad61e0182e4 Merge: e249296 98fa0f7 Author: dankelley kelley.dan@gmail.com Date: Tue Jan 2 08:20:21 2018 -0400
Merge branch 'dk' into develop
As a test, I ran the following with oce before and after this merge, and the
results were identical (except for the timestamp in the summary).
library(oce)
data("sealevel")
m <- tidem(sealevel, constituent=c("standard", "M10"))
summary(m)
d <- as.data.frame(m@data[c("const", "name", "freq", "amplitude", "phase", "p")])
print(d)
This came up as I was gathering test data for #1351. What
tidem
does at the moment for added constituents is simply to force a fit to the associated frequency. However, Foreman 1977 (and, I think, t-tide) do something else -- inferring a stated frequency from another stated frequency.I see that there are three elements to each addition: (1) the constituent name, (2) an amplitude and (3) a phase. That means that
tidem()
will need to get some new arguments, or perhaps some additional nuances to existing arguments. For example, in thet_demo.m
code we seewhich means to infer
P1
based onK1
, etc.Presently, in
tidem()
, we have a syntax likeand I suppose this could be extended as (option 1)
where the
NA
means that we are not askingtidem()
to do anything nonstandard with the "standard" constituents.Another possible syntax might be something along the lines of (option 2)
or some other syntax that more closely binds things together.
As I started typing in this comment, I was thinking I'd like something along the lines of option 2, but I think now that option 1 is clearer to the user and a lot easier to code (and detect errors in).
If other developers are looking at these things over the break, they might want to vote for option 1 or 2, but I'll have option 1 in my mind, as I now return to the task of finding out what Foreman was actually doing with this inference of one constituent from another.