add renamer()

dankelley commented 3 weeks ago

This relates to several other issues in which I've been thinking out loud about how to address variable renaming (see e.g. #2238, #2239). It's confusing to have so much discussion spread across so many issues, particularly because quite a lot of time has passed since the ideas first came up. Therefore I am starting this new issue, and will be closing the others. (The links in the first sentence will still work, of course.)

In a new branch called "rename", I am adding a new function called rename(). I do not intend to merge this into the "develop" branch until it "feels right" to me. The general plan is as follows. Note that GH top-level views of issues only show checklist completion if the list is in the first comment, so I will be revisiting the present issue over time, adding and subtracting content.

I will click items as I do them, and only push to GH when I have checked results.

Note that I am adding to the test suite as I do this work, to prevent problems of breaking thing N when working on thing N+1.

[x] rename() needs to rename the variables stored in the @data slot
[x] rename() needs to rename @metadata$dataNamesOriginal
[x] rename() needs to rename @metadata$units (and insert units, if not in the original object but if specified in a dictionary)
[x] rename() needs to rename @metadata$flags
[x] rename() ought to accept built-in dictionary files
[x] rename() ought to accept user-supplied dictionary files
[x] rename() ought to accept user-supplied dictionaries in the form of lists
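
To illustrate the bookkeeping in the checklist above, here is a minimal base-R sketch. It is not the code in the "rename" branch: the function `renameSketch()`, the plain-list stand-in for an oce object, the names `PSAL` and `TEMP`, and the list layout of the dictionary are all assumptions made for illustration, and units are simplified to plain strings.

```r
# Hypothetical stand-in for an oce object (plain lists, not S4 slots).
obj <- list(
    data = list(PSAL = c(30.1, 30.2), TEMP = c(10.0, 9.9)),
    metadata = list(
        units = list(TEMP = "degree*C"),
        flags = list(PSAL = c(1, 4)),
        dataNamesOriginal = list()
    )
)

# Hypothetical list-form dictionary: original name -> new name and unit.
dictionary <- list(
    PSAL = list(name = "salinity", unit = ""),
    TEMP = list(name = "temperature", unit = "degree*C")
)

renameSketch <- function(x, dictionary) {
    for (oldName in intersect(names(x$data), names(dictionary))) {
        newName <- dictionary[[oldName]]$name
        # 1. Rename the data item itself.
        names(x$data)[names(x$data) == oldName] <- newName
        # 2. Record the original name under the new name.
        x$metadata$dataNamesOriginal[[newName]] <- oldName
        # 3. Rename an existing unit, or insert the dictionary's unit.
        if (oldName %in% names(x$metadata$units)) {
            names(x$metadata$units)[names(x$metadata$units) == oldName] <- newName
        } else {
            x$metadata$units[[newName]] <- dictionary[[oldName]]$unit
        }
        # 4. Rename the flag entry, if there is one.
        if (oldName %in% names(x$metadata$flags)) {
            names(x$metadata$flags)[names(x$metadata$flags) == oldName] <- newName
        }
    }
    x
}

renamed <- renameSketch(obj, dictionary)
print(names(renamed$data))
```

The point of the sketch is only that the data, the units, the flags and the record of original names have to be renamed together. Each of those is one checklist item.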

dankelley commented 3 weeks ago

Commit 6945773ad232e1042e794abb15f81b71c3dcbb3a of the "rename" branch seems to work reasonably well, on built-in tests and on the example. For the latter, which includes a CIOOS case, see the help (?rename). (This is not in the online docs because they are attached only to the develop branch.)

dankelley commented 3 weeks ago

Below is the "example(rename)" output. (I am putting this here because the online docs only get rebuilt when I push to either "main" or "develop", not to "rename", which is the branch in which I'm doing this work.)

rename.md

dankelley commented 2 weeks ago

Today, I'm translating the approx 300-line conditional block in R/ctd.sbe.R into the new dictionary form (where it will become approx 100 lines). I am testing this in the oce-development repo. I want to be sure that the new scheme works, before I consider switching that block out for the dictionary method. (I certainly do not want to maintain both schemes.)

So far, with maybe a dozen lookups, things seem ok. But there are a LOT of regular expressions to get right.
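
The contrast between the conditional form and the dictionary form can be sketched as follows. This is a toy illustration, not the code in R/ctd.sbe.R: the functions `lookupConditional()` and `lookupDictionary()`, the three patterns, the unit strings and the column names are all assumptions.

```r
# Hypothetical conditional form: one branch per name pattern.
lookupConditional <- function(name) {
    if (grepl("^prDM$", name)) {
        list(name = "pressure", unit = "dbar")
    } else if (grepl("^sal[0-9]{2}$", name)) {
        list(name = "salinity", unit = "")
    } else if (grepl("^t[0-9]90C$", name)) {
        list(name = "temperature", unit = "degree*C")
    } else {
        list(name = name, unit = "")
    }
}

# Hypothetical dictionary form: the same rules stored as data,
# one row per pattern (the column names are assumptions).
dictionary <- read.csv(
    text = c(
        "pattern,name,unit",
        "^prDM$,pressure,dbar",
        "^sal[0-9]{2}$,salinity,",
        "^t[0-9]90C$,temperature,degree*C"
    ),
    stringsAsFactors = FALSE
)

# Return the first dictionary row whose pattern matches the name,
# or the name unchanged if no pattern matches.
lookupDictionary <- function(name, dictionary) {
    for (i in seq_len(nrow(dictionary))) {
        if (grepl(dictionary$pattern[i], name)) {
            return(list(name = dictionary$name[i], unit = dictionary$unit[i]))
        }
    }
    list(name = name, unit = "")
}

for (n in c("prDM", "sal00", "t090C", "unknown")) {
    print(identical(lookupConditional(n), lookupDictionary(n, dictionary)))
}
```

In this form, adding a rule means adding a row rather than a branch, which is where the reduction in line count comes from. The regular expressions still have to be right, since a comma inside a pattern would break this simple CSV layout.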

And there is one nasty thing that will block me for a while: I have to get a regexp for that accented-e problem in some SBE files (including one in the test suite). In the present code, this is handled by not trying to match that character, but instead going past the name to the description of the name. But that's just because I got frustrated, trying to figure out a regexp. Surely there is a way. I am fiddling with this in isolation -- nowhere near the oce code.

dankelley commented 2 weeks ago

For more on the accented e problem, see https://github.com/dankelley/oce/issues/1977 (which is pretty long and I still don't really understand all the nuances).

dankelley commented 2 weeks ago

I'm going to go with what you see below. I don't think this will match any other sigma-ish things, and it avoids the problems of #1977, namely:

we cannot have non-ASCII characters in code
we are told not to use useBytes=TRUE
we need this to work on Windows machines, with a wide range of encodings (and I think Canadians don't use the European encodings on the R test machines)
the underlying R seems to have changed over time on these matters

And, with this, we can use the dictionary-style to rename things, without having to look in more detail at other things that appear in a * name = line in a CNV file.

PS. None of the test code is pushed to GH. I likely won't push until Sunday afternoon, and then only to the "rename" branch. I do want to get this working so I can forget about it during a busy time for classes, but I have no intention of changing read.ctd.sbe() until I am really sure I like this. And, even when I do, there will be a new argument, and the old code will be used unless you set that argument. I won't get into details here.

$ grep sigmaTheta ~/git/oce/inst/extdata/dictionary_sbe.csv
sigma-[^0:9]00,sigmaTheta,kg/m^3,
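
As a quick check of what that pattern accepts, the following base-R sketch tries it on a few made-up names. Only the pattern is taken from the dictionary line above. The candidate names are illustrative assumptions, and the accented e is written as a Unicode escape so that the code itself stays ASCII.

```r
# Pattern copied from the dictionary line above.
pattern <- "sigma-[^0:9]00"

# Made-up candidate names (assumptions, for illustration only).
candidates <- c(
    accented = "sigma-\u00e900", # accented e, then "00"
    letter = "sigma-t00",
    zero = "sigma-000",
    one = "sigma-100"
)

# grepl() drops names, so restore them for readable output.
matches <- grepl(pattern, candidates)
names(matches) <- names(candidates)
print(matches)
```

As written, the bracket expression `[^0:9]` excludes exactly three characters (`0`, `:` and `9`), so any other single character between `sigma-` and `00` is accepted, including ASCII letters and the digits 1 to 8. A range of digits would instead be spelled `[^0-9]`. Whether the difference matters depends on which names actually occur in CNV files and on the order in which dictionary entries are tried, neither of which is settled in this thread.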