dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0
143 stars 42 forks source link

read.odf() missing 5 unit types #1023

Closed dankelley closed 8 years ago

dankelley commented 8 years ago
Warning messages:
1: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  unable to interpret ODFunits[1]='degrees C'

2: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  unable to interpret ODFunits[2]='decibars'

3: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  unable to interpret ODFunits[3]='mmho/cm'

4: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  unable to interpret ODFunits[4]='degrees'

5: In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  unable to interpret ODFunits[5]='m/s'
dankelley commented 8 years ago

I'm shoving this in here too ... the filenames are wrong also in this file. (I know this should be a separate issue, but for this particular file everything seems broken so I'll fix it all at the same time.)

> names(o@data)
[1] "  CODE=temperature," "  CODE=pressure,"    "  CODE=HCDM,"        "  CODE=HCDT,"        "  CODE=HCSP,"        "  CODE=time,"       
dankelley commented 8 years ago

The units are fixed in develop commit 6e587b37541d8d8f472842dd4ae90e256b30721e

Warning. one of the units is crazy. It's for an angle in a current meter, and the unit is given as mmho/cm, so clearly there can be errors in ODF files.

dankelley commented 8 years ago

I have the names working now, in develop commit b3a0025a4e6310692e87619f8e28e4a89d4d6b13 but I have a question for @richardsc ...

Clark, do you think I should fix up units that are clearly erroneous in ODF files? I am doing that right now, in the above-named commit. It spits out the warning message

Warning message:
In ODFNames2oceNames(ODFnames, ODFunits, PARAMETER_HEADER = NULL,  :
  odd unit (mmho/cm) for data column named 'directionMagnetic'; setting to degrees

I feel this is the right thing to do, but it does violate an oce convention of obeying what's in the data files. My argument for fixing up the unit is that there must clearly be a user-entry aspect that can get messed up in ODF files, and there seems to be nothing to be gained by retaining something that is very clearly an error. (The code, near line 379 of R/odf.R, is very isolated to this one error, so I cannot imagine any bleed-through.)

Oh, one more question for @richardsc -- do you like the names I'm using? This is a private file for an Aanderaa CM, and I am using names as follows. My question is whether maybe I should call the true direction as just direction. I could call the speed as speed but this is for a 2D instrument and I would prefer to use speed for a full (3D) scalar speed.

     temperature [°C, ITS-90] 3.522               3.6937              3.802               8831 TEMP_01     
     pressure [dbar]          1163.8              1163.8              1163.8              8831 PRES_01     
     directionMagnetic [°]    0                   73.392              359.69              8831 HCDM_01     
     directionTrue [°]        0.172               56.938              359.47              8831 HCDT_01     
     speedHorizontal [m/s]    0.006               0.19572             0.443               8831 HCSP_01     
     time                     2013-07-03 22:00:00 2014-01-03 20:00:00 2014-07-06 20:00:00 8831 SYTM_01     

PS to Clark -- I can email you the file if you want to look. But it's definitely private for a while.

richardsc commented 8 years ago

@dankelley wrote: Clark, do you think I should fix up units that are clearly erroneous in ODF files?

It's funny, because when I saw your comment about the units being whacky I almost wrote something that said: "Well, our job is to read the data file, not fix other people's errors".

While in general I think that should be true, obviously it's our package so we can do what we want. One question would be if this is a unit error that could occur frequently. In other words, is it better to add a few lines to oce for something that may never occur again, or to add a few lines to the processing script to deal with the issue after oce reads the file.

Probably the point is a bit moot now, because I suspect you've already made the change, and I think you should just leave it. In general, I think I would argue for the latter. Erroneous files are a pain, but I don't think we want to start trying to correct errors that occur during processing. For example, what if the error was not the unit, but the channel name? Then we would correct the unit to match the name, but that would put us in a "two wrongs make the data really wrong" situation.

As for the names, I think those are fine.

I also don't think that we've ever had a proper convention for specifying "magnetic" or "true" headings/directions. I don't mind the *Magnetic or *True convention, but another proposal could be that it gets included in the scale field of the units, e.g.:

d@metadata
  .. ..$ units       :List of 7
  .. .. ..$ heading     :List of 2
  .. .. .. ..$ unit :  expression(degree)
  .. .. .. ..$ scale: chr "magnetic"

Of course, then the issue is how to name the fields in the @data slot.

richardsc commented 8 years ago

Oh, wait ... I just realized that of course instrument heading and current direction are not really the same thing. Well, I guess they sort of are for an Aanderraa CM, but not for an ADCP or Aquadopp. I'll edit the previous comment to remove that part ...

dankelley commented 8 years ago

I am a bit scared to call both direction and let the unit scale indicate which is which, because the penalty for using the wrong direction is huge in calculating (u,v). (The problem is that the user would have to check the scale, which is tricky programmatically for novices ... we cannot rely on the True angle being first in the file, or anything.)

So I'd argue for directionMagnetic and directionTrue. Actually, I like having the True there because on the phone if a person says "direction" the other person always has to interrupt and ask which one.

I am now thinking it's best to just warn on that funky unit. It never gets used in calculations, anyhow.

PS. I will make sure that as.cm() works ok on these data ... probably if that takes more then 3 lines of code I'll make an issue.

dankelley commented 8 years ago

Whoa, there is no as.cm(). I'll add an issue and fix tonight or tomorrow.

dankelley commented 8 years ago

Hm. heading is used in data(cm) and that data type is several years old, so we may be stuck with that. I don't think it makes sense for Interocean S4 data, though! I looked at the file used for data(cm) and it is an ascii file that uses the word Dir for this variable.

The word heading makes no sense for an S4 (in my opinion) since I really think the word suggests something (a ship, a fared device, etc) that 'heads' into the wind. And the S4 is a sphere, which doesn't do anything like that!

We don't want a lot of inconsistency. The word used for these ODF current-meter data is certainly "direction", and that would argue to save it as e.g. directionTrue in the object ... but then as.cm() would switch it to heading, which seems a bit confusing.

So it seems to be a choice between a bit of confusion, and a change to years-old code. I am leaning to changing that years-old code, because I cannot recall a single user (other than Clark and me) who has used Interocean S4 data, and both Clark and I can easily tweak any of our code, if we have to, for a simple name change. (Note: the plot method can handle things internally.)

What do you think, Clark ( @richardsc ) ? Alter the existing name in cm objects (my choice) or just live with what seems the unphysical variable name "heading"?

richardsc commented 8 years ago

I agree with you. Heading indicates an instrument direction and not a current direction, so the s4 code should be changed. Aanderaa current meters and s4 current meters should both be consistently stored in cm objects.

This also suggests that the s4 code should distinguish between magnetic and true (if possible).

dankelley commented 8 years ago

Good. I will recode for the new name directionTrue tomorrow sometime. I basically have as.cm() working but am not going to commit until a new day when my eyes are keen enough to see what I'm doing. As always, thanks!

dankelley commented 8 years ago

Actually, as I look at the code I've just written, I think it should be called direction. We don't say e.g. temperatureCalibrated or similar. I don't think we need the postfix adjective. I think the magnetic should get a special name (directionMagnetic), however, because if we have two direction variables it may cause confusion. So I'll code that tomorrow morning, unless an objection appears here tonight.

dankelley commented 8 years ago

I have this working now in develop commit 8d332972b7912762ecd5cffa02ef16b2b07fafd7

I decided to call the direction variable as direction, for reasons mentioned in the just-previous comment. This is new, so it could change if someone argues for that.

Closing.

dankelley commented 8 years ago

Postscript. This confusion came about because of inconsistencies in ODF files. One file had e.g.

  CODE= 'PSAL_01',

but another had e.g.

  CODE=TEMP_01,

and that difference (spaces and quote marks) was messing up the decoder. I do not have a lot of confidence that the updated code is going to work for a broad suite of files but I prefer to keep this closed and open new (finer-grained) issues as and if they come up.