control of how data is reduced on das2server and in clients

jbfaden commented 4 years ago

We've always intended to put in controls for how das2streams should be reduced, selecting between linear and log averages, for example. As data is produced by the server, it will be piped through data reduction filters which have generally done linear averages, which are then sent over to the das2 client and displayed in log-Z space. This results in odd artifacts and noisy results when there are just a few measurements in each bin.

The server can at least identify a particular reducer to select the correct method. For example, we could have a reducer which does geometric averages (averaging in log) on the server side, but there is no way for the das2 client (often Autoplot) to know that if the data is to be further reduced for display, that geometric averages should be used.

Further, there are other spaces which might be identified, for example "mod360" which means the average of 259 and 1 is 0. Also one might wish to indicate that the data can never be averaged, and nearest neighbor selection should be done or an error thrown.

CDAWeb has metadata for averaging, https://spdf.gsfc.nasa.gov/istp_guide/vattributes.html#AVG_TYPE which should be reviewed as well.

jbfaden commented 4 years ago

See http://www-pw.physics.uiowa.edu/~jbf/das2/bugs/0002/product.15.mp4

cpiker commented 4 years ago

Nice video.
Though I don't think the metadata listed above would have helped much with the demonstrated effect, I can't think of any problems with associated with adding these. I presume we'd stick with the axis associated properties mentality so that these would look something like:

xAvgType="standard"
yAvgType="log"

etc.

jbfaden commented 4 years ago

Yes, the bug shown in the video was not caused by the averaging type, I was able to determine that yesterday. (It's because the interpolation is done first in Y and then in X, and there's a big gap in Y which is favored over a small gap in X.)

jbfaden commented 4 years ago

I hadn't thought about identifying the average type for each dimension. I think that's because we typically have a linear axis for X (time) and we typically don't reduce in Y (though Baptiste has got us thinking about this). But yes I think to match the other conventions of the stream, it would be for example, zAvgType='log' for flux or yAvgType='mod24' for localTime.

cpiker commented 4 years ago

Other string properties have separators between different pieces of information. Would you consider yAvgType="modulo | 24" to blend in? It's weird looking, but it does separate the fact that this is a modulo space from the size of the space. Also, pipes have been our goto field separators in many other string properties.

jbfaden commented 4 years ago

I don't think there are that many different average types, and I'm not sure I'd want to take on that complexity. (For example it's time24 for the transfer type, not time|24...)

cpiker commented 4 years ago

CDF has three different values that are just modulo (angle_degrees, angle_radians, angle_hour) keeping the size of the space separate makes the code more future proof.

time24 and asciiX are great examples of this problem. Many times in code I'm just shuffling input to output. Instead of having a clean simple way to figure out the width of the value type (say valsize="4", I have to know asciiX and timeX are size X, floats are 4, doubles are 8 and tt2000 is 8, etc. Any time a new type is added, the it triggers unnecessary changes and makes the code bigger than is has to be.

One of the fundamental rules of information system design is to keep atomic pieces of information separate, it's simpler over the long life cycle of the code.

jbfaden commented 4 years ago

I'd argue that using an enumeration is good, because you don't have to imagine (and properly code for) all possible cases. The alternative is to have angle_degrees, angle_radians, angle_hour which are fine names too, but you have to know these terms. mod360 mod2pi and mod24 are easily understood and consistent with other parts of the system.

cpiker commented 4 years ago

So are you saying that if a stream writer needs to put mod12 in their dataset they need to contact the Autoplot developer and ask for an update in order to properly render their stream?

jbfaden commented 4 years ago

Yes. The CDAWeb group has gotten by with the set they defined years ago. I think this is a bit of feature creep here. Maybe we should start with just "log" and "linear" and "mod2pi"...

jbfaden commented 4 years ago

Or maybe it should be "modOneTurn" and it uses the units to interpret the data.

cpiker commented 4 years ago

I know this won't get resolved today, and that's okay. To me it's perfectly simple to write an algorithm (modulo math) that takes a parameter (the wrap value). You have done this already anyway, so why not expose the functionality to the end user? It's trivial to implement and it might help them out, I know it would have been useful when working on Mars Express analysis.

Going philosophical for a moment, refactoring deployed interface definitions is very painful. Much, much worse then refactoring a single application's code. Improvement is a good thing, but must be done with the future in mind. There have been far too many short sighted decisions made with the das2 protocol but these are getting corrected and we shouldn't introduce any new ones. We shouldn't have to revise an ICD because someone has solar zenith angle data that wraps at 180 and the validator is flunking their stream.

Also, I think it would be good to avoid arguments from authority. It's important to justify implementations on their own merits. I'm sure CDAWeb doesn't think they've created the one perfect system. After all, their enumeration leaves out an important case for data I've personally worked with.

das-developers / das2docs

control of how data is reduced on das2server and in clients #2