CDAT / vcs-js


Average and std on an axis #41

Closed James-Crean closed 6 years ago

James-Crean commented 6 years ago

vCDAT has a feature request for applying an average or standard deviation (std) over an axis. See the issue in vCDAT here

Charles mentioned that loading a variable this way in UV-CDAT resulted in unwanted behavior: once the variable was loaded, the average or std could not be undone. Loading a variable with the latitude axis averaged meant that the user could not edit latitude again later. The variable had to be deleted and loaded again from scratch.

To avoid this issue, perhaps we can implement this as an argument to the plot call? This way the vCDAT UI could keep track of the dimensions and transforms, and vcs-js would apply any transforms if the option was present.

vcs.plot(dataspec, method, template) -> vcs.plot(dataspec, method, template, options)

This is mostly brainstorming, so thoughts and alternative methods of implementation are very welcome.

scottwittenburg commented 6 years ago

@James-Crean I guess I misunderstood this morning what the issue was here. I assumed we needed some new function on the JavaScript side which would take a statistical method (e.g. avg, std) and a variable name, invoke one of those methods on the Python side, and then return a number. But this sounds different, seeing as you're talking about the plot method.

Maybe if you can let me know precisely how you want to invoke this functionality from your end, that will help me understand what I need to do in vcs-js.

James-Crean commented 6 years ago

@doutriaux1 would be more knowledgeable about how exactly these functions should be applied.

My general understanding is that averaging the longitude would result in a plot that depicts time/latitude instead of latitude/longitude.

The issue I was referring to above is that uvcdat used to apply this transform when loading the variable. This meant that if the user wanted to go from an averaged axis to a regular one again, they had to load the variable all over again because there was no way to undo it. My hope was that we could possibly avoid that particular issue when we implement it in vcdat.
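To illustrate the two points above (the change in plotted dimensions and keeping the loaded variable intact), here is a minimal sketch. It uses plain NumPy as a stand-in for a cdms2 variable, ignores any axis weighting, and the helper name `reduce_axis` and the method names "avg" and "std" are assumptions made for illustration only, not part of vcs-js or CDAT.

```python
import numpy as np

# Hypothetical stand-in for a loaded variable with axes (time, latitude, longitude).
axes = ("time", "latitude", "longitude")
loaded = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)


def reduce_axis(data, axes, name, method):
    """Return (new_data, new_axes) with the named axis reduced.

    The input array is not modified, so the original variable stays usable.
    """
    index = axes.index(name)
    # np.mean / np.std are unweighted stand-ins for the CDAT calls.
    func = {"avg": np.mean, "std": np.std}[method]
    remaining = tuple(a for a in axes if a != name)
    return func(data, axis=index), remaining


averaged, averaged_axes = reduce_axis(loaded, axes, "longitude", "avg")
print(averaged.shape, averaged_axes)  # (2, 3) ('time', 'latitude')
print(loaded.shape)                   # (2, 3, 4), the loaded data is untouched
```

Because the reduction produces a new array, going back to the regular longitude axis only requires dropping the derived result, not reloading from file.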

doutriaux1 commented 6 years ago

@scottwittenburg @James-Crean Basically, here is what happens under the hood:

import cdutil
import genutil

# f is an open cdms2 file and x is a vcs canvas
var = f("myvariable", longitude=(start, end))
# For the average use:
var = cdutil.averager(var, axis="(longitude)")
# For std use this instead:
var = genutil.statistics.std(var, axis="(longitude)")
# If plotting then:
x.plot(var)

scottwittenburg commented 6 years ago

@doutriaux1 Just to be clear, are you saying this is what we should implement? If we do as you suggest in your snippet, will we face the issue that @James-Crean described, i.e. needing to delete and reload variables if they have had some statistic computed?

James-Crean commented 6 years ago

@doutriaux1 What are your thoughts on this? Should we make the users decide when loading the variable, or do we want to try and figure out a way to do the averaging without reloading the variable?

doutriaux1 commented 6 years ago

@James-Crean it used to be that the user had to decide when loading from file BUT we could select an existing variable and further subset/average it after it was loaded in memory. Not sure this is possible at the moment.

James-Crean commented 6 years ago

What happened to the axis and its slider after a user averaged it in UV-CDAT? Did the slider disappear?

In vcs-js Visualizer.py the plot code contains the following:

if ('operations' in varSpec):
    for op in varSpec['operations']:
        if ('subRegion' in op):
            kargs = op['subRegion']
            var = var.subRegion(**kargs)
        elif ('subSlice' in op):
            kargs = op['subSlice']
            # fill in None with begin and end of the current axis
            for axis in kargs.keys():
                values = kargs[axis]
                newValues = values
                axisIndex = var.getAxisIndex(axis)
                if values[0] is None:
                    newValues[0] = 0
                if values[1] is None:
                    newValues[1] = var.shape[axisIndex] - 1
                kargs[axis] = slice(*newValues)
            var = var.subSlice(**kargs)

We already have the mechanism. What if we add a new operation? The process would look like this:

If we decide to do it this way, we should test it on high-resolution files and 3D data to make sure that performance is not unacceptably impacted. On the vCDAT side, we should keep in mind that the order of these operations matters, and that future features that require operations may necessitate rethinking how the user interface works to accommodate that. Currently the only operations we support are "subRegion" and "subSlice". Are we able to assume that averaging an axis must, by definition, happen after these operations?
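As a rough sketch of the idea and of why ordering matters, the loop below applies a list of operations in order to a NumPy array standing in for the variable. The "transform" operation key, the "avg"/"std" method names, and the `apply_operations` helper are all hypothetical; they are not existing vcs-js or cdms2 API, and the NumPy reductions are unweighted.

```python
import numpy as np

REDUCERS = {"avg": np.mean, "std": np.std}  # hypothetical method names


def apply_operations(data, axes, operations):
    """Apply operations in list order and return a new (data, axes) pair."""
    axes = list(axes)
    for op in operations:
        if "subSlice" in op:
            index = [slice(None)] * data.ndim
            for name, (start, stop) in op["subSlice"].items():
                # axes.index raises ValueError if the axis was already reduced away.
                index[axes.index(name)] = slice(start, stop)
            data = data[tuple(index)]
        elif "transform" in op:  # hypothetical new operation type
            for name, method in op["transform"].items():
                position = axes.index(name)
                data = REDUCERS[method](data, axis=position)
                axes.pop(position)  # the reduced axis no longer exists
    return data, tuple(axes)


axes = ("time", "latitude", "longitude")
var = np.arange(24, dtype=float).reshape(2, 3, 4)

ops = [{"subSlice": {"longitude": [1, 3]}}, {"transform": {"longitude": "avg"}}]
result, result_axes = apply_operations(var, axes, ops)
print(result.shape, result_axes)  # (2, 3) ('time', 'latitude')

# Reversing the order fails: the longitude axis is gone before it can be sliced.
try:
    apply_operations(var, axes, list(reversed(ops)))
except ValueError as err:
    print("order matters:", err)
```

In this sketch the UI would only need to store the operation list. Removing the "transform" entry and replotting restores the regular axis without reloading the variable.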