Create an Operator class for better organizing logic for specific operators in compound models

embray commented 9 years ago

Bear with me while I explain some background to this issue (the background itself could be split off into an issue of its own, even):

Background

A few weeks ago we had a telecon discussion the model discretization API ideas in astropy/astropy-api#13 by @patti.

As part of that discussion we considered several possible approaches to implementing convolution of one model with another. Right now one of the most straightforward approaches to this is a kind of "ConvolutionModel" that uses astropy.convolution.convolve in its evaluate() method, and would have a specific kernel and other convolution options attached to it for performing the evaluation.

However, as @cdeil brought up, the existence of separate Kernel classes (i.e. separate from the Model they're based on) is due to the current lack of a discretization interface baked directly into the existing Model interface). One hope is that this will be resolved, perhaps as part of the GSoC project, obviating the need for a separate "Kernel" class, so that Model objects can be used directly with convolve().

If that is resolved, then it opens the door to another possible implementation of model convolution that I suggested, where the convolution itself would be treated as an operator between two models, resulting in a compound model, much in the same way we treat model composition.

For example, currently we implement model composition like:

source = Box2D(1, 0, 0, 1, 1)
sink = Gaussian2D(1, 0, 0, 1, 1)
model = source | sink

the resulting model object is a compound model that takes input to the Box2D and outputs the result through the Gaussian2D.

A convolve operator could look something like:

source = Box2D(1, 0, 0, 1, 1)
kernel = Gaussian2D(1, 0, 0, 1, 1)
model = convolve(source, kernel)

Here we write the convolve(a, b) operator with function syntax rather than infix operator for a few practical reasons:

1) Python only has so many infix operators we can use, and there isn't a good one that obviously means convolution ('*' is already taken for multiplication) 2) We do want to be able to pass additional arguments to convolve, such as sampling options.

However, this would still be treated as a binary operator just as addition or multiplication, in that the result would be a compound model that takes input through the source and convolves it with the kernel. This combined object can be further combined with other models in a transformation pipeline, just as we currently do with compound models.

Challenges

I personally really like the convolve operator idea, and in the telecon there seemed to be some support for it as well. This of course does require the discretization support on models. If a model can't be discretized then I guess we don't allow convolving with it.

The other challenge is more of a refactoring issue. This is why I marked this issue "Affects-dev". The Operator class I'm going to propose would only be for internal use for now (though if well designed it could enable further generalization of the compound model framework to enable custom operators as well).

The problem is that currently arithmetic operators (like +, *, etc.) as well as the special operators | and & are passed around the code and stored in the expression tree as just single character strings, like '+' for addition. This clearly won't do for something like "convolve". Nevermind that it requires more than a single character to specify--the "convolve" operator actually represents a whole class of operators with different possible options for boundary handling, sampling modes for the kernel, etc.

Proposal

We need an object that can be passed around and used in the ExpressionTree to represent the convolve operator along with any arguments passed to it. For this purpose I propose an Operator class that can be used to represent an operator (I'm currently undecided whether each operator should get its own (in most cases singleton) class, or if each operator should be an instance of a single Operator class). The Operator class would store a name for the operator (like '+' or 'convolve') and may also store zero or more options to that operator. In most cases, like the existing arithmetic operators, there would be no options; convolve on the other hand could have many options--each "convolve" variant would be a unique, but related operator.

Besides being able to represent a convolve operator this proposal has numerous possible advantages both for code refactoring and enhancing the usability of the framework:

We could store information about how to evaluate an operator directly in the Operator object. For example we could do away with the BINARY_OPERATORS table. Instead Operator('+', operator.add) automatically ties the name of the operator to its implementation.
If you look at how the operators in BINARY_OPERATORS are implemented, you'll note however that it doesn't just directly call (in the case of '+'), operator.add. Actually it wraps that in a more complicated function that takes a tuple consisting of the left operand's model evaluate, as well as its number of inputs and outputs, and a tuple consisting of the same for the right operand. This is an implementation detail related to implementation of more complicated operators like composition and join. A lot of this implementation detail could be baked directly into the Operator class though, so that it's easy to change in one place for all operators. This aspect requires a little more thinking though.
The Operator objects could also directly store the order of operations--precedence could be determined entirely by comparison of the Operator objects with each other.
When creating a compound model in _CompoundModelMeta._from_operator a good chunk of the code is validation code that checks whether the left and right operands' inputs and outputs are compatible, and this depends largely on which operator is in use. This validation could live with the individual Operator objects.
- Similarly, in #3852 I'm working on validation of compatible units when operating on models, which is highly dependent on the operator (for example the right-hand operand to ** had better have dimensionless output).
In addition to forward evaluation Operator objects could store how to evaluate the inverse of that operator (if it all)--almost all the code in _CompoundModel.inverse can live in the appropriate Operator objects so that each inverse doesn't have to be hard-coded in the inverse implementation.
As I already mentioned, if done right this could also make it easier to add new specialized operators. Of course this is super deep in the generalization weeds, but I think the resulting code could be easier to understand than the current admittedly dense compound model code.

cdeil commented 9 years ago

@embray – Thanks for thinking about this and writing up your thoughts on implementing this.

To be honest, I still don't fully understand some key details of the proposal how convolution should work in https://github.com/astropy/astropy-api/pull/13, namely for the PSF and source model:

when evaluation happens? calls evaluate or integrate? on which grid exactly? (related to the choice whether PSFs are Kernel objects or Model objects)
on which grid the convolution happens (full image or bounding box cutout)?

And @embray, I don't fully understand what you propose here and how it's similar or different from what's proposed there.

What would help me a lot would be im the two proposals are written down as a code example with annotations where computations happen. The application I have in mind is source morphology fitting of a small source on a large image, i.e. for this to be efficient, the source has to be evaluated on a bounding box part aligned with the grid in the large image, then PSF-convolved, i.e. the analytical PSF model also has to be evaluated on the same grid. Then the source parameters change, the bounding box can move, the source image has to be re-evaluated, but the PSF stays the same and should not be re-evaluated.

So when this is written down, is everything the same, except for the one line how the convolved model object is created? Or are the evaluation / grid and bounding box handling parts different in API or implementation?

This is what I care about, concerning how best to implement this in astropy.modeling, @embray, I would defer to your judgement, as long as it's not too complex, so that @patti can understand and implement it in a reasonable amount of time.

@adonath , @patti, @embray – Code examples? Thoughts?

embray commented 9 years ago

@cdeil I have to go to a meeting soon so I'll get back to you in more detail a little later. A lot of the questions you're asking though are sort of beyond the scope of this issue.

I guess, as I feared, I spent a little too much time in this issue talking about convolution and discretization. This problem was what initially motivated my idea for this Operator class for use in the compound model implementation. But as I note in the issue this led me to realizing more advantages to this beyond the specific use case I was trying to solve.

So my point is that maybe I should go ahead and create a separate issue where we can discuss this.

cdeil commented 9 years ago

For convolution / discretization, it's maybe best to continue discussion / fleshing out in https://github.com/astropy/astropy-api/pull/13 . Then you could keep this issue for your proposal to introduce the Operator class. For which, as you noticed, I have to useful feedback to give. :-)

mhvk commented 9 years ago

@embray - I've now read this twice and looked a bit at the present implementation, and it seems in general a good idea, but I feel rather out of my league, since I cannot really oversee how this will work.

However, I'll throw out another crazy (but nice!) idea on operations, which may be relevant on how you think about things: it seems to me that if we define __numpy_ufunc__ appropriately, we can nearly for free get all ufuncs covered as models, ie., one could have np.sin(Const1D+Scale) construct a new model class. (In this respect, I think it might be good to have F | G being equivalent to G(F) -- obviously, this needs some trickery in G.__new__...)

embray commented 9 years ago

@mhvk I like that __numpy_ufunc__ idea. I think it would actually be reasonably straightforward to do too, since it's just a matter of converting the relevant ufunc to a Model (using custom_model) and then, as you wrote, the rest would be equivalent to the existing compound model composition operator. Might also be worthwhile to have a special model class specific to models created from Numpy ufuncs. Not sure.

Still, that's pretty outside the scope of this issue so if you could open a new issue about that it would be great.

embray commented 9 years ago

@cdeil @mhvk Also it's perfectly fine to not have any particular feedback on this issue, since understanding the scope of it would require reading and understanding the model source code (although I do wish more people would take a crack at that).

This just came up in my thinking about the convolution discussion, and I realized that something like this would be necessary for @patti to implement it in the way I suggested (which I think can be ignored until and unless I've made changes like this to make it possible). We clearly need more written documentation though.

mhvk commented 9 years ago

OK, see new issue #3866

embray commented 9 years ago

Perhaps a related issue is that I need to write up better documentation of how compound models actually work. This will be all the more necessary if I expect users to be able to implement their own operators and evaluation contexts :)

I wrote up some notes on this the other day that could serve as a starting point. It's very technical of course and not for the average user, but good to know about.

adonath commented 9 years ago

+1 for a documentation for developers and advanced users. During the last days I've started to read the compound model source code and it's not easy to understand, because of the rather high degree of abstraction.

argriffing commented 9 years ago

Hi I'm just checking if you guys know about pyoperators. It implements a compound abstract operator framework with particular application to astronomy, and there is no good reason it had not been merged into scipy except for bad luck with timing of the grad student / postdoc -> industry churn. I don't think it has support for missing data or automatic differentiation. Sorry if this is too off-topic.

embray commented 9 years ago

@argriffing I don't think that's really on topic for this particular issue (which is more about discussion some internal refactoring ideas, and has more to do with generic operations than operators in the algebraic sense). Could you contact the astropy-dev mailing list with more detail about what this might be used for?

mhvk commented 9 years ago

@embray - from a quick look at pyoperators, it would seem it ran into (and perhaps solved) quite a few of the issues you are trying to solve here. In an abstract sense, modeling is trying to do something quite similar to pyoperators (if you see your inputs as vectors, and models as "matrices" that operate on them).

embray commented 9 years ago

There's definitely some similarity, but from a different approach and probably different motivators.

astropy / astropy