ihincks commented 8 years ago

Just want to put this here, not necessarily because it is urgent, but because it has come up as a problem for a few people.

Models have a method called n_outcomes which specifies the number outcomes in an array of expparams. Sometimes models do not have a finite number of outcomes. For example, measurement of NV centers involves drawing Poisson random variates, which has an infinite but discrete number of outcomes. Or, measurements in NMR are continuous valued.

Correct me if I am wrong, searching all source files as of 2c86c947b3 for the string n_outcomes, the only file in which it is being used rather than being defined is in the updater code of smc.py.

ihincks commented 8 years ago

I guess it also comes up in risk and information gain, but every use still appears to be in smc.py.

cgranade commented 8 years ago

Agreed, this is a major concern moving forward, and one that we should definitely address. At the moment, I think the only place that n_outcomes is explicitly used is as you say, in calculating the Bayes risk and information gain for online experiment design. I think it should be fairly straightforward to come up with a better design for defining how utility functions should take expectation values; the primary difficulty is in making that kind of a design appropriately use existing models which define n_outcomes and is_n_outcomes_constant.

ihincks commented 8 years ago

Talking to Thomas today, it seems that he has basically already solved this issue in his branch continuous-outcomes. His solution is to subclass Model into ContinuousModel (should probably be renamed). This new model has properties that specify how many samples should be taken whenever an expectation value over data needs to be estimated. Whenever such an expectation is being calculated, whether or not the model has a finite number of outcomes is tested, and different things are done in each case.

The alternative is to always have expectation values calculated in this way, even if there are a small number of outcomes.

cgranade commented 8 years ago

I'll take a more detailed look at @whitewhim2718's branch soon, then. In the meantime, tagging @jarthurgross into this Issue, as the n_outcomes problem is relevant to his continuous-measurement filtering work as well.

cgranade commented 8 years ago

For concreteness, I can think of two main usecases that should be addressed by a new design:

Continuous-valued outcomes
Multinomial (vector-valued but discrete)
Poisson (semi-infinite but discrete)

ihincks commented 8 years ago

And don't forget discrete valued but with infinitely many outcomes, such as Poisson! This is mainly what I am interested in at the moment.

cgranade commented 8 years ago

Very good point, I've added that to the list as well, then.

taalexander commented 8 years ago

I'd like to say that while I think I have a solution. My branch Continuous-Outcomes is very experimental right now. One glaring issue is that I added a callback feature in SMC update to reselect the sampled outcomes at every update. Somehow I neglected to actually add this to the abstract continuous model, but added it to my implementation. I will clean this all up, but I may not have the time until next week. I also want to say that I am very much open minded on how to implement these enhancements, and am open to changes.

Additionally I have done some other development on the ExperimentDesigner class (among other things) in that branch, that is most likely not relevant to this issue and probably should be split out.

cgranade commented 8 years ago

Somewhat tangential, but what do people think on which version we should target with this? Personally, I think it makes sense to target a new QInfer version 1.1, so that we can get a stable version out there to build off of while working on ensuring that the new design suits all of our needs and is extensible enough to not cause another problem down the road. Pushing to 1.1 would also fit will with some ongoing work on offline experiment design, as v1.1 could then focus on generalized outcomes and experiment design as the two big new features. Thoughts?

taalexander commented 8 years ago

I agree that the release of v1.0 should be prioritized over this feature. I think it is going to take some discussion and consensus to implement, and realistically that will take at least a month. These changes should be able to be implemented without breaking anything, or at least very little (the current implementation I have is backwards compatible, but maybe not the best way to go if we think long term). I know I haven't been participating in the recent flurry of work (I haven't gotten to code at all in the last month) but I am looking forward to v1.0 to build off of.

scasagrande commented 8 years ago

Here's what I'd do, depending on what course of action you want for the implementation:

Implemented without breaking anything else: minor version bump, prioritise 1.0
Implemented with breaking other stuff, one of:
- Release version <1.0, such as 0.9, do all your breaking changes, release 1.0
- Release 1.0, do your breaking changes, release 2.0 (but that's relatively quick turn around on 1 -> 2, depending on how you feel about those kind of things).
- Release nothing, do your breaking changes, release 1.0

cgranade commented 8 years ago

There is one other place that the assumption of integer outcomes gets used, and that's in the perf_testing module. While not strictly related, much of the generalization discussed here would also need to take that into account.

ihincks commented 8 years ago

I am interested in this issue because my research currently involves the need for poisson outcomes; I am writing the model as this discussion takes place. However, I am calculating the risk using another language. So I don't currently need the risk or information gain functions, and I think in this case, I can just put nonsense in for n_outcomes and it should work. For this reason, I don't really care which version it ends up in, and for the sake of getting on with it and not dragging things down, it makes sense to leave it out of v1.0.

In terms of breaking changes, after having thought about it a bit, I think (but you never know until you try) that there are very uncompromising ways to fix this issue without breaking anything. The only thing I would be tempted to break right now is the class name of Model, to something like FiniteOutcomeModel, but this is pretty superficial.

ihincks commented 8 years ago

Given a discussion with @cgranade yesterday, these are the proposed structure changes to abstract_class (which contain some breaking changes) for version 1, to make future integration with feature-generalized-outcomes less invasive:

(I am writing this here so that any criticisms can be made before someone puts the much greater effort into a PR)

Simulatable gets a new property domain which is of a class inheriting from a new abstract class Domain. In particular, a Domain will have a np.dtype which will allow for fancy output types, like lists, and will be able to enumerate all possible outcomes for systems with finite numbers of outcomes. Note that dtype will not be allowed to depend on expparam in this scheme, unless modified -- this should perhaps be discussed.
The property n_outcomes will not depend on the domain, but will rather specify how many outcomes will result when a to-be-implemented-method marginalized_outcomes is called. This will typically be equal to the total number of outcomes if there are a tractable number of them.
Model will inherit from Simulatable, as it currently does, and introduce the notion of the likelihood function
FiniteOutcomeModel will inherit from Model and will differ from it in that it will implement the marginalized_outcomes method partly by enumerating over all possible outcomes.
DifferentiableModel will inherit from abc.ABCMeta and Model and a concrete subclass can optionally inherit from FiniteOutcomeModel

The idea is that functions like the bayes risk in smc.py can now depend only on marginalized_outcomes, thereby removing the necessity of being able to enumerate all possible outcomes and call the likelyhood on them. Indeed, a system only needs to be simulatable to estimate the bayes risk with a Monte Carlo scheme.

Prototypical scenarios to consider (add more if you have them):

Gaussian model, where an outcome can be any real number
Multinomial model, where an outcome is a tuple of non-negative integers with a constrained sum (np.dtype([('k', np.int, self.n_elements)]); as an aside, this type seems to require an excessive number of brackets and parens.)
Poisson model, where an outcome is a non-negative integer, but there are infinitely many possibilities

taalexander commented 8 years ago

I agree with everything said here. With regards to my branch which contains my current work on this issue, not many changes will have to be made to come in line with the above.

edit - I was mistaken on the scope of this Issue. I had not realized that these changes were in preparation of the later branch me and @ihincks hope to submit, and not the work itself. Regardless my original statement above still stands.

ihincks commented 7 years ago

71 resolves the preparation step.

QInfer / python-qinfer

Remove n_outcomes restriction #66

71 resolves the preparation step.