diana-hep / carl

Likelihood-free inference toolbox.
BSD 3-Clause "New" or "Revised" License

[RFC] API proposal and use cases #2

Closed. glouppe closed this issue 8 years ago.

glouppe commented 8 years ago

This is a proposal for an API on density and density ratio estimation. This RFC first presents the proposed classes and then illustrates how typical use cases would be solved using this API. It mainly takes inspiration from the API of Theano and of scikit-learn.


Distributions

Interface

The DistributionMixin API provides a limited interface for (continuous) N-D distributions. (For now, in order to keep a simple and uniform interface, we can assume that everything is continuous.)

The parameters attribute stores the parameters of the distribution. They are provided either as raw values (which are wrapped into shared variables) or directly as Theano variables.

The interface provides methods similar to scipy.stats.rv_continuous for the pdf, cdf, likelihood, etc. They should all also accept a param_values dictionary, in order to substitute concrete values for symbolic variables.

The interface also provides a scikit-learn API through fit, score, get_params and set_params, making it possible to plug a DistributionMixin object into a pipeline or into model selection tools.
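
To make that compatibility concrete, here is a hypothetical sketch (not part of the proposal itself) of plugging a distribution into scikit-learn's model selection tools, assuming Normal implements the mixin as proposed below:

import numpy as np
from sklearn.model_selection import GridSearchCV

X = np.random.normal(loc=1.0, scale=2.0, size=(1000, 1))

# Unsupervised grid search over candidate parameter values;
# candidates are ranked with the mixin's score(X) method.
search = GridSearchCV(Normal(), param_grid={"loc": [0.0, 1.0, 2.0]})
search.fit(X)
search.best_params_
>>> {"loc": 1.0}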

class DistributionMixin(BaseEstimator)
    attributes:
        parameters  # parameters, as theano tensor variables

    methods:
        def __init__(params)
        def rvs(n_samples, random_state=None)  # draw random samples

        def pdf(X, param_values=None)          # probability density function
        def cdf(X, ...)                        # cumulative distribution function
        def nnlf(X, ...)                       # negative log-likelihood
        def ...

        def fit(X, y=None)  # fit parameters (e.g., using MLE)
        def score(X)        # proxy of the likelihood (e.g., neglecting normalization)

        def get_params  # some magic to implement scikit-learn parameters interface
        def set_params

Examples

Standard distributions

class Normal(DistributionMixin):
    def __init__(name=None, loc=0.0, scale=1.0):
        ...

# Raw or default values create shared variables for the parameters
rv = Normal()
rv.parameters
>>> {"loc": SharedVariable(value=0.0), "scale": SharedVariable(value=1.0)}

# Shared variables can also be provided directly
mu = theano.shared(10.0)
rv = Normal(loc=mu)
rv.parameters
>>> {"loc": SharedVariable(value=10.0), "scale": SharedVariable(value=1.0)}

# Method calls
X = ...  # some data
rv.pdf(X)
>>> 0.432432

# Fit and change parameters values
rv.fit(X)
rv.parameters
>>> {"loc": SharedVariable(value=42.0), "scale": SharedVariable(value=1.0)}
mu.get_value()
>>> 42.0

# Tensor variable without value
mu = T.fscalar()
rv = Normal(loc=mu)
rv.pdf(X)
>>> Error: mu is symbolic and has no value
# but input values can be set through `param_values`
rv.pdf(X, param_values={mu:10})
>>> 0.432432

# rv implements the scikit-learn get_params/set_params interface to get/set parameter values
rv.parameters
>>> {"loc": SharedVariable(value=42.0), "scale": SharedVariable(value=1.0)}
rv.set_params(loc=10)
rv.parameters
>>> {"loc": SharedVariable(value=10.0), "scale": SharedVariable(value=1.0)}
rv.get_params("loc")
>>> 10.0
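
For illustration, a minimal sketch (assuming a Theano backend, as in the rest of this proposal) of how such a pdf could work internally: the density is compiled once as a symbolic expression over the shared parameters, so that updating a shared variable immediately moves the density.

import numpy as np
import theano
import theano.tensor as T

loc = theano.shared(0.0, name="loc")
scale = theano.shared(1.0, name="scale")
x = T.dvector("x")

# Gaussian density as a symbolic expression over the shared parameters
pdf_expr = (T.exp(-0.5 * ((x - loc) / scale) ** 2)
            / (scale * T.sqrt(2.0 * np.pi)))
pdf = theano.function([x], pdf_expr)

pdf(np.array([0.0, 1.0]))  # density at the current loc/scale
loc.set_value(42.0)        # e.g., what fit(X) would do internally
pdf(np.array([0.0, 1.0]))  # now evaluated with loc=42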

Mixtures

When the likelihood is known

p0 = Normal(name="p0", loc=0.0, scale=1.0)
p0.parameters
>>> {"loc": SharedVariable(value=0.0), "scale": SharedVariable(value=1.0)}

p1 = Normal(name="p1", loc=-2.0, scale=0.5)
p = Mixture(components=[p0, p1], weights=[0.5, 0.5])
p.parameters
>>> {"weights": [SharedVariable(value=0.5), SharedVariable(value=0.5)],
     "p0__loc": SharedVariable(value=0.0),
     "p0__scale": SharedVariable(value=1.0),
     ...}

# Evaluate p(x|theta) for the current parameter values
p.pdf(X)
>>> 0.342423

# Optimize parameters on data
# (i.e., weights and all others)
p.fit(X)

# Weights can themselves be expressions
g = theano.shared(0.0)
w0 = g ** 2
w1 = (1.0 - w0)
p = Mixture(components=[p0, p1], weights=[w0, w1])
p.pdf(X, {g: 10})
>>> 0.23213

# Fit with respect to specified variables
p.fit(X, [g])
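
To sketch what this enables under the hood (the component densities below are stand-ins, not the proposed implementation): since the weights are Theano expressions, the mixture density is itself an expression, and the gradient of the negative log-likelihood with respect to g comes for free, which is presumably what fit(X, [g]) would exploit.

import numpy as np
import theano
import theano.tensor as T

g = theano.shared(0.5, name="g")  # initial value
w0 = g ** 2
w1 = 1.0 - w0

x = T.dvector("x")
# Stand-ins for the components' symbolic densities p0(x) and p1(x)
p0_expr = T.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
p1_expr = T.exp(-0.5 * ((x + 2.0) / 0.5) ** 2) / (0.5 * np.sqrt(2.0 * np.pi))

mix_expr = w0 * p0_expr + w1 * p1_expr  # p(x) = w0*p0(x) + w1*p1(x)
nll = -T.sum(T.log(mix_expr))           # negative log-likelihood
grad_g = T.grad(nll, wrt=g)             # exact symbolic gradient w.r.t. g
step = theano.function([x], nll, updates=[(g, g - 0.01 * grad_g)])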

Likelihood-free setting

p0 = Simulator(params)  # assume all parameters are constant
p1 = Simulator(params)  # same
p = Mixture(components=[p0, p1], weights=[0.5, 0.5])

# This cannot be evaluated
p.pdf(X)

# Optimize parameters on data (i.e., weights)
p.fit(X)     # For binary mixtures, this can be done with RegressorDensityRatio (see Radford paper)
p.score(X)

Comments

The DensityRatioMixin provides a scikit-learn estimator interface for computing density ratios. The ratio function can either be learned with fit, from data drawn from the two distributions, or derived from DistributionMixin objects provided at initialization.

class DensityRatioMixin(BaseEstimator)
    methods:
        def __init__(params, numerator=None, denominator=None)
        def fit(X=None, y=None)
        def predict(X)    # predict ratio at X
        def score(X)

Examples

Regressor/Classifier-based density ratio

Density ratio estimators can be implemented directly on top of a classifier/regressor, assuming it is well calibrated. If the classifier/regressor is not well calibrated, one could set calibration="kde" (or some other density estimation technique) to estimate the density of the score s(x) instead of using s(x) directly (which is what calibration=None does).
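
As a minimal sketch of the calibration="kde" idea (function and variable names here are illustrative, not the proposed API): fit one density estimate per class on the scores s(x), and use the ratio of the two score densities instead of s(x) itself.

import numpy as np
from sklearn.neighbors import KernelDensity

def kde_calibrated_ratio(scores, y, new_scores):
    # estimate the density of s(x) separately under each class
    kde0 = KernelDensity().fit(scores[y == 0].reshape(-1, 1))
    kde1 = KernelDensity().fit(scores[y == 1].reshape(-1, 1))
    # return p0(s) / p1(s) evaluated at the new scores
    log_ratio = (kde0.score_samples(new_scores.reshape(-1, 1))
                 - kde1.score_samples(new_scores.reshape(-1, 1)))
    return np.exp(log_ratio)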

class RegressorDensityRatio(DensityRatioMixin):
    def __init__(base_estimator, numerator=None, denominator=None, calibration=None)

    def fit(X, y):
        # sample data from the distributions
        # fit the regressor
        # calibrate
        # ...

    def predict(X):
        # if the regressor approximates s(x) = p_num(x) / (p_num(x) + p_den(x)),
        # then s / (1 - s) recovers the ratio of the two densities
        s = self.regressor_.predict(X)
        return s / (1.0 - s)

Density ratio estimation from samples

# Draw samples from simulators
import numpy as np

p0 = Simulator(params)
p1 = Simulator(params)
X = np.vstack((p0.rvs(n_samples=1000),
               p1.rvs(n_samples=1000)))
y = np.ones(2000)
y[:1000] = 0  # label p0 samples as 0, p1 samples as 1

# Fit ratio
from sklearn.neural_network import MLPRegressor

ratio = RegressorDensityRatio(base_estimator=MLPRegressor())
ratio.fit(X, y)
r = ratio.predict(X)  # return ratio at X

Density ratio estimation from DistributionMixin objects

The implementation of RegressorDensityRatio should check whether p0 or p1 is a Mixture. In this case, the ratio can be decomposed internally.
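
For instance, assuming the numerator is a mixture p0(x) = sum_i w_i * c_i(x), the ratio presumably decomposes as

r(x) = p0(x) / p1(x) = sum_i w_i * (c_i(x) / p1(x)),

so that only the pairwise component-vs-denominator ratios c_i / p1 need to be learned.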

p0 = Simulator(param=value1)
p1 = Simulator(param=value2)
ratio = RegressorDensityRatio(base_estimator=MLPRegressor(),
                              numerator=p0, denominator=p1)
ratio.fit()  # X=None, y=None
ratio.predict(X)

Density ratio estimation with parametrized classifiers

If p0 or p1 have non-constant internal parameters, then a parametrized classifier can/should be learned internally. As before, this could be handled by adding a param_values argument to RegressorDensityRatio, defining which tensor variables are free and what their domain is. The question of how to learn such a parametrized classifier remains open, but in terms of API I think this is sufficient.

mass = T.dscalar()

p0 = Foo(...)  # indirectly depending on `mass`
p1 = Bar(...)  # indirectly depending on `mass`

ratio = RegressorDensityRatio(base_estimator=MLPRegressor(),
                              numerator=p0,
                              denominator=p1,
                              param_values={mass:[1.0, 2.0, 3.0]})
ratio.fit()

ratio.predict(X, param_values={mass:2.5})
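
For what it is worth, a hypothetical sketch of what fit() could do internally to build the parametrized classifier (note that rvs(..., param_values=...) is an assumed signature, not part of the interface above): augment the features with the parameter value and train a single classifier over (x, mass) pairs.

import numpy as np

Xs, ys = [], []
for m in [1.0, 2.0, 3.0]:  # the grid given in param_values
    x0 = p0.rvs(n_samples=1000, param_values={mass: m})  # assumed signature
    x1 = p1.rvs(n_samples=1000, param_values={mass: m})
    x = np.vstack((x0, x1))
    Xs.append(np.hstack((x, np.full((len(x), 1), m))))  # append mass as a column
    ys.append(np.concatenate((np.zeros(len(x0)), np.ones(len(x1)))))

# a single estimator then approximates the score s(x; mass)
base_estimator.fit(np.vstack(Xs), np.concatenate(ys))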

Note: this really calls for a proposal of an interface for what we will refer to as a "parametrized classifier", and for how to build and predict from it.

Comments

glouppe commented 8 years ago

CC @cranmer @jgpavez: This is still a draft, but please comment if you have questions, suggestions or critiques. For sure, this is also meant to evolve as I start coding things and realize I was wrong :)

cranmer commented 8 years ago

Hi, this looks like a great start. First reactions (quoted point by point in the replies below):

cranmer commented 8 years ago

Wondering if Theano's RandomStreamsBase is relevant for the Simulator and/or BatchSampler: http://deeplearning.net/software/theano/library/tensor/shared_randomstreams.html#libdoc-tensor-shared-randomstreams
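
For reference, a minimal illustration of the primitive (not a Simulator design):

import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=42)
sample_expr = srng.normal(size=(10,), avg=0.0, std=1.0)
draw = theano.function([], sample_expr)
draw()  # a fresh batch of 10 normal draws on every call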

glouppe commented 8 years ago

I just updated the RFC. There are quite significant changes with respect to the first iteration. The main one is the use of (Theano) variables at the core of the interface to allow for composition and dependencies.

glouppe commented 8 years ago

Distribution or Density? It's a DensityRatio, so maybe each object should be a Density.

I renamed "Distribution" to "Random Variable", to emphasize that these objects represent more than just a probability density function.

I would call them parameters instead of hyper-parameters in this case. These are the lowest-level parameters for a parametrized distribution p(x|theta).

Changed.

we need an interface to evaluate the estimated density ratio for specified values of the parameters

This is now possible through the param_values argument, used for specifying values to symbolic variables.

should there be both a Simulator and a BatchSampler or something similar, where the first can generate samples on the fly and the second is using a fixed set of pre-computed examples?

We could have that indeed.

There are a few notions of fit:

- train classifier or other DensityRatioMixin
- density estimation for classifier score (if calibrate==True)
- fit the parameters of the resulting Distribution / model

In my mind, for a regressor-based implementation of a density ratio estimator, training the classifier + its calibration should happen in DensityRatio.fit. Then, this (parametrized) object can be used in further steps, e.g. to fit the parameters of a distribution.

We should be able to support the case that the parameters of mixture coefficients are shared with parameters of one of the Distributions, and this will need to be connected to the parametrized classifier as well.

I think this should be possible by reusing the same Theano variable object, though this makes me notice that the current proposal lacks the notion of a "parametrized classifier". I'll think about that.

If calibrate==True, we should be able customize the density estimation technique

Agreed, I changed the interface to accommodate that.

glouppe commented 8 years ago

I think I now have a good vision of the requirements. In the coming days/weeks I will try to produce a minimal working implementation of this API, from which it will be easier to discuss and reason.

glouppe commented 8 years ago

Note for my future self: RegressorDensityRatio should also provide options for controlling the number of training samples (for fitting the classifier) and the number of calibration samples.

cranmer commented 8 years ago

Distribution or Density? It's a DensityRatio, so maybe each object should be a Density. I renamed "Distribution" to "Random Variable", to emphasize that these objects represent more than just a probability density function.

My understanding is that this interface is for the distribution (either explicitly known or indirectly through a simulator) of the random variable x given some parameters theta, e.g. the "p" in p(x|theta). I would avoid confusing the distribution with the random variable. I like either density or distribution better.

glouppe commented 8 years ago

My understanding is that this interface is for the distribution (either explicitly known or indirectly through a simulator) of the random variable x given some parameters theta, e.g. the "p" in p(x|theta). I would avoid confusing the distribution with the random variable. I like either density or distribution better.

Sure no problem, I'll switch back to distribution.

glouppe commented 8 years ago

Closing since this has now been implemented. New issues should be opened for more specific concerns.