casperkaae / parmesan

Variational and semi-supervised neural network toppings for Lasagne

Bayes by backprop #40

Open ferrine opened 8 years ago

ferrine commented 8 years ago

Hi again!

I've finished the work on my topping. It seems to be flexible. I opened a PR in Lasagne, but I think this is a better place to contribute it.

casperkaae commented 8 years ago

Great!

It would fit neatly into Parmesan, so you are more than welcome to make a pull request. We are also a bit more relaxed about catering for special layer needs, in case you need some special feature to make it work.

ferrine commented 8 years ago

I hope I won't need any special features to make it work. BTW, there is still one problem: I have no idea how to get deterministic output after the NN is constructed.

casperkaae commented 8 years ago

Ok, let me know if you encounter any problems with parmesan.

I have no idea how to get deterministic output after the NN is constructed

Can't you just set the deterministic flag to True?
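
i.e. the usual Lasagne pattern, something like this (l_out is just a placeholder name for your output layer):

import lasagne
# build the prediction expression with stochastic layers switched off
test_out = lasagne.layers.get_output(l_out, deterministic=True)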

ferrine commented 8 years ago

I decorate add_param in the Layer as follows:

import lasagne
import theano.tensor as T
from functools import wraps
# `prior`, `log_normal` and `log_normal3` are helpers from my branch (not shown here)


class NormalQ(object):
    """Helper class providing the logic for initializing a
    random variable distributed as
        N(mean, log(1 + exp(rho))^2)
    with a user defined prior,
    where `mean` and `rho` are variational params fitted while training

    Parameters
    ----------
    log_prior : callable - user defined prior
    """
    def __init__(self, log_prior=prior(log_normal, 0, 1)):
        self.log_prior = log_prior

    def __call__(self, layer, spec, shape, **tags):
        """

        Parameters
        ----------
        layer : wrapped layer instance
        spec : Theano shared variable, expression, numpy array or callable
               Initial value, expression or initializer for the parameter
               tensor. See :func:`lasagne.utils.create_param` for more
               information.
               .. Note
                    can also be a dict of such specs
                    ``{'mu': spec, 'rho': spec}``
                    to avoid the default rho initialization
        shape : tuple of int
                a tuple of integers representing the desired shape
                of the parameter tensor.
        tags : See :func:`lasagne.layers.base.Layer.add_param`
               for more information

        Returns
        -------
        Theano tensor
        """
        # case when user leaves default init specs
        if not isinstance(spec, dict):
            spec = {'mu': spec}
        # important!
        # we declare that params we add next
        # are the ones we need to fit the distribution
        # they are variational
        tags['variational'] = True

        rho_spec = spec.get('rho', lasagne.init.Normal(1))
        mu_spec = spec.get('mu', lasagne.init.Normal(1))

        rho = layer.add_param(rho_spec, shape, **tags)
        mean = layer.add_param(mu_spec, shape, **tags)

        e = layer.acc.srng.normal(shape, std=1)
        # [reparametrization trick](https://www.reddit.com/r/MachineLearning/
        # comments/3yrzks/eli5_the_reparameterization_trick/)
        # so any time we add param we apply reparametrization
        # but it's done in the __init__ and the only way I see to get 
        # deterministic output is to replace `e` with `0` in the graph, 
        # it's gonna be tricky
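        # note: log1p(exp(rho)) is the softplus transform, which keeps the
        # standard deviation positive for any real-valued rho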
        W = mean + T.log1p(T.exp(rho)) * e
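        # the q_p term below is a single-sample estimate of KL(q(W) || p(W)),
        # i.e. the "complexity cost" from Bayes by Backprop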

        q_p = self.log_posterior_approx(W, mean, rho) - self.log_prior(W)
        layer.acc.add_cost(q_p)
        return W

    @staticmethod
    def log_posterior_approx(W, mean, rho):
        return log_normal3(W, mean, rho)

def bbpwrap(approximation=NormalQ()):
    def decorator(cls):
        def add_param_wrap(add_param):
            @wraps(add_param)
            def wrapped(self, spec, shape, name=None, **tags):
                # take care of explicit user specification:
                # to bypass the bbp hook, just set tags['variational'] = True
                if tags.get('variational', False):
                    return add_param(self, spec, shape, name, **tags)
                else:
                    # strictly speaking, they don't need to be regularized
                    tags['regularizable'] = False
                    param = self.approximation(self, spec, shape, **tags)
                    return param
            return wrapped
        def init_wrap(__init__):
            @wraps(__init__)
            def wrapped(self, acc, *args, **kwargs):
                self.acc = acc  # type: parmesan.utils.Accumulator
                __init__(self, *args, **kwargs)
            return wrapped

        cls.approximation = approximation
        cls.add_param = add_param_wrap(cls.add_param)
        cls.__init__ = init_wrap(cls.__init__)

        return cls
    return decorator
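
For context, applying the wrapper to an existing layer would look roughly like this (BayesDenseLayer, the input shape and the way acc is built are just illustrative; acc is the Accumulator mentioned in the type comment above, part of this proposal):

@bbpwrap(NormalQ())
class BayesDenseLayer(lasagne.layers.DenseLayer):
    pass

# the accumulator collects the KL costs that every wrapped add_param adds
acc = parmesan.utils.Accumulator()
l_in = lasagne.layers.InputLayer((None, 784))
l_out = BayesDenseLayer(acc, l_in, num_units=10,
                        nonlinearity=lasagne.nonlinearities.softmax)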

Another idea is to create such params with a special init.SomeClass that returns an expression, which is already supported in lasagne.

casperkaae commented 8 years ago

Can you give a bit of background on what you want to achieve?

ferrine commented 8 years ago

I want to make a thing that makes it easy to create variational toppings. Suppose we have a wide range of layers, i.e. all Lasagne layers. All of them treat weights as constants. I want a tool that maps a non-Bayes layer to a Bayes one. The only thing that should change is how we create and train a weight: creation has to be modified somehow, and training comes out of the box from the Adam optimizer in Lasagne. UPD: Lasagne doesn't support creating a Theano expression from a callable, only a shared variable; I hope this can be solved with a PR if needed.

casperkaae commented 8 years ago

Ok, I get it (and it sounds cool :) )

What is your problem specifically with getting a deterministic output? Do you want to get the output using the posterior mode of the weights instead of sampling the weights?

ferrine commented 8 years ago

Yes, the current implementation supports only sampling: to get some kind of deterministic output (it will just be more stable), you have to average samples, which gives the posterior predictive mean. But in real tasks, e.g. production, this approach is too slow, so we could consider predicting with the weights set to the mean or mode of q_w.
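
To make this concrete, a "deterministic" prediction currently has to be a Monte Carlo average over weight samples, roughly like this (l_out, input_var and X_test are placeholder names):

import numpy as np
import theano
import lasagne

# every call to `predict` re-draws e, i.e. samples new weights from q(W),
# so averaging many calls approximates the posterior predictive mean
predict = theano.function([input_var], lasagne.layers.get_output(l_out))
probs = np.mean([predict(X_test) for _ in range(100)], axis=0)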

When I do this trick with overriding add_param and __init__, I "say" that the RandomStream is part of my computational graph before I ever call get_output_for. That's why I can't just pass deterministic=True to get_output_for.
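
The replacement I have in mind would be something along these lines, assuming the noise variables and their shapes were also collected on the accumulator (acc.noise here is hypothetical):

# zero out every noise variable e in the graph, so each
# W = mean + log1p(exp(rho)) * e collapses to the posterior mean
replacements = {e: T.zeros(shape) for e, shape in acc.noise}
det_out = theano.clone(lasagne.layers.get_output(l_out), replace=replacements)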

ferrine commented 8 years ago

@casperkaae BTW, we need to have a closer look at the PyMC3-Lasagne bridge example. They do similar things, just inline, using the Lasagne API. That's exactly what we need. I saw this example some time ago, but thought there were some restrictions. Having a closer look at the source code, I realized the core reason it works is the weights: they are initialized with the PyMC3 API and wrapped in a `with Model()` statement that does all the dirty stuff. I'm really impressed.