ferrine opened 8 years ago
Great!
It would fit neatly into Parmesan, so you are more than welcome to make a pull request. We are also a little bit more relaxed about catering for special layer needs, in case you need some special feature to make it work.
I hope I won't need any special features to make it work. BTW, there is still one problem: I have no idea how to get deterministic output after the NN is constructed.
Ok, let me know if you encounter any problems with parmesan.
I have no idea how to get deterministic output after the NN is constructed
Can't you just set the deterministic flag to True?
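For context, in plain Lasagne that flag is threaded through lasagne.layers.get_output, which forwards it to each layer's get_output_for; dropout layers, for instance, disable their noise when it is True. A minimal sketch:

import lasagne
import theano.tensor as T

x = T.matrix('x')
l_in = lasagne.layers.InputLayer((None, 100), input_var=x)
net = lasagne.layers.DropoutLayer(
    lasagne.layers.DenseLayer(l_in, num_units=50), p=0.5)

train_out = lasagne.layers.get_output(net)                     # stochastic
test_out = lasagne.layers.get_output(net, deterministic=True)  # noise disabled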
I decorate add_param in the Layer as follows:
# Assumed imports; `prior`, `log_normal` and `log_normal3` are density helpers
# defined elsewhere (they are not part of this snippet).
from functools import wraps

import lasagne
import theano.tensor as T


class NormalQ(object):
    """Helper class providing the logic for initializing a
    random variable distributed like
    N(mean, log(1+exp(rho))^2)
    with a user-defined prior,
    where `mean`, `rho` are variational params fitted while training

    Parameters
    ----------
    log_prior : callable - user-defined prior
    """
    def __init__(self, log_prior=prior(log_normal, 0, 1)):
        self.log_prior = log_prior

    def __call__(self, layer, spec, shape, **tags):
        """
        Parameters
        ----------
        layer : wrapped layer instance
        shape : tuple of int
            a tuple of integers representing the desired shape
            of the parameter tensor.
        tags : See :func:`lasagne.layers.base.Layer.add_param`
            for more information
        spec : Theano shared variable, expression, numpy array or callable
            Initial value, expression or initializer for the parameter.
            See :func:`lasagne.utils.create_param` for more information.

            .. note::
                can also be a dict of such specs
                ``{'mu': spec, 'rho': spec}``
                to avoid the default rho initialization

        Returns
        -------
        Theano tensor
        """
        # case when the user leaves the default init specs
        if not isinstance(spec, dict):
            spec = {'mu': spec}
        # important!
        # we declare that the params we add next
        # are the ones we need to fit the distribution,
        # they are variational
        tags['variational'] = True
        rho_spec = spec.get('rho', lasagne.init.Normal(1))
        mu_spec = spec.get('mu', lasagne.init.Normal(1))
        rho = layer.add_param(rho_spec, shape, **tags)
        mean = layer.add_param(mu_spec, shape, **tags)
        e = layer.acc.srng.normal(shape, std=1)
        # [reparametrization trick](https://www.reddit.com/r/MachineLearning/
        # comments/3yrzks/eli5_the_reparameterization_trick/)
        # so any time we add a param we apply the reparametrization,
        # but it's done in __init__ and the only way I see to get
        # deterministic output is to replace `e` with `0` in the graph,
        # it's gonna be tricky
        W = mean + T.log1p(T.exp(rho)) * e
        q_p = self.log_posterior_approx(W, mean, rho) - self.log_prior(W)
        layer.acc.add_cost(q_p)
        return W

    @staticmethod
    def log_posterior_approx(W, mean, rho):
        return log_normal3(W, mean, rho)


def bbpwrap(approximation=NormalQ()):
    def decorator(cls):
        def add_param_wrap(add_param):
            @wraps(add_param)
            def wrapped(self, spec, shape, name=None, **tags):
                # we should take care of the user's specification;
                # to avoid the bbp hook just set tags['variational'] = True
                if tags.get('variational', False):
                    return add_param(self, spec, shape, name, **tags)
                else:
                    # they don't need to be regularized, strictly speaking
                    tags['regularizable'] = False
                    param = self.approximation(self, spec, shape, **tags)
                    return param
            return wrapped

        def init_wrap(__init__):
            @wraps(__init__)
            def wrapped(self, acc, *args, **kwargs):
                self.acc = acc  # type: parmesan.utils.Accumulator
                __init__(self, *args, **kwargs)
            return wrapped

        cls.approximation = approximation
        cls.add_param = add_param_wrap(cls.add_param)
        cls.__init__ = init_wrap(cls.__init__)
        return cls
    return decorator
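For illustration, here is a minimal sketch of how the decorator above could be applied to a stock Lasagne layer. The Accumulator class below is a hypothetical stand-in for whatever object is passed as acc (it only provides the srng attribute and add_cost method the wrapper relies on); it is not an existing parmesan class:

import lasagne
from theano.sandbox.rng_mrg import MRG_RandomStreams


class Accumulator(object):
    # hypothetical stand-in: a random stream plus a list of accumulated
    # variational costs (the q/p terms registered via add_cost)
    def __init__(self, seed=42):
        self.srng = MRG_RandomStreams(seed)
        self.costs = []

    def add_cost(self, cost):
        self.costs.append(cost.sum())


@bbpwrap(NormalQ())
class BayesDenseLayer(lasagne.layers.DenseLayer):
    """DenseLayer whose W and b are sampled via the reparametrization trick."""
    pass


acc = Accumulator()
l_in = lasagne.layers.InputLayer((None, 784))
# note the extra leading `acc` argument introduced by the wrapped __init__
l_out = BayesDenseLayer(acc, l_in, num_units=10,
                        nonlinearity=lasagne.nonlinearities.softmax)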
Another idea is to create such params with a special init.SomeClass thing that will return an expression that is already supported in Lasagne.
Can you give a bit of background on what you want to achieve?
I want to make a thing that makes it easy to create a variational topping. Suppose we have a wide range of layers, i.e. all the Lasagne layers. All of them treat weights as constants. I want a tool that maps a non-Bayesian layer to a Bayesian one. The only thing that should change is how we create and train a weight: creation is modified, and training comes out of the box from the Adam optimizer in Lasagne. UPD: Lasagne doesn't support creating a param from a callable that returns a Theano expression, only a shared variable; I hope that can be solved with a PR if needed.
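To make the "training comes out of the box" point concrete, here is a rough sketch of the intended setup, reusing l_out and acc from the sketch above (the KL weighting and all constants are illustrative only):

import theano
import theano.tensor as T
import lasagne

x = T.matrix('x')
y = T.ivector('y')

out = lasagne.layers.get_output(l_out, x)
data_loss = lasagne.objectives.categorical_crossentropy(out, y).mean()
kl = sum(acc.costs)       # q/p terms registered through acc.add_cost(...)
kl_weight = 1.0 / 50000   # illustrative: roughly 1 / dataset size
loss = data_loss + kl_weight * kl

# mu and rho are ordinary trainable parameters, so any Lasagne update
# rule works unchanged
params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)
train_fn = theano.function([x, y], loss, updates=updates)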
Ok, I get it (and it sounds cool :) )
What is your problem specifically with getting a deterministic output? Do you want to get the output using the posterior mode of the weights instead of sampling the weights?
Yes, the current implementation supports only sampling; the kind of deterministic output you can get that way (it will just be more stable) is the prediction posterior mean. But in real tasks, i.e. in production, this approach is too slow, and we can consider making predictions based on weights taken from the mean or mode of q_w instead.
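In other words, the "deterministic-ish" output available today is a Monte Carlo estimate of the posterior predictive mean, e.g. this sketch, reusing x and l_out from above:

import numpy as np
import theano
import lasagne

# every call re-samples the weights, so predictions are stochastic
predict_stochastic = theano.function([x], lasagne.layers.get_output(l_out, x))

def predict_mc(X, n_samples=100):
    # Monte Carlo estimate of the posterior predictive mean:
    # average many sampled forward passes (accurate but slow)
    return np.mean([predict_stochastic(X) for _ in range(n_samples)], axis=0)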
When I do this trick with overriding add_param and __init__, I "say" that the RandomStream is a part of my computational graph before I call get_output_for. That's why I can't pass deterministic=True to get_output_for.
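One workaround for exactly this (the "replace e with 0" idea from the code comment above) is graph substitution: keep a reference to every sampled noise tensor and clone the output expression with each of them replaced by zeros, which collapses W = mean + log1p(exp(rho)) * e to the posterior mean. A rough sketch, assuming NormalQ.__call__ were extended to record layer.acc.noises.append((e, shape)) (a hypothetical attribute), and reusing x, l_out and acc from above:

import theano
import theano.tensor as T
import lasagne

stochastic_out = lasagne.layers.get_output(l_out, x)

# replace every noise draw with zeros of the same (known, concrete) shape
replacements = {e: T.zeros(shape) for e, shape in acc.noises}
deterministic_out = theano.clone(stochastic_out, replace=replacements)

predict_det = theano.function([x], deterministic_out)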
@casperkaae BTW, we need to have a closer look at the PyMC3-Lasagne bridge example. They do similar things, just inline, using the Lasagne API. That's exactly what we need. I saw this example some time ago, but thought that there were some restrictions. Having a closer look into the source code, I realized that the key reason it works is the weights: they are initialized with the PyMC API and wrapped in a Model with-statement that does all the dirty work. I'm really impressed.
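For reference, the core of that bridge, as far as I can tell, is an initializer that returns a PyMC3 random variable (a Theano expression) when called inside a pm.Model context. A very rough sketch (names are illustrative, and it relies on Lasagne accepting Theano expressions as parameter specs, which is exactly the restriction mentioned above):

import numpy as np
import pymc3 as pm
import lasagne


class GaussWeights(object):
    # illustrative initializer: returns a PyMC3 random variable instead of
    # a numpy array, so the weights become part of the PyMC3 model
    def __init__(self):
        self.count = 0

    def __call__(self, shape):
        self.count += 1
        return pm.Normal('w%d' % self.count, mu=0., sd=1.,
                         testval=np.random.normal(size=shape),
                         shape=shape)


with pm.Model() as model:
    l_in = lasagne.layers.InputLayer((None, 784))
    l_out = lasagne.layers.DenseLayer(l_in, num_units=10, W=GaussWeights())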
Hi again!
I've finished the work on my topping. It seems to be flexible. I opened a PR in Lasagne, but I think this is a better place to contribute it.