Linear regression with Automatic Relevance Determination

I'm trying to implement linear regression with ARD in Edward. This is my current implementation:

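import edward as ed
import tensorflow as tf
from edward.models import Normal
# assumption: `bijector` refers to the TF 1.x contrib bijectors module
from tensorflow.contrib.distributions import bijectors as bijector

# initiate placeholders (d is the number of input features)
X = tf.placeholder(tf.float32, [None, d])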

# initiate the noise
sigma = ed.models.TransformedDistribution(
    distribution=ed.models.Normal(loc=0.0, scale=0.25),
    bijector=bijector.Exp())

# initiating the hyperprior
alpha = ed.models.TransformedDistribution(
    distribution=ed.models.Normal(loc=0.0, scale=1.0),
    bijector=bijector.Exp())

# initiating the priors
w = Normal(loc=tf.zeros(d), scale=tf.ones(d) * alpha)
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))

# initiate the likelihood
y = Normal(loc=ed.dot(X, w) + b, scale=sigma * tf.ones(1))

# initiate the posteriors
qw = Normal(loc=tf.get_variable("qw/loc", [d]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [d])))

qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

qsigma = ed.models.TransformedDistribution(
    distribution=ed.models.Normal(loc=0.0, scale=0.25),
    bijector=bijector.Exp())

qalpha = ed.models.TransformedDistribution(
    distribution=ed.models.Normal(loc=0.0, scale=0.25),
    bijector=bijector.Exp())

# inference
inference = ed.KLqp({w: qw, b: qb, sigma: qsigma, alpha: qalpha},
                    data={X: train_X, y: train_y})
inference.run(n_iter=500)

However, this implementation has a much higher error than linear regression without ARD, so I suspect I'm doing something wrong.

I have seen TransformedDistribution used to define the likelihood noise and the hyperpriors. What is the purpose of the transformed distribution?
Can't we define those without a transformed distribution (the same way as w and b)?
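
To make the question concrete, here is my understanding of what the Exp bijector does to the Normal prior on sigma above, as a minimal plain-Python sketch. This is not Edward code, and the helper name `sample_exp_of_normal` is made up for illustration; I'm assuming the transform simply exponentiates each Normal draw.

```python
import math
import random


def sample_exp_of_normal(loc, scale, n, seed=0):
    """Draw n samples of exp(z), where z ~ Normal(loc, scale)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(loc, scale)) for _ in range(n)]


# Same base distribution as the prior on sigma above: Normal(0, 0.25).
samples = sample_exp_of_normal(loc=0.0, scale=0.25, n=1000)

# A plain Normal(0, 0.25) puts about half of its mass below zero,
# which is not a valid standard deviation; exp(z) is always positive.
print(min(samples) > 0.0)
```

If that reading is right, the transform is only there to keep sigma and alpha positive, which a plain Normal (as used for w and b) would not guarantee.
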
What am I doing wrong here? Can someone please help me to fix this model?
