alecGraves / BVAE-tf

Disentangled Variational Auto-Encoder in TensorFlow / Keras (Beta-VAE)
The Unlicense

I do not think the capacity argument works #3

Closed alecGraves closed 5 years ago

alecGraves commented 5 years ago

What was past me thinking? I do not know 😄

    if self.reg == 'bvae':
        # kl divergence (note: `stddev` actually holds the log-variance):
        latent_loss = -0.5 * K.mean(1 + stddev
                                    - K.square(mean)
                                    - K.exp(stddev), axis=-1)
        # use beta to force less usage of vector space;
        # also try to use <capacity> dimensions of the space:
        latent_loss = self.beta * K.abs(latent_loss - self.capacity / self.shape.as_list()[1])
        self.add_loss(latent_loss, x)

I just randomly subtract a constant from my loss?

This is more like it:

    if self.reg == 'bvae':
        # per-dimension kl divergence (`stddev` holds the log-variance):
        latent_losses = -0.5 * (1 + stddev
                                - K.square(mean)
                                - K.exp(stddev))
        # weight the first <capacity> dimensions like a plain VAE (weight 1)
        # and the remaining dimensions by beta, to discourage their usage:
        bvae_weight = self.beta * K.ones(shape=(self.shape.as_list()[1] - self.capacity,))
        if self.capacity > 0:
            vae_weight = K.ones(shape=(self.capacity,))
            bvae_weight = K.concatenate([vae_weight, bvae_weight], axis=-1)
        latent_loss = K.abs(K.mean(bvae_weight * latent_losses, axis=-1))

        self.add_loss(latent_loss, x)
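
The per-dimension weighting idea above can be checked with a small NumPy sketch (the numbers are made up for illustration, and `log_var` stands in for what the layer calls `stddev`):

```python
import numpy as np

# Toy check of the weighting scheme: the first `capacity` dimensions get
# weight 1.0 (plain VAE), the remaining ones get weight beta.
beta, capacity, latent_dim = 10.0, 2, 4

mean = np.array([0.5, -0.3, 0.1, 0.2])
log_var = np.array([-0.1, 0.2, 0.0, -0.2])

# per-dimension KL divergence of N(mean, exp(log_var)) from N(0, 1)
kl = -0.5 * (1.0 + log_var - np.square(mean) - np.exp(log_var))

weights = np.concatenate([np.ones(capacity),
                          beta * np.ones(latent_dim - capacity)])
latent_loss = np.abs(np.mean(weights * kl))

print(weights)  # [ 1.  1. 10. 10.]
```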
beatriz-ferreira commented 5 years ago

Hi,

I'm using your implementation of the beta-VAE. You haven't committed this change to the loss function yet. Should we use the new one you proposed or the old one? Have you tested that the new one works and does what it is supposed to do?

Thank you in advance!

alecGraves commented 5 years ago

I have not tested anything. I would recommend leaving the capacity argument at the default value of zero for now.

beatriz-ferreira commented 5 years ago

Ok, thank you for your reply! In the beta-VAE paper (https://openreview.net/references/pdf?id=Sy2fzU9gl) there's no capacity parameter. They just play with beta and the size of the latent representation, am I right?

Another suggestion would be to normalize the beta, as they do in the paper, to take into account the dimension of the input and dimension of the latent space :-)
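
For reference, the normalisation suggested above scales beta by the latent dimension over the input dimension; a minimal sketch (the function name and example numbers are illustrative, not from the repo):

```python
# beta_norm = beta * M / N, where M is the latent dimension and N is the
# number of input pixels/features, as the beta-VAE paper describes.
def normalized_beta(beta, latent_dim, input_dim):
    return beta * latent_dim / input_dim

# e.g. a 32-dim latent space over 64x64 single-channel images
print(normalized_beta(250.0, 32, 64 * 64))  # 1.953125
```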

alecGraves commented 5 years ago

@beatriz-ferreira Thank you for the suggestion, I missed that detail from the paper, but normalization makes a lot of sense.

alecGraves commented 5 years ago

I removed the capacity argument from the layer. The idea behind the capacity argument is to guide the model to learn a representation with a specific number of distributions, but still allow the model to expand the number used if necessary. It is kinda silly, as a standard VAE allows one to specify the number of distributions to use directly.
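
With capacity removed, the regulariser reduces to the standard beta-VAE term, beta times the KL divergence. A NumPy sketch under the same log-variance convention as the snippets above (the function name is hypothetical):

```python
import numpy as np

# Hypothetical sketch of the regulariser once capacity is removed:
# beta * KL(q(z|x) || N(0, I)), with `log_var` playing the role the
# layer code above calls `stddev`.
def bvae_latent_loss(mean, log_var, beta):
    kl = -0.5 * np.mean(1.0 + log_var - np.square(mean) - np.exp(log_var), axis=-1)
    return beta * kl

# a latent exactly at the prior N(0, I) incurs zero loss
loss_at_prior = bvae_latent_loss(np.zeros(2), np.zeros(2), beta=4.0)
```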

I removed this (nonworking) argument from the sampling layer in a recent commit.

Closing...