alecGraves / BVAE-tf

Disentangled Variational Auto-Encoder in TensorFlow / Keras (Beta-VAE)
The Unlicense

Some questions about implementation #8

Open · beatriz-ferreira opened this issue 5 years ago

beatriz-ferreira commented 5 years ago

Hi,

I've been using your code in some experiments. I have the following questions:

  1. Applying your recently committed changes to the loss actually resulted in predicted values with strange, larger ranges in my experiments, which made them harder to convert back to an image. I had to roll back to the previous version... Have you noticed a similar impact?

  2. Shouldn't the last layer have a sigmoid activation so that the output values lie between 0 and 1? These values should be comparable to the input ones, which I think are rescaled to lie between 0 and 1; am I correct? Does this affect the reconstruction loss?

  3. Also, in some other implementations the common reconstruction loss is the mean squared error rather than the mean absolute error. Do you use 'mae' for a particular reason?

  4. This is an extra issue that I'm having. Have you been able to use the TensorBoard callback to log the losses and metrics? When I try to add the TensorBoard callback I get an error, which I think arises because the ae model is made of two models and thus internally has more than one loss: `line 1050, in _write_custom_summaries: summary_value.simple_value = value.item()`, raising `ValueError: can only convert an array of size 1 to a Python scalar`. I could not find a solution yet!

  5. Minor detail: why change the stddev to its absolute value? Can it ever be negative?!

I'm sorry for the long text and for raising all these issues, but I think they may be relevant to other users too!

Thank you in advance!

alecGraves commented 5 years ago
  1. The last commit changed the beta (KL) loss so that it is summed instead of averaged over the values in the latent space, which I think is what other implementations do. This greatly increases beta's contribution to the gradient; see the first sketch after this list.

  2. Changing the output activation to a sigmoid would force the output into the desired range, so it is probably a good idea. Without it, the problem is likely much harder for the network to learn. I will test out this change.

  3. I am using the mean absolute error / L1 distance because that is what the CycleGAN paper used, and it was what came to mind as I was writing this.

  4. I have not tried to use TensorBoard with this system yet. Post something if you figure it out! I am interested in what the solution could be; a possible workaround is sketched after this list.

  5. I made this change mostly because a negative standard deviation did not make sense to me, and I am pretty sure it would break the loss function (https://github.com/alecGraves/BVAE-tf/issues/4) (see below)

Thanks for the questions 😄
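
For point 1, here is a minimal sketch of the sum-versus-mean reduction of the Gaussian KL term (hypothetical illustration, not the repo's exact implementation):

```python
import tensorflow as tf

# Per-dimension KL( N(mean, exp(log_var)) || N(0, 1) ) for a diagonal
# Gaussian latent, reduced over the latent dimensions by sum or mean.
def kl_term(mean, log_var, reduce='sum'):
    kl = -0.5 * (1.0 + log_var - tf.square(mean) - tf.exp(log_var))
    if reduce == 'sum':
        # Summing grows with the latent size, so the same beta
        # contributes far more to the gradient than with a mean.
        return tf.reduce_sum(kl, axis=-1)
    return tf.reduce_mean(kl, axis=-1)
```

With, say, a 256-dimensional latent space, the summed term is roughly 256 times larger than the averaged one, which explains the much stronger pull on the reconstructions.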
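For point 4, one possible (untested) workaround is a custom callback that collapses array-valued log entries to scalars before writing them, sidestepping the `simple_value = value.item()` failure in the stock callback. This sketch assumes TF 1.x, where `tf.summary.FileWriter` and the `tf.Summary` protobuf are available:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class ScalarTensorBoard(keras.callbacks.Callback):
    """Writes epoch-end metrics to TensorBoard, reducing any
    array-valued entries (e.g. per-output losses) to a scalar."""
    def __init__(self, log_dir='./logs'):
        super(ScalarTensorBoard, self).__init__()
        self.writer = tf.summary.FileWriter(log_dir)

    def on_epoch_end(self, epoch, logs=None):
        for name, value in (logs or {}).items():
            summary = tf.Summary()
            # np.mean collapses an array of losses into one scalar
            summary.value.add(tag=name, simple_value=float(np.mean(value)))
            self.writer.add_summary(summary, epoch)
        self.writer.flush()
```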

alecGraves commented 5 years ago
  1. Update: the variable named stddev (which was the output of the previous layer) actually represents the log variance, which can be negative. I corrected the variable name and removed the abs in https://github.com/alecGraves/BVAE-tf/commit/810506b1a142da49b3cc7eddcc4bb32856d5e51c (see the sketch below)
    1. This is also kind of a better resolution to #4
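
A minimal sketch of the corrected sampling step (hypothetical code, not the commit itself; it uses the TF 2-style `tf.random.normal`):

```python
import tensorflow as tf

# With a log-variance output, exp(0.5 * log_var) is always positive,
# so taking abs() of the previous layer's output is unnecessary.
def sample_z(mean, log_var):
    eps = tf.random.normal(tf.shape(mean))  # standard normal noise
    return mean + tf.exp(0.5 * log_var) * eps
```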
beatriz-ferreira commented 5 years ago

Thank you for your reply and updates! I'm going to test the refactored version and I'll let you know if anything changes in my experiments. I saw you added a tanh activation. I'll also let you know if I figure something out regarding TensorBoard.

Please let me know if you happen to figure something out too :)

Thank you

beatriz-ferreira commented 5 years ago

Hi!

I've tested your refactored version in my experiments. The results are different, and for the better: I am able to get better reconstructions. Cool, thank you!

Just a question: is there any difference between feeding the auto-encoder inputs in the range [-1, 1], as you do, and feeding images in the range [0, 1]? I'm using the second option and everything looks fine. The auto-encoder should adapt to the range (the sampling layer adapts to any distribution), correct? The only thing I think I should change is the final activation layer to a sigmoid so that my outputs are also in the range [0, 1]. Should the loss function stay the same?

Thank you again!

alecGraves commented 5 years ago

Yes, the network should adapt to the different range without a problem. Changing the output layer to sigmoid would probably help the network because you are constraining the output to the desired range.
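
For concreteness, here is a minimal self-contained sketch of the [0, 1] setup discussed above (hypothetical code, not the repo's; the data is a random stand-in for rescaled images):

```python
import numpy as np
from tensorflow import keras

# Stand-in data in [0, 1]; real code would rescale images, e.g. x / 255.0
images = np.random.rand(8, 32, 32, 3).astype('float32')

inp = keras.layers.Input(shape=(32, 32, 3))
h = keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
# Sigmoid keeps the reconstruction in [0, 1], matching the inputs
out = keras.layers.Conv2D(3, 3, padding='same', activation='sigmoid')(h)

model = keras.models.Model(inp, out)
model.compile(optimizer='adam', loss='mae')  # loss choice is unchanged
model.fit(images, images, epochs=1, verbose=0)
```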