alecGraves / BVAE-tf

Disentangled Variational Auto-Encoder in TensorFlow / Keras (Beta-VAE)
The Unlicense

Why is in_train_phase not working #9

Open alecGraves opened 4 years ago

alecGraves commented 4 years ago

https://github.com/alecGraves/BVAE-tf/blob/c26a21dbb8924e3779183f5825832c5ec04e6652/bvae/sample_layer.py#L91

K.in_train_phase should call the reparameterization function when the Keras backend is in its training phase... it does not appear to be running at all.
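For reference, a minimal standalone sketch of the semantics I would expect from K.in_train_phase (my own example, not code from the repo): the first argument should be evaluated in the training phase, the second at inference time.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

x = K.constant([1.0, 2.0])

# First argument = training branch (e.g. the reparameterization),
# second argument = inference branch (e.g. just the mean).
train_out = K.in_train_phase(x * 2.0, x, training=True)
eval_out = K.in_train_phase(x * 2.0, x, training=False)

print(K.eval(train_out))  # [2. 4.] -- training branch selected
print(K.eval(eval_out))   # [1. 2.] -- inference branch selected
```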

alecGraves commented 4 years ago

It appears to work in later TF versions...

Brandt-J commented 3 years ago

I am experiencing the same issue, directly in your repository with tensorflow 1.4, but also in my own implementation (using Dense instead of convolutional layers) running with tensorflow 2.3.0. Apparently, Keras mostly calls the SamplingLayer with tensors of shape (None, latentSize), which fully bypasses the sampling and reparametrization function at this line: https://github.com/alecGraves/BVAE-tf/blob/c26a21dbb8924e3779183f5825832c5ec04e6652/bvae/sample_layer.py#L70

However, when the length of my input tensor is a multiple of the batch size, the SamplingLayer is called with tensors of shape (batchsize, latentSize); THEN it goes into the sampling, but I get different errors there. Before investigating those further, though: shouldn't it normally go into the sampling and reparametrization trick section?
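For illustration, here is a hypothetical reconstruction of the behaviour I am describing (a TF2 sketch, not the repository's exact code; the class name is mine):

```python
import tensorflow as tf
from tensorflow.keras import backend as K

class SampleLayerSketch(tf.keras.layers.Layer):
    """Hypothetical reconstruction of the described behaviour,
    not the repository's exact code (assumes TF2 shape semantics)."""

    def call(self, inputs, training=None):
        mean, logvar = inputs
        # When the static batch dimension is unknown (None), the
        # "trick to allow setting batch at train/eval time" branch fires
        # and the sampling is skipped entirely:
        if mean.shape[0] is None:
            return mean + 0 * logvar  # the bypass: no random sampling at all
        # Only when the batch dimension is statically known does the
        # reparameterization trick actually run:
        epsilon = K.random_normal(shape=mean.shape)
        return K.in_train_phase(mean + K.exp(0.5 * logvar) * epsilon,
                                mean, training=training)
```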

sokrypton commented 3 years ago

I believe that wrapping the K.in_train_phase() call in a keras.layers.Lambda() layer should fix the bug.

I noticed that in newer TensorFlow versions, when running in eager mode, the training phase is only evaluated inside explicit layers. Wrapping the function in a Lambda layer should do the trick.
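Something like this sketch, assuming tf.keras and a known latent size (latent_size, mean, logvar, and sample are my own names):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

latent_size = 16  # assumed latent dimension

mean = keras.Input(shape=(latent_size,), name='mean')
logvar = keras.Input(shape=(latent_size,), name='logvar')

def sample(args):
    mu, lv = args
    # K.shape gives the runtime shape, so the batch dimension may be None.
    eps = K.random_normal(K.shape(mu))
    return K.in_train_phase(mu + K.exp(0.5 * lv) * eps,  # training: reparameterize
                            mu)                          # inference: just the mean

# Wrapping the branch in a Lambda layer makes Keras evaluate the
# learning phase inside an explicit layer call.
z = keras.layers.Lambda(sample, name='sampling')([mean, logvar])
model = keras.Model([mean, logvar], z)
```

With this wrapping, calling the model with training=True should draw a sample, while training=False should return the mean.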

Brandt-J commented 3 years ago

Hmm, that changes what happens a bit, but does not solve my initial issue, so let me rephrase. It could be that I am just using it wrong; I am pretty new to tensorflow :/

The data I want to train on consists of a set of 1-dimensional spectral data with, e.g., 1024 spectral frequencies, so my input_shape is (1024), with no additional dimension, as I am just working with some densely connected layers. I might have, for example, 800 spectra for training, which gives me an 800x1024 input tensor.

When I now want to fit the model to that data, different things happen depending on the batch_size, or, more precisely, the relation between the batch_size and the dataset size (number of spectra). If the dataset size is, e.g., 320x1024, it is a multiple of the batch_size (32), and the sampling layer actually performs the random sampling (as it receives a 32x1024 tensor as input). However, whenever my number of spectra (i.e., the training dataset size) is NOT a multiple of the batch_size, the shape of the tensors going through the network is (None, 1024). Then the "# trick to allow setting batch at train/eval time" with the above line gets triggered in the sampling layer, and it just returns (mean + 0*logvar), i.e., without doing the random sampling(?).

So, as said, probably I am just doing it wrong, and I should always pass in datasets whose first dimension is a multiple of the batch_size? In other cases with simple feed-forward networks that didn't seem to make any difference, but obviously it does here. I don't really understand in which situations the "# trick to allow setting batch at train/eval time" should be executed and when not.
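For what it's worth, here is a sketch of a shape-agnostic variant (my own, under the assumption that the bypass only exists to handle the unknown batch dimension): drawing epsilon with the runtime batch size via K.shape removes the need for the static-shape check entirely, so it should not matter whether the dataset size is a multiple of the batch_size.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

class DynamicSampleLayer(tf.keras.layers.Layer):
    """Sketch: reparameterization that also works with an unknown
    (None) batch dimension. Not the repository's code."""

    def call(self, inputs, training=None):
        mean, logvar = inputs
        # K.shape evaluates at runtime, so epsilon gets the actual batch
        # size even when the static shape is (None, latent_size):
        epsilon = K.random_normal(shape=K.shape(mean))
        return K.in_train_phase(mean + K.exp(0.5 * logvar) * epsilon,
                                mean, training=training)
```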