Closed pranjal-joshi-cc closed 1 year ago
Hey @pranjal-joshi-cc,
Thanks for opening the issue. Can you be a bit more specific or share some code? Are you following an example notebook or a page in the docs? If you're looking at something like this, then I think it should be enough to change the InputLayer line in the model definition to:
encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(512, 512, 3)),  # <-- CHANGE THE SHAPE HERE
    Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
    ...
    Dense(encoding_dim,)
])
and any other references to that shape throughout. It's hard to help without more details, though.
Hi @mauicv, and thanks for the quick reply. I am using the VAE outlier detection method. I've tried changing the input_shape as suggested, but the following code throws an error:
RESOLUTION = 512
IMAGE_SIZE = (RESOLUTION, RESOLUTION)
IMAGE_SHAPE = (RESOLUTION, RESOLUTION, 3)
...
...
latent_dim = 1024
encoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=IMAGE_SHAPE),
        Conv2D(32, 3, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(128, 3, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(512, 3, strides=2, padding='same', activation=tf.nn.relu)
    ])
decoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim,)),
        Dense(4*4*128),
        Reshape(target_shape=(4, 4, 128)),
        Conv2DTranspose(256, 3, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(64, 3, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(3, 3, strides=2, padding='same', activation='sigmoid')
    ])
od = OutlierVAE(threshold=.015,  # threshold for outlier score
                score_type='mse',  # use MSE of reconstruction error for outlier detection
                encoder_net=encoder_net,  # can also pass VAE model instead
                decoder_net=decoder_net,  # of separate encoder and decoder
                latent_dim=latent_dim,
                samples=2)
od.fit(x_train,
       loss_fn=elbo,
       cov_elbo=dict(sim=.05),
       epochs=100,
       verbose=True)
The fit method throws the following error:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/Users/pranjaljoshi/Documents/CC Projects/Goodpack/Alibi/alibi_test.ipynb Cell 8' in <cell line: 2>()
      1 # train
----> 2 od.fit(x_train,
      3        loss_fn=elbo,
      4        cov_elbo=dict(sim=.05),
      5        epochs=100,
      6        verbose=True)
      8 # save the trained outlier detector
      9 save_detector(od, filepath)
File ~/miniforge3/envs/alibi/lib/python3.8/site-packages/alibi_detect/od/vae.py:133, in OutlierVAE.fit(self, X, loss_fn, optimizer, cov_elbo, epochs, batch_size, verbose, log_metric, callbacks)
130 kwargs['loss_fn_kwargs'] = {cov_elbo_type: tf.dtypes.cast(cov, tf.float32)}
132 # train
--> 133 trainer(*args, **kwargs)
File ~/miniforge3/envs/alibi/lib/python3.8/site-packages/alibi_detect/models/tensorflow/trainer.py:85, in trainer(model, loss_fn, x_train, y_train, dataset, optimizer, loss_fn_kwargs, preprocess_fn, epochs, reg_loss_fn, batch_size, buffer_size, verbose, log_metric, callbacks)
83 if isinstance(loss_fn, Callable): # type: ignore
84 args = [y, y_hat] if tf.is_tensor(y_hat) else [y] + list(y_hat)
---> 85 loss = loss_fn(*args)
86 else:
87 loss = 0.
File ~/miniforge3/envs/alibi/lib/python3.8/site-packages/alibi_detect/models/tensorflow/losses.py:44, in elbo(y_true, y_pred, cov_full, cov_diag, sim)
...
7105 def raise_from_not_ok_status(e, name):
7106 e.message += (" name: " + name if name is not None else "")
-> 7107 raise core._status_to_exception(e) from None
InvalidArgumentError: Incompatible shapes: [15,786432] vs. [15,3072] [Op:Sub]
Ah, sorry, I forgot you'll also need to ensure the decoder creates the correct shape of output. Basically what's happening is the encoder maps (15, 512, 512, 3) to the latent space and the decoder maps the latent space to (15, 32, 32, 3), and this causes the shape mismatch in the loss function. You'll have to change the decoder architecture to create the correct output shape. I'd also consider adding a few more convolutional and deconvolutional layers. Something like the following should work:
encoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=IMAGE_SHAPE),
        Conv2D(32, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(256, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(516, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(1024, 4, strides=2, padding='same', activation=tf.nn.relu),
    ])
decoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim,)),
        Dense(8*8*1024),
        Reshape(target_shape=(8, 8, 1024)),
        Conv2DTranspose(1024, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(516, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(128, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(32, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(3, 1, strides=1, padding='same', activation='sigmoid')
    ])
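The two numbers in the error message can be checked by hand, since each is just a flattened per-sample image size. A minimal sketch in plain Python, assuming that a stride-2 Conv2DTranspose with padding='same' doubles height and width (the helper name flat_size is made up for illustration and is not part of alibi-detect):

```python
def flat_size(height, width, channels):
    """Number of values in one image once it is flattened."""
    return height * width * channels

# Input images have shape (512, 512, 3).
input_flat = flat_size(512, 512, 3)

# The original decoder reshapes to (4, 4, 128) and then applies three
# stride-2 transpose convolutions, each doubling height and width:
# 4 -> 8 -> 16 -> 32. The last layer has 3 filters.
side = 4
for _ in range(3):
    side *= 2
decoder_flat = flat_size(side, side, 3)

print(input_flat, decoder_flat)  # 786432 3072
```

These match the shapes [15, 786432] and [15, 3072] in the InvalidArgumentError, where 15 is the batch dimension.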
@mauicv
I've understood the encoder part: through the strides parameter we control the dimensionality reduction, and with encoder_net.summary() we can see the size of the last convolution operation, i.e. N x N x Filters.
However, is it necessary to always map the encoder to 32 x 32 for alibi-detect to work, or is the choice of autoencoder purely arbitrary?
Also, please explain how to calculate and reshape the dense layers in the decoder net, as it's quite confusing for me.
decoder_net = tf.keras.Sequential(
    [
        ...
        Dense(8*8*1024),
        Reshape(target_shape=(8, 8, 1024)),
        ...
    ])
How do I determine the number of Dense units, i.e. 8*8*1024, and how do I determine the reshaping in the next layer?
@roshan-dadlaney
Hey @pranjal-joshi-cc,
I've understood the encoder part that through the strides parameter, we control the dimensionality reduction and with encoder_net.summary() we can see the size of last convolution operation i.e. N x N x Filters. However, is it necessary to always map the encoder into 32 x 32 for alibi-detect to work or the choice of autoencoder is purely arbitrary?
I'm not completely sure what you mean here. The choice of the autoencoder is arbitrary, except that the output must have the same shape as the input. For OutlierVAE this really only applies to the decoder: it needs to map from the latent space of size latent_dim to the same shape as the original input image, so in your case (512, 512, 3).
In terms of the output shape of the encoder, it doesn't really matter as long as the capacity is sufficient, that is, as long as you don't reduce the dimensionality too much. For the architecture I provided above, for instance, we have:
encoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=IMAGE_SHAPE),
        Conv2D(32, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(256, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(516, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(1024, 4, strides=2, padding='same', activation=tf.nn.relu),
    ])
and the summary is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 256, 256, 32) 1568
conv2d_1 (Conv2D) (None, 128, 128, 64) 32832
conv2d_2 (Conv2D) (None, 64, 64, 128) 131200
conv2d_3 (Conv2D) (None, 32, 32, 256) 524544
conv2d_4 (Conv2D) (None, 16, 16, 516) 2114052
conv2d_5 (Conv2D) (None, 8, 8, 1024) 8455168
=================================================================
Total params: 11,259,364
Trainable params: 11,259,364
Non-trainable params: 0
_________________________________________________________________
So the output shape of the encoder_net is (8, 8, 1024). Note that OutlierVAE adds some Dense layers to the encoder_net to transform the (8, 8, 1024) output to the latent space of dimension 1024, since you've chosen latent_dim=1024.
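The shapes and parameter counts in the summary above can be reproduced without TensorFlow. This is a rough sketch, assuming the standard rules that a Conv2D with padding='same' outputs ceil(input / stride) per spatial dimension and has kernel*kernel*in_channels*filters weights plus one bias per filter; the helper names are illustrative only:

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)

def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

filters_per_layer = [32, 64, 128, 256, 516, 1024]  # as in the encoder above
size, in_channels, total = 512, 3, 0               # (512, 512, 3) input
shapes = []
for filters in filters_per_layer:
    size = conv_same_out(size, stride=2)
    total += conv_params(4, in_channels, filters)
    shapes.append((size, size, filters))
    in_channels = filters

for shape in shapes:
    print(shape)
print("Total params:", total)
```

The last entry in shapes is (8, 8, 1024) and the total is 11,259,364, in agreement with encoder_net.summary().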
Also, Please explain how to calculate and reshape dense layers in decoder net as its quite confusing for me. How to determine the number of Dense units i.e. 8*8*1024 and how to determine the reshaping in the next layer?
The decoder_net maps from the latent space of dimension 1024 (in our case) to the output shape (512, 512, 3). So it takes a vector of length latent_dim as input. We want to transform this to a shape that can then easily be scaled up to (512, 512, 3). You can do this in a number of ways, but it's easiest if we set up the Conv2DTranspose operations to double the height and width at each layer of the network. The reason we choose 8*8*1024 is just that this can then be reshaped into (8, 8, 1024). We can then upscale this to obtain the output image by applying each of the transpose layers. For instance, given the architecture I suggested above:
latent_dim = 1024
decoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim,)),
        Dense(8*8*1024),
        Reshape(target_shape=(8, 8, 1024)),
        Conv2DTranspose(1024, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(516, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(128, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(32, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(3, 1, strides=1, padding='same', activation='sigmoid')
    ])
The latent vector of shape (1, 1024) is mapped to a vector of shape (1, 8*8*1024), which is then reshaped to (1, 8, 8, 1024) and then upscaled by each of the transpose layers: (1, 8, 8, 1024) -> (1, 16, 16, 1024) -> (1, 32, 32, 516) -> (1, 64, 64, 256) -> (1, 128, 128, 128) -> (1, 256, 256, 64) -> (1, 512, 512, 32) -> (1, 512, 512, 3). So 8*8*1024 is really chosen as a convenience in order to reshape the tensor. Typically we choose image height and width to be powers of 2 just because it makes scaling up and down simpler, but in general this doesn't have to be the case. The formula for the output size of a transpose convolution is documented here.
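To pick the Dense units and Reshape target for a different image size, one option is to work backwards from the target resolution. The sketch below assumes square images, stride-2 Conv2DTranspose layers with padding='same' (each doubling height and width), and a free choice of channel count for the first feature map; decoder_seed is a hypothetical helper, not an alibi-detect or Keras function:

```python
def decoder_seed(target_side, n_doubling_layers, seed_channels):
    """Return (dense_units, reshape_target) for a decoder that upsamples
    to target_side x target_side using stride-2 transpose convolutions."""
    factor = 2 ** n_doubling_layers
    if target_side % factor != 0:
        raise ValueError(
            f"{target_side} is not divisible by 2**{n_doubling_layers}"
        )
    side = target_side // factor
    reshape_target = (side, side, seed_channels)
    dense_units = side * side * seed_channels
    return dense_units, reshape_target

# Six stride-2 layers and 1024 seed channels, as in the decoder above.
units, reshape_target = decoder_seed(512, 6, 1024)
print(units, reshape_target)  # 65536 (8, 8, 1024)

# Trace the spatial size through the six doubling layers.
sides = [reshape_target[0] * 2 ** i for i in range(7)]
print(sides)  # [8, 16, 32, 64, 128, 256, 512]
```

The resulting values would go into Dense(units) and Reshape(target_shape=reshape_target). If the image side is not divisible by the required power of two, the helper raises instead of guessing, since that case needs a different stride or padding arrangement.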
@pranjal-joshi-cc has @mauicv answered your question above? If so we shall close this issue 🙂
Thanks for confirming @pranjal-joshi-cc!
Can we use a custom input image shape while training? I would like to set an input shape of (512, 512, 3), but anything other than (32, 32, 3) throws a mismatch error. Can you explain how to determine the encoder and decoder network parameters? Thanks!