foamliu / Autoencoder

Convolutional Autoencoder with SegNet in PyTorch
Apache License 2.0
83 stars 27 forks

Leaking data in the max-pool indices #2

Open efirdc opened 3 years ago

efirdc commented 3 years ago

One RGBA pixel is 32 bits, so a 2x2 block of pixels is 128 bits.

Each max-pool index stores 2 bits of data (one of four positions in a 2x2 window). The first convolutional block has 64 channels, so the max-pooling indices for that block hold 2*64 = 128 bits per 2x2 window, which is as many bits as the input pixels in that window. Those indices get passed straight to the end of the network.
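
The arithmetic above can be checked with a few lines of plain Python. This is only a back-of-the-envelope sketch: the 8 bits per channel and the 2x2 pooling window are assumptions consistent with the numbers in this comment, and the variable names are made up for illustration.

```python
# Capacity check: bits in a 2x2 patch of input vs. bits in the pooling indices.
BITS_PER_CHANNEL = 8        # assumption: 8-bit RGBA
INPUT_CHANNELS = 4          # R, G, B, A
WINDOW_PIXELS = 2 * 2       # assumption: 2x2 max-pool window

# Bits needed to describe one 2x2 patch of the input image.
input_bits = BITS_PER_CHANNEL * INPUT_CHANNELS * WINDOW_PIXELS

# One index picks 1 of 4 positions, so it needs 2 bits.
bits_per_index = (WINDOW_PIXELS - 1).bit_length()

# The first block has 64 feature channels, each with its own index per window.
FIRST_BLOCK_CHANNELS = 64
index_bits = bits_per_index * FIRST_BLOCK_CHANNELS

print(input_bits, index_bits)
```

Both numbers come out equal, so in terms of raw capacity the first block's indices could carry the whole patch.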

ramidzamzam commented 3 years ago

@efirdc Can you please elaborate? I'm trying to fix and use this model. Also, does the input/output of each layer (up and down) make sense to you?

efirdc commented 3 years ago

No, unfortunately I would not recommend using this architecture. It is built more like a segmentation architecture than an autoencoder. Nearly all of the model goes unused, since the image is passed straight through the max-pooling indices at the highest resolution. This is why the reconstruction is almost perfect: the image was never actually encoded.

If you removed the max-pooling indices, the model would most likely still not work, because there are too many layers for an encoder. The image signal would be lost and the model would be impossible to optimize.
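
To illustrate how the indices can carry image information around the bottleneck, here is a small pure-Python sketch (not the repository's code). The helper names `max_pool_2x2` and `max_unpool_2x2` are hypothetical, and the window-local indices 0..3 are a simplification of how a real framework stores them. The decoder input is deliberately a constant, as if the bottleneck had destroyed everything.

```python
def max_pool_2x2(channel):
    """2x2, stride-2 max pooling over one channel (a list of rows).

    Returns the pooled values and, per window, the position (0..3)
    of the maximum in row-major order.
    """
    pooled, indices = [], []
    for r in range(0, len(channel), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(channel[0]), 2):
            window = [channel[r][c], channel[r][c + 1],
                      channel[r + 1][c], channel[r + 1][c + 1]]
            k = max(range(4), key=window.__getitem__)
            pooled_row.append(window[k])
            index_row.append(k)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices):
    """Write each pooled value back at its recorded position, zeros elsewhere."""
    out = [[0] * (2 * len(pooled[0])) for _ in range(2 * len(pooled))]
    for r, (pooled_row, index_row) in enumerate(zip(pooled, indices)):
        for c, (value, k) in enumerate(zip(pooled_row, index_row)):
            out[2 * r + k // 2][2 * c + k % 2] = value
    return out


# Two different toy "images" (single channel, 4x4).
image_a = [[9, 1, 2, 3],
           [4, 5, 8, 7],
           [1, 2, 3, 4],
           [6, 5, 0, 9]]
image_b = [[1, 9, 2, 8],
           [4, 5, 3, 7],
           [1, 7, 3, 4],
           [6, 5, 9, 0]]

_, idx_a = max_pool_2x2(image_a)
_, idx_b = max_pool_2x2(image_b)

# Pretend the encoder output is useless: the decoder only sees constants.
blank = [[1, 1], [1, 1]]
out_a = max_unpool_2x2(blank, idx_a)
out_b = max_unpool_2x2(blank, idx_b)

print(out_a != out_b)  # the outputs still differ, purely because of the indices
```

Even with a constant decoder input, the two unpooled outputs differ, so whatever distinguishes them reached the decoder through the indices alone. With 64 channels of such indices at the first block, that side channel is wide enough to matter.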

ramidzamzam commented 3 years ago

@efirdc Yeah, I was impressed by the quality of the output.