wbrandenburger opened 5 years ago
ENCODER
Input: 256x256x3
Layer 1: Convolution with 32 filters, kernel size 3, stride 2, activation function ReLU. Result: 128x128x32
Layer 2: Convolution with 64 filters, kernel size 3, stride 2, activation function ReLU. Result: 64x64x64
Layer 3: Convolution with 64 filters, kernel size 3, stride 2, activation function ReLU. Result: 32x32x64
Layer 4: Flatten. Result: 65,536
Layer 5: Dense to 256 with no activation function. Result: 256
One half (128) of Layer 5 is taken as the mean and the other half as the standard deviation, and 128 values are sampled from the normal distribution with these parameters. Result: 128
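The stated output sizes (256 -> 128 -> 64 -> 32) imply 'same' padding, where each stride-2 convolution halves the spatial dimensions. A minimal sketch that traces the encoder shapes under that assumption (the 'same'-padding rule is the assumption here, not something stated above):

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a convolution with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

# Trace the encoder feature-map shapes, layer by layer
shape = (256, 256, 3)
for filters in (32, 64, 64):  # Layers 1-3, each with stride 2
    h, w, _ = shape
    shape = (conv_same_out(h, 2), conv_same_out(w, 2), filters)
    print(shape)

flat = shape[0] * shape[1] * shape[2]
print(flat)  # 32 * 32 * 64 = 65536, matching the Flatten result
```

This reproduces the shape sequence listed above, ending at 32x32x64 before the Flatten layer.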
DECODER
Input: 128
Layer 1: Dense to 32^3 with activation function ReLU. Result: 32^3 = 32,768
Layer 2: Reshape to 32x32x32. Result: 32x32x32
Layer 3: Transposed convolution with 32 filters, kernel size 3, stride 2, activation function ReLU. Result: 64x64x32
Layer 4: Transposed convolution with 32 filters, kernel size 3, stride 2, activation function ReLU. Result: 128x128x32
Layer 5: Transposed convolution with 32 filters, kernel size 3, stride 2, activation function ReLU. Result: 256x256x32
Layer 6: Transposed convolution with 3 filters, kernel size 3, stride 1, no activation function. Result: 256x256x3
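The decoder mirrors the encoder: with 'same' padding, a stride-2 transposed convolution doubles each spatial dimension, and a stride-1 one leaves it unchanged. A quick sanity check of the decoder shapes under that assumption:

```python
def deconv_same_out(size, stride):
    """Spatial output size of a transposed convolution with 'same' padding."""
    return size * stride

# Trace the decoder shapes, starting after the Dense + Reshape layers
shape = (32, 32, 32)
for filters, stride in ((32, 2), (32, 2), (32, 2), (3, 1)):
    h, w, _ = shape
    shape = (deconv_same_out(h, stride), deconv_same_out(w, stride), filters)
    print(shape)

print(shape)  # should match the input image shape, (256, 256, 3)
```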
For comparison, here is the architecture of the CVAE for MNIST, which can be reproduced reliably.
ENCODER
Input: 28x28x1
Layer 1: Convolution with 32 filters, kernel size 3, stride 2, activation function ReLU. Result: 14x14x32
Layer 2: Convolution with 64 filters, kernel size 3, stride 2, activation function ReLU. Result: 7x7x64
Layer 3: Flatten. Result: 3,136
Layer 4: Dense to 50 with no activation function. Result: 50
One half (25) of Layer 4 is taken as the mean and the other half as the standard deviation, and 25 values are sampled from the normal distribution with these parameters. Result: 25
DECODER
Input: 25
Layer 1: Dense to 7*7*32 with activation function ReLU. Result: 7*7*32 = 1,568
Layer 2: Reshape to 7x7x32. Result: 7x7x32
Layer 3: Transposed convolution with 64 filters, kernel size 3, stride 2, activation function ReLU. Result: 14x14x64
Layer 4: Transposed convolution with 32 filters, kernel size 3, stride 2, activation function ReLU. Result: 28x28x32
Layer 5: Transposed convolution with 1 filter, kernel size 3, stride 1, no activation function. Result: 28x28x1
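One place where the two models differ sharply is the size of the Dense bottleneck after the Flatten layer, which may be part of why the 256x256 model is harder to train than the MNIST one. A rough weight count for that layer alone (biases ignored; this is an illustrative back-of-the-envelope comparison, not something stated above):

```python
# Weight count of the encoder's Dense layer: inputs * outputs
big_dense = 65536 * 256   # 256x256x3 model: Flatten(65,536) -> Dense(256)
mnist_dense = 3136 * 50   # MNIST model:     Flatten(3,136)  -> Dense(50)
print(big_dense, mnist_dense, big_dense // mnist_dense)
```

The larger model's Dense layer carries roughly two orders of magnitude more weights than the MNIST one.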
I need much more information about the architecture. I think it would be helpful if we always had a short table listing the successively applied layers (convolutions with their strides, (un)pooling layers, ReLU activations, skip connections) as well as the input image size and the resulting feature-map sizes. Can you create a document for the latest architecture that describes this sequence? With an up-to-date description we can discuss things with Matthias a little better.
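As a starting point for such a document, here is a minimal sketch that prints the requested table for the 256x256 architecture listed above (the layer names and column layout are illustrative choices, not an existing document):

```python
# Layer table for the 256x256 CVAE: (name, operation, input shape, output shape)
rows = [
    ("Encoder 1", "Conv 32 filters, k3, s2, ReLU",  "256x256x3",  "128x128x32"),
    ("Encoder 2", "Conv 64 filters, k3, s2, ReLU",  "128x128x32", "64x64x64"),
    ("Encoder 3", "Conv 64 filters, k3, s2, ReLU",  "64x64x64",   "32x32x64"),
    ("Encoder 4", "Flatten",                        "32x32x64",   "65536"),
    ("Encoder 5", "Dense 256, no activation",       "65536",      "256"),
    ("Sampling",  "z ~ N(mean, std), split of 256", "2x128",      "128"),
    ("Decoder 1", "Dense 32768, ReLU",              "128",        "32768"),
    ("Decoder 2", "Reshape",                        "32768",      "32x32x32"),
    ("Decoder 3", "ConvT 32 filters, k3, s2, ReLU", "32x32x32",   "64x64x32"),
    ("Decoder 4", "ConvT 32 filters, k3, s2, ReLU", "64x64x32",   "128x128x32"),
    ("Decoder 5", "ConvT 32 filters, k3, s2, ReLU", "128x128x32", "256x256x32"),
    ("Decoder 6", "ConvT 3 filters, k3, s1",        "256x256x32", "256x256x3"),
]
header = ("Layer", "Operation", "Input", "Output")
widths = [max(len(r[i]) for r in rows + [header]) for i in range(4)]
for r in [header] + rows:
    print("  ".join(cell.ljust(w) for cell, w in zip(r, widths)))
```

Keeping the table as data like this makes it easy to extend with pooling layers or skip connections as the architecture changes.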