HannesStark / bachelorThesis

TensorFlow code and LaTeX for Bachelor Thesis: Understanding Variational Autoencoders' Latent Representations of Remote Sensing Images :earth_africa:

Description of architecture #6

wbrandenburger opened this issue 5 years ago

wbrandenburger commented 5 years ago

I need much more information about the architecture. I think it would be helpful if we always had a short table listing the successively applied layers (convolutions (and strides), (un)pooling layers, ReLU functions, skip connections, as well as the input image size and the resulting feature maps). Can you create a document for the latest architecture that describes this sequence? With a current description we can discuss with Matthias a bit better.

HannesStark commented 5 years ago

ENCODER

Input: 256x256x3

Layer 1: Convolution with 32 filters, kernel size 3, strides of 2, activation function ReLU. Result: 128x128x32

Layer 2: Convolution with 64 filters, kernel size 3, strides of 2, activation function ReLU. Result: 64x64x64

Layer 3: Convolution with 64 filters, kernel size 3, strides of 2, activation function ReLU. Result: 32x32x64

Layer 4: Flatten. Result: 65,536

Layer 5: Dense to 256 with no activation function. Result: 256

Half of Layer 5's output (128 values) is taken as the mean and the other half as the standard deviation, and 128 values are then sampled from the resulting normal distribution. Result: 128
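
For concreteness, here is a minimal tf.keras sketch of this encoder and the sampling step. The `padding='same'` argument is an assumption inferred from the stated output sizes, and taking the second half of the dense output directly as the standard deviation follows the description above (a log-variance parameterization would be the more common choice):

```python
import tensorflow as tf

# Encoder: 256x256x3 image -> 256 units (128 means + 128 standard deviations)
encoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu'),  # 128x128x32
    tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),  # 64x64x64
    tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),  # 32x32x64
    tf.keras.layers.Flatten(),   # 65,536
    tf.keras.layers.Dense(256),  # no activation
])

def sample_latent(dense_output):
    # Reparameterization trick: split the 256 units into mean and
    # standard deviation, then sample 128 latent values.
    mean, std = tf.split(dense_output, num_or_size_splits=2, axis=1)
    eps = tf.random.normal(shape=tf.shape(mean))
    return mean + std * eps  # shape: (batch, 128)
```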

DECODER

Input: 128

Layer 1: Dense to 32^3 with activation function ReLU. Result: 32^3 = 32,768

Layer 2: Reshape to 32x32x32. Result: 32x32x32

Layer 3: Transposed convolution with 32 filters, kernel size 3, strides of 2, activation function ReLU. Result: 64x64x32

Layer 4: Transposed convolution with 32 filters, kernel size 3, strides of 2, activation function ReLU. Result: 128x128x32

Layer 5: Transposed convolution with 32 filters, kernel size 3, strides of 2, activation function ReLU. Result: 256x256x32

Layer 6: Transposed convolution with 3 filters, kernel size 3, strides of 1, no activation function. Result: 256x256x3
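
The matching decoder as a tf.keras sketch, under the same `padding='same'` assumption; the final layer has no activation, so it outputs raw values (logits):

```python
import tensorflow as tf

# Decoder: 128-dimensional latent vector -> 256x256x3 reconstruction
decoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(128,)),
    tf.keras.layers.Dense(32 * 32 * 32, activation='relu'),  # 32,768
    tf.keras.layers.Reshape((32, 32, 32)),                   # 32x32x32
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),  # 64x64x32
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),  # 128x128x32
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),  # 256x256x32
    tf.keras.layers.Conv2DTranspose(3, 3, strides=1, padding='same'),  # 256x256x3, no activation
])
```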

HannesStark commented 5 years ago

For comparison, here is the architecture of the CVAE for MNIST, which works reliably and is easy to reproduce.

ENCODER

Input: 28x28x1

Layer 1: Convolution with 32 filters, kernel size 3, strides of 2, activation function ReLU. Result: 14x14x32

Layer 2: Convolution with 64 filters, kernel size 3, strides of 2, activation function ReLU. Result: 7x7x64

Layer 3: Flatten. Result: 3,136

Layer 4: Dense to 50 with no activation function. Result: 50

Half of Layer 4's output (25 values) is taken as the mean and the other half as the standard deviation, and 25 values are then sampled from the resulting normal distribution. Result: 25

DECODER

Input: 25

Layer 1: Dense to 7·7·32 with activation function ReLU. Result: 7·7·32 = 1,568

Layer 2: Reshape to 7x7x32. Result: 7x7x32

Layer 3: Transposed convolution with 64 filters, kernel size 3, strides of 2, activation function ReLU. Result: 14x14x64

Layer 4: Transposed convolution with 32 filters, kernel size 3, strides of 2, activation function ReLU. Result: 28x28x32

Layer 5: Transposed convolution with 1 filter, kernel size 3, strides of 1, no activation function. Result: 28x28x1
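
For completeness, the same comparison model as a compact tf.keras sketch under the same assumptions (`padding='same'` inferred from the stated output sizes; the sampling step works exactly as in the encoder sketch above, with 25 instead of 128 latent values):

```python
import tensorflow as tf

# MNIST CVAE encoder: 28x28x1 -> 50 units (25 means + 25 standard deviations)
mnist_encoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu'),  # 14x14x32
    tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),  # 7x7x64
    tf.keras.layers.Flatten(),  # 3,136
    tf.keras.layers.Dense(50),  # no activation
])

# MNIST CVAE decoder: 25-dimensional latent vector -> 28x28x1 reconstruction
mnist_decoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(25,)),
    tf.keras.layers.Dense(7 * 7 * 32, activation='relu'),  # 1,568
    tf.keras.layers.Reshape((7, 7, 32)),
    tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),  # 14x14x64
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),  # 28x28x32
    tf.keras.layers.Conv2DTranspose(1, 3, strides=1, padding='same'),  # 28x28x1, no activation
])
```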