Open nikhilrayaprolu opened 4 years ago
We changed the title to "Comparing an adapted U-Net Architecture for Varying Depths" and adjusted the beginning of our section:
In this section, we analyse and compare a U-Net-like structure \cite{unet} at different depths. The architecture was originally derived from a convolutional AutoEncoder as used for image reconstruction (see, for example, the Keras tutorial \cite{chollet2015keras}). This AutoEncoder-type architecture was modified for semantic segmentation: the provided ground-truth building annotations were used as training targets so that the network learns to detect buildings. Furthermore, skip connections were introduced, as in U-Net, connecting encoding and decoding blocks on the same level. These connections help to recover spatial information, and in our experiments they reconstructed image details better than post-processing with, for example, conditional random fields \cite{kraehenbuehl2012}. The architecture differs from U-Net in two respects. First, in the sequence of operations inside the decoding blocks: following the AutoEncoder approach, each decoder block exactly mirrors its encoder block and consists of a single convolution followed by upsampling (max-pooling is used in the encoder block). Second, we use a single convolution of size $5 \times 5$. This architecture was evaluated at different depths (numbers of stacked encoder and decoder blocks).
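As a side note, varying the depth has a simple consequence for the input resolution: each encoder level halves the spatial size via 2x2 max-pooling, so the input side length must be divisible by $2^{\text{depth}}$ for the decoder upsampling to restore the original resolution exactly. The sketch below (plain Python, with illustrative function names of our own choosing, not taken from the paper's code) makes this bookkeeping explicit:

```python
# Hedged sketch: spatial bookkeeping for the adapted U-Net at varying depth.
# We assume each encoder block halves the resolution with 2x2 max-pooling and
# each decoder block doubles it with 2x upsampling ('same'-padded convolutions
# leave the size unchanged). Function names are illustrative assumptions.

def level_sizes(input_size, depth):
    """Feature-map side length at each encoder level, input to bottleneck."""
    sizes = [input_size]
    for _ in range(depth):
        if sizes[-1] % 2 != 0:
            raise ValueError(
                f"size {sizes[-1]} is odd: depth {depth} is too large "
                f"for an input of side {input_size}")
        sizes.append(sizes[-1] // 2)
    return sizes

def max_depth(input_size):
    """Largest stack of encoder/decoder blocks the input size admits."""
    depth = 0
    while input_size % 2 == 0:
        input_size //= 2
        depth += 1
    return depth
```

For a $256 \times 256$ input, `level_sizes(256, 4)` yields `[256, 128, 64, 32, 16]`, and `max_depth(256)` is 8, which bounds the depths one can compare for a given tile size.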