[WIP] Depth-Aware Domain Adaptation (DADA)

Use the approach from the DADA paper to improve the segmentation head, and thereby the masker.

Summary of changes:

Feature fusion: the features from the depth decoder, taken just before average pooling, are now passed through a small decoder to obtain an output of the same size as the latent vector z. We then compute the element-wise product of z and this output (depth_features). The resulting vector is given as input to the segmentation decoder.
DADA fusion: the weighted self-information map of the segmentation output is now multiplied by the depth output before being passed to the ADVENT discriminator of the segmentation head. The idea is that closer objects now receive more attention.
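Both fusions can be sketched roughly as follows. This is a minimal NumPy sketch, not the climategan implementation, and it rests on several assumptions. The "small decoder" is modeled as a hypothetical 1x1 convolution followed by ReLU; `small_decoder`, `weight` and `bias` are illustrative names. Tensors are single-sample (channels, height, width) arrays whose spatial size already matches z. The self-information map uses the normalized form found in the ADVENT reference code, -p * log2(p) / log2(C). The depth map is assumed to be an inverse depth, which is larger for closer pixels. That is the condition under which "closer objects receive more attention" holds.

```python
import numpy as np


def small_decoder(depth_features, weight, bias):
    """Hypothetical small decoder: 1x1 conv (channel projection) + ReLU.

    depth_features: (C_d, H, W) depth-decoder features before average pooling.
    weight: (C_z, C_d) and bias: (C_z,) project C_d channels to z's C_z channels.
    """
    out = np.einsum("zd,dhw->zhw", weight, depth_features) + bias[:, None, None]
    return np.maximum(out, 0.0)


def feature_fusion(z, depth_features, weight, bias):
    """Element-wise product of z and the projected depth features."""
    projected = small_decoder(depth_features, weight, bias)
    if projected.shape != z.shape:
        raise ValueError(f"decoder output {projected.shape} != z {z.shape}")
    return z * projected  # this is what the segmentation decoder would receive


def softmax(logits, axis=0):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def dada_fusion(seg_logits, depth, eps=1e-30):
    """Weighted self-information map scaled per pixel by (inverse) depth.

    seg_logits: (C, H, W) segmentation logits; depth: (H, W) depth output.
    """
    num_classes = seg_logits.shape[0]
    p = softmax(seg_logits, axis=0)
    # Per-class weighted self-information, normalized by log2(C) as in ADVENT's code.
    self_info = -p * np.log2(p + eps) / np.log2(num_classes)
    # Broadcast the depth map over the class axis before the discriminator.
    return self_info * depth[None, :, :]


rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4, 4))
depth_feats = rng.standard_normal((16, 4, 4))
w = rng.standard_normal((8, 16)) * 0.1
b = np.zeros(8)
fused = feature_fusion(z, depth_feats, w, b)

seg_logits = rng.standard_normal((5, 4, 4))
inv_depth = rng.uniform(0.1, 1.0, size=(4, 4))
disc_input = dada_fusion(seg_logits, inv_depth)
print(fused.shape, disc_input.shape)
```

Because the depth weighting is a plain per-pixel multiplication, pixels with a larger (inverse) depth value contribute proportionally more to the discriminator input. Pixels where the segmentation is both uncertain and close therefore dominate.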

cc-ai / climategan