Open tholmb opened 3 years ago
I found the answer to question 1 in the provided code. Convolutional kernels are initialized with Xavier initialization, and biases with a truncated normal initialization (mean 0.0 and std 1.0).
# Conv2D
weights = tf.get_variable(
    "weights",
    kernel_shape,
    initializer=tf.contrib.layers.xavier_initializer(),
    dtype=tf.float32,
)
biases = tf.get_variable(
    "biases",
    bias_shape,
    initializer=tf.truncated_normal_initializer(),
    dtype=tf.float32,
)
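For anyone reading along, the two initializers can be sketched in plain NumPy (function names here are my own, not from the repo): `tf.contrib.layers.xavier_initializer()` defaults to Glorot uniform, and `tf.truncated_normal_initializer()` defaults to mean 0.0, stddev 1.0 with samples outside two standard deviations re-drawn.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Glorot/Xavier uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).
    For a conv kernel (kh, kw, c_in, c_out): fan_in = kh*kw*c_in, fan_out = kh*kw*c_out."""
    receptive = int(np.prod(shape[:-2])) if len(shape) > 2 else 1
    fan_in = receptive * shape[-2] if len(shape) > 1 else shape[0]
    fan_out = receptive * shape[-1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape).astype(np.float32)

def truncated_normal(shape, rng, mean=0.0, stddev=1.0):
    """Normal(mean, stddev) with draws beyond 2 stddev from the mean re-sampled."""
    out = rng.normal(mean, stddev, size=shape)
    bad = np.abs(out - mean) > 2 * stddev
    while bad.any():
        out[bad] = rng.normal(mean, stddev, size=int(bad.sum()))
        bad = np.abs(out - mean) > 2 * stddev
    return out.astype(np.float32)

rng = np.random.default_rng(0)
w = xavier_uniform((3, 3, 32, 64), rng)  # conv kernel, bounded by the Glorot limit
b = truncated_normal((64,), rng)         # biases, all within [-2, 2]
```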
However, I have a new question about the network architecture.
Hi, how did your training go?
First of all, thanks for sharing this great project! I have tried to implement your mobilePydnet network but cannot quite reproduce the results of the pre-trained model. For that reason I have several questions about the model, the loss, the data, and the training itself.
Did you initialize weights and biases with a particular initialization strategy, or did you just use the default initialization of the convolution layers?
Did you use any data augmentation, such as flipping, rotating, random cropping, or blurring?
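For concreteness, the flip/crop part of such an augmentation pipeline could look like the sketch below (my own generic NumPy sketch, not the repo's actual pipeline; for depth training the same transform must be applied jointly to the image and the ground truth):

```python
import numpy as np

def augment(image, depth, rng, crop_hw=(192, 320)):
    """Random horizontal flip + random crop, applied jointly to image and depth.
    crop_hw is an assumed crop size, not a value from the paper."""
    if rng.random() < 0.5:  # horizontal flip with probability 0.5
        image = image[:, ::-1]
        depth = depth[:, ::-1]
    h, w = image.shape[:2]
    ch, cw = crop_hw
    top = int(rng.integers(0, h - ch + 1))   # random crop offsets
    left = int(rng.integers(0, w - cw + 1))
    image = image[top:top + ch, left:left + cw]
    depth = depth[top:top + ch, left:left + cw]
    return image, depth
```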
You mentioned here in the issues section that the range of your input and output images is [0, 255]. Does that mean that during training, when you load the input image and ground truth as float32, you don't normalize them, for example by dividing by 255 to get the range [0, 1]?
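To make the question concrete, these are the two preprocessing options being contrasted (illustrative only):

```python
import numpy as np

# Fake uint8 image standing in for a loaded input or ground-truth frame.
raw = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

x_255 = raw.astype(np.float32)          # option A: keep the [0, 255] range
x_01 = raw.astype(np.float32) / 255.0   # option B: normalize to [0, 1]
```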
The loss is described in the paper as a weighted sum over scales, where one coefficient is fixed to 1 and the per-scale weights go 0.5, 0.25, 0.125 (if I understood correctly, you used just 3 scales). Here is my Python code for calculating the loss, but I'm not sure whether I'm missing something:
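The snippet itself is missing from the thread; as a placeholder, here is a minimal sketch of such a multi-scale loss (my own reconstruction, assuming a per-scale L1 term with the weights quoted above; the actual loss in the paper may include additional terms):

```python
import numpy as np

def multiscale_l1_loss(preds, gt, alpha=1.0, scale_weights=(0.5, 0.25, 0.125)):
    """preds: predictions from fine to coarse, each half the size of the previous.
    alpha is the coefficient fixed to 1; scale_weights are the per-scale weights."""
    total = 0.0
    for pred, w in zip(preds, scale_weights):
        factor = gt.shape[0] // pred.shape[0]
        gt_s = gt[::factor, ::factor]  # crude nearest-neighbor downsample of the GT
        total += w * np.mean(np.abs(pred - gt_s))
    return alpha * total
```

With all-ones ground truth and all-zero predictions at three scales, this returns 0.5 + 0.25 + 0.125 = 0.875.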