ClementPinard / FlowNetPytorch

Pytorch implementation of FlowNet by Dosovitskiy et al.
MIT License

Does the code deal with image widths and heights that are not multiples of 64? #15

Closed chenchr closed 6 years ago

ClementPinard commented 6 years ago

Not for the moment, but it would be easy to do so. See here for more insight
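
For reference, a minimal sketch (not repository code; the helper names are mine) of one way to handle arbitrary sizes: pad the input up to the next multiple of 64, run the network, then crop the predicted flow back to the original resolution.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple_of_64(x):
    # x: (B, C, H, W) tensor holding the concatenated image pair
    h, w = x.shape[-2:]
    pad_h = (64 - h % 64) % 64
    pad_w = (64 - w % 64) % 64
    # pad only on the right and bottom so cropping back is trivial
    return F.pad(x, (0, pad_w, 0, pad_h)), (h, w)

def predict_flow(model, image_pair):
    padded, (h, w) = pad_to_multiple_of_64(image_pair)
    flow = model(padded)
    # upsample the (downsampled) flow map to the padded resolution, then crop
    # away the padded border; value rescaling (div_flow) is left out here
    flow = F.interpolate(flow, size=padded.shape[-2:], mode='bilinear', align_corners=False)
    return flow[..., :h, :w]
```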

chenchr commented 6 years ago

@ClementPinard Thanks for your reply. It seems that the images and flow are not processed with mean subtraction here, whereas the caffe source code of flownet2 does apply it.

ClementPinard commented 6 years ago

This code is trying to replicate FlowNet1 results, which did not use image normalization. I may add a normalization option later on, but the caffe pretrained networks clearly expect a [0,1] input and output raw optical flow (divided by 20).
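
In practice that means something like the following at inference time (a sketch; the function names are mine):

```python
import torch

def preprocess(img_uint8):
    # img_uint8: (H, W, 3) numpy array with 8-bit values; the network expects [0, 1]
    return torch.from_numpy(img_uint8).permute(2, 0, 1).float() / 255.0

def rescale_flow(raw_output, div_flow=20.0):
    # the pretrained networks predict flow divided by 20, so scale back up
    return raw_output * div_flow
```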

chenchr commented 6 years ago

@ClementPinard Thanks for your reply! I read the dispflownet-release code from https://lmb.informatik.uni-freiburg.de/resources/software.php. In the flownets model, the DataAugmentation layer has a parameter named recompute_mean. The layer is below:

layer {
  name: "img1s_aug"
  type: "DataAugmentation"
  bottom: "img1s"
  top: "img1_nomean"
  augmentation_param {
    augment_during_test: true
    recompute_mean: 1000
    mean_per_pixel: false
    crop_width: $TARGET_WIDTH
    crop_height: $TARGET_HEIGHT
  }
}

From the implementation code, I think this computes the mean over 1000 forward runs and subtracts it every time. Maybe I am wrong. Can you tell me where you got the caffe source code of flownet? Thanks!
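
In Python terms, my reading of recompute_mean would amount to something like this (my interpretation, not a port of the caffe layer):

```python
import torch

class RunningMeanSubtract:
    def __init__(self, recompute_mean=1000):
        self.n = 0
        self.limit = recompute_mean
        self.mean = None

    def __call__(self, batch):
        # batch: (B, 3, H, W); update the per-channel mean estimate for the
        # first `recompute_mean` batches, then keep it fixed
        if self.n < self.limit:
            batch_mean = batch.mean(dim=(0, 2, 3))
            if self.mean is None:
                self.mean = batch_mean
            else:
                self.mean = (self.mean * self.n + batch_mean) / (self.n + 1)
            self.n += 1
        return batch - self.mean.view(1, 3, 1, 1)
```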

ClementPinard commented 6 years ago

Thanks for looking into the FlowNetS code! :) I took my weights from the FlowNetS caffe model of v1.0, but I doubt it changed with v1.2.

As far as I understand, the data augmentation layers are custom made by Freiburg, and the code can be found here: https://github.com/lmb-freiburg/flownet2/blob/master/src/caffe/layers/data_augmentation_layer.cpp (initialization code) and here: https://github.com/lmb-freiburg/flownet2/blob/master/src/caffe/layers/data_augmentation_layer.cu (forward code).

Another interesting thing to look at is the layer parameters in the train graph:

layer {
  name: "img0s_aug"
  type: "DataAugmentation"
  bottom: "blob4"
  top: "img0_aug"
  top: "blob7"
  propagate_down: false 
  augmentation_param {
    max_multiplier: 1
    augment_during_test: false
    recompute_mean: 1000
    mean_per_pixel: false
    translate {
      rand_type: "uniform_bernoulli"
      exp: false
      mean: 0
      spread: 0.4
      prob: 1.0
    }
    rotate {
      rand_type: "uniform_bernoulli"
      exp: false
      mean: 0
      spread: 0.4
      prob: 1.0
    }
    zoom {
      rand_type: "uniform_bernoulli"
      exp: true
      mean: 0.2
      spread: 0.4
      prob: 1.0
    }
    squeeze {
      rand_type: "uniform_bernoulli"
      exp: true
      mean: 0
      spread: 0.3
      prob: 1.0
    }
    lmult_pow {
      rand_type: "uniform_bernoulli"
      exp: true
      mean: -0.2
      spread: 0.4
      prob: 1.0
    }
    lmult_mult {
      rand_type: "uniform_bernoulli"
      exp: true
      mean: 0.0
      spread: 0.4
      prob: 1.0
    }
    lmult_add {
      rand_type: "uniform_bernoulli"
      exp: false
      mean: 0
      spread: 0.03
      prob: 1.0
    }
    sat_pow {
      rand_type: "uniform_bernoulli"
      exp: true
      mean: 0
      spread: 0.4
      prob: 1.0
    }
    sat_mult {
      rand_type: "uniform_bernoulli"
      exp: true
      mean: -0.3
      spread: 0.5
      prob: 1.0
    }
    sat_add {
      rand_type: "uniform_bernoulli"
      exp: false
      mean: 0
      spread: 0.03
      prob: 1.0
    }
    col_pow {
      rand_type: "gaussian_bernoulli"
      exp: true
      mean: 0
      spread: 0.4
      prob: 1.0
    }
    col_mult {
      rand_type: "gaussian_bernoulli"
      exp: true
      mean: 0
      spread: 0.2
      prob: 1.0
    }
    col_add {
      rand_type: "gaussian_bernoulli"
      exp: false
      mean: 0
      spread: 0.02
      prob: 1.0
    }
    ladd_pow {
      rand_type: "gaussian_bernoulli"
      exp: true
      mean: 0
      spread: 0.4
      prob: 1.0
    }
    ladd_mult {
      rand_type: "gaussian_bernoulli"
      exp: true
      mean: 0.0
      spread: 0.4
      prob: 1.0
    }
    ladd_add {
      rand_type: "gaussian_bernoulli"
      exp: false
      mean: 0
      spread: 0.04
      prob: 1.0
    }
    col_rotate {
      rand_type: "uniform_bernoulli"
      exp: false
      mean: 0
      spread: 1
      prob: 1.0
    }
    crop_width: 448
    crop_height: 320
    chromatic_eigvec: 0.51
    chromatic_eigvec: 0.56
    chromatic_eigvec: 0.65
    chromatic_eigvec: 0.79
    chromatic_eigvec: 0.01
    chromatic_eigvec: -0.62
    chromatic_eigvec: 0.35
    chromatic_eigvec: -0.83
    chromatic_eigvec: 0.44
    noise {
      rand_type: "uniform_bernoulli"
      exp: false
      mean: 0.03
      spread: 0.03
      prob: 1.0
    }
  }
}

As you can see, all augmentations are shut down for the test layer (which makes sense), but apparently recompute_mean is mostly used as a parameter for further augmentation. I believe it is there to update the eigenvalues specified at the end of the training layer (further investigation is needed though), so the test layer will do nothing apart from computing useless mean values.

ClementPinard commented 6 years ago

Okay, I just rechecked the code, and apparently there is indeed mean subtraction: https://github.com/lmb-freiburg/flownet2/blob/master/src/caffe/layers/data_augmentation_layer.cu#L593

I'm going to check how it's done in version 1.0 and maybe add it to my code. Thanks for the help!

ClementPinard commented 6 years ago

From the flownet1.0 model definition:

layer {
  name: "Mean1"
  type: "Mean"
  bottom: "img0"
  top: "img0_aug"
  mean_param {
    operation: SUBTRACT
    input_scale: 0.0039216    
    value: 0.411451
    value: 0.432060
    value: 0.450141
  }

Hmmmm, well it's pretty clear that I was wrong from the beginning :smile: I'm going to try subtracting these mean values and see where it goes!
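
For reference, input_scale 0.0039216 is roughly 1/255, so the Mean layer above amounts to scaling an 8-bit image to [0, 1] and subtracting the per-channel means; a minimal PyTorch sketch (the helper name is mine):

```python
import torch

def subtract_flownet_mean(img_uint8):
    # img_uint8: (3, H, W) tensor with values in [0, 255]
    img = img_uint8.float() * 0.0039216                               # input_scale ~ 1/255 -> [0, 1]
    mean = torch.tensor([0.411451, 0.432060, 0.450141]).view(3, 1, 1)
    return img - mean                                                 # per-channel mean subtraction
```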

chenchr commented 6 years ago

Thanks for your detailed reply! I think we can calculate the mean offline, just like VGG-net: compute the per-channel mean over the dataset's images, fix it, and pass it to the normalize function. I am not sure what recompute_mean is meant to do and why the original authors did not pre-calculate it. Maybe for real-world data such as real-time video, the mean may differ between environments, so the authors decided to calculate it adaptively.
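
A rough sketch of that offline computation (imageio and the path list are placeholders of mine, not repository code):

```python
import numpy as np
from imageio import imread

def dataset_channel_mean(image_paths):
    # average RGB value over every pixel of every listed image, in [0, 1]
    total = np.zeros(3)
    pixels = 0
    for path in image_paths:
        img = imread(path).astype(np.float64) / 255.0   # (H, W, 3)
        total += img.reshape(-1, 3).sum(axis=0)
        pixels += img.shape[0] * img.shape[1]
    return total / pixels
```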

ClementPinard commented 6 years ago

I think the subtraction values in the flownet1.0 code are the mean values of the Flying Chairs dataset. The moving mean in later versions is there to adapt to other datasets such as KITTI or MPI Sintel.

For the moment you can simply add a new transforms.Normalize(mean=[0.411451, 0.432060, 0.450141], std=[1,1,1]) right after the first one (which was mean=[0,0,0], std=[255,255,255]) in the Compose function.
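
Something like the following, assuming the images reach this point as CHW float tensors in the [0, 255] range (the exact variable name in main.py may differ):

```python
import torchvision.transforms as transforms

input_normalize = transforms.Compose([
    transforms.Normalize(mean=[0, 0, 0], std=[255, 255, 255]),                  # scale to [0, 1]
    transforms.Normalize(mean=[0.411451, 0.432060, 0.450141], std=[1, 1, 1]),   # subtract Flying Chairs mean
])
```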

chenchr commented 6 years ago

Thank you very much!

chenchr commented 6 years ago

@ClementPinard Hi ClementPinard. I have been using your pytorch implementation these days. However, I found that the end point error (EPE) is somewhat weird: during training, the test EPE is larger than the train EPE. Did this happen to you?

ClementPinard commented 6 years ago

It's indeed weird; it only happened to me when I completely shut down data augmentation. Otherwise, every hyperparameter I tested made train perform worse than test. However, I mainly tested on Flying Chairs.

What dataset did you use?

chenchr commented 6 years ago

I am using Flying Chairs with flownets without batch norm. I set a larger batch size of 32 than the default setup.

chenchr commented 6 years ago

Besides, I want to ask roughly how long it took you to train flownet on Flying Chairs from scratch. I have a Titan X, but I found that during training the GPU utilization jumps between 0% and 100%. I guess the training bottleneck is not compute capacity but the dataloader running on the CPU. So far it has taken me 23 hours to reach epoch 102.

ClementPinard commented 6 years ago

You have to set a high number of worker threads to be able to keep up with the GPU; all data augmentation is done on the CPU. Also be sure to have the dataset stored on an SSD, or loading .flo files will take forever. As far as I am concerned, I have a single 980 Ti GPU, and a batch size of 16 with 16 jobs is enough to not lose too much time loading files. With the right CPU you can even try a very high number of jobs, e.g. -j32.
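
For example, here is what the -j32 suggestion boils down to at the DataLoader level (a sketch with a stand-in dataset, not the repository's actual dataset code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in dataset just to make the snippet self-contained; the real training
# script builds a Flying Chairs dataset with its own transforms
train_dataset = TensorDataset(torch.zeros(64, 6, 320, 448), torch.zeros(64, 2, 80, 112))

train_loader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=32,     # the -j / --workers value: CPU processes doing loading and augmentation
    pin_memory=True,
)
```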

ClementPinard commented 6 years ago

Closing this, as flownet can now take images whose dimensions are not divisible by 64.