anchen1011 / toflow

TOFlow: Video Enhancement with Task-Oriented Flow
http://toflow.csail.mit.edu
MIT License

Pre-training the flow estimation network #10

Open YaoooLiang opened 6 years ago

YaoooLiang commented 6 years ago

Hi, @anchen1011. I pre-trained the flow estimation network on the Sintel dataset, but it does not converge. The batch size is 16, the learning rate is 0.0001, and the loss is the L1 difference between the last sub-net's output and the ground-truth flow. Can you share the details of how you pre-trained the flow network?
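
For reference, a minimal sketch of the loss described above, written against the TF1-style API used later in this thread; `pred_flow` and `gt_flow` stand for the last sub-net's output and the ground-truth flow (names are only illustrative):

```python
import tensorflow as tf  # TF1-style API, matching the snippets later in this thread

def flow_l1_loss(pred_flow, gt_flow):
    """Mean absolute (L1) difference between the last sub-net's flow and the ground truth."""
    return tf.reduce_mean(tf.abs(pred_flow - gt_flow))
```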

anchen1011 commented 6 years ago

I think there should be no problem with the batch size 16 / learning rate 0.0001 setup. Would you like to share the visualized input/output/target/flow images so that I can get a sense of what's preventing the network from converging?

YaoooLiang commented 6 years ago

@anchen1011, thank you for your reply. Images are normalized to [0, 1] with `image = image.astype(np.float32) / 255` and cropped with `image = image[0:432, :, :]`, while the flows are left unprocessed. The shape of the image batch is [16, 432, 1024, 3] and the shape of the flow batch is [16, 432, 1024, 2]. The 8x-downsampled frames and a zero initial flow `flow0 = tf.constant(np.zeros((16, 54, 128, 2)), np.float32)` are concatenated with `tf.concat([frame1, frame2, flow0], axis=3)` as the first sub-net's input, and the rest of the sub-nets take inputs built in a similar way.
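
A minimal sketch of the input construction described above, assuming TF1-style graph ops (shapes are taken from the comment; names like `subnet1_input` are only illustrative):

```python
import numpy as np
import tensorflow as tf  # TF1-style API, as in the snippets above

batch, h, w = 16, 432, 1024
# frames already downsampled 8x and normalized to [0, 1]
frame1 = tf.placeholder(tf.float32, [batch, h // 8, w // 8, 3])
frame2 = tf.placeholder(tf.float32, [batch, h // 8, w // 8, 3])
# zero initial flow for the coarsest sub-net
flow0 = tf.constant(np.zeros((batch, h // 8, w // 8, 2)), np.float32)
# concatenate along channels: [16, 54, 128, 8]
subnet1_input = tf.concat([frame1, frame2, flow0], axis=3)
```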

YaoooLiang commented 6 years ago

The loss drops when training on only a single batch, but on the entire dataset the training loss just fluctuates up and down.

anchen1011 commented 6 years ago

It seems like you are implementing the pre-training pipeline in TF, which could introduce many issues that are unknown to me.

I think in general to figure out the reason why it doesn't converge, you need to:

  1. Visualize the network architecture (with tensorboard)
  2. Visualize a few groups of input/output/target/flow images

I would be happy to help if you attach these images so that I can take a look.

Also, your preprocessing of images is quite different from ours. We use defaultTrainTransform from this module.
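
In case it helps with step 2 above, here is a minimal sketch of one common way to turn a flow field into a color image for inspection; it assumes OpenCV and NumPy, which are not part of this repo:

```python
import cv2
import numpy as np

def flow_to_color(flow):
    """Map an (H, W, 2) flow field to a BGR image: hue encodes direction, brightness encodes magnitude."""
    mag, ang = cv2.cartToPolar(flow[..., 0].astype(np.float32), flow[..., 1].astype(np.float32))
    hsv = np.zeros((flow.shape[0], flow.shape[1], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # hue: flow direction
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value: flow magnitude
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```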

YaoooLiang commented 6 years ago

@anchen1011, hi, sorry for the late reply. I visualized a few groups of the images in images.zip. After a long period of training, the training L1 loss stabilized around 0.1 and the validation L1 loss around 0.15, but the model still performs poorly on both the training and validation sets. Can you share the details about:

  1. Which way do you choose for training: end-to-end, or step by step?
  2. How do you normalize the optical flow data?
  3. Are the input images at the original size, or cropped to a smaller size?

anchen1011 commented 6 years ago

I think your network is learning something, which means the input/output formats are fine.

However, the network structure seems problematic. Each sub-net should output an optical flow, which you then need to both resize and double in magnitude.

For your 3 questions:

  1. First step by step, then fine-tune end-to-end. Step-by-step training alone should already deliver a very good result (see the sketch below).
  2. You don't normalize the optical flow data.
  3. The input images are cropped to the network input size (if they are not already the same).
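
A hypothetical sketch of the "step by step, then end-to-end" schedule from point 1, written with TF1-style optimizers; the function name, the use of Adam, and the learning rates are illustrative assumptions, not the repository's actual training code:

```python
import tensorflow as tf  # TF1-style API

def build_training_ops(subnet_losses, subnet_vars, lr=1e-4):
    """Stage 1: one optimizer per sub-net, each minimizing its own loss over its own variables.
    Stage 2: a joint optimizer over the final loss for end-to-end fine-tuning."""
    stepwise_ops = [tf.train.AdamOptimizer(lr).minimize(loss, var_list=vars_)
                    for loss, vars_ in zip(subnet_losses, subnet_vars)]
    end2end_op = tf.train.AdamOptimizer(lr * 0.1).minimize(subnet_losses[-1])
    return stepwise_ops, end2end_op
```
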
YaoooLiang commented 6 years ago

Hi @anchen1011. Actually, I do resize and double each sub-net's output flow at the same time: `flow2 = tf.image.resize_images(flow1, (flow1.shape[1] * 2, flow1.shape[2] * 2)) * 2`. Then I trained each sub-net one by one, but it failed again. I also checked that the input images and the target flows are matched. Would you give me any suggestions? Thank you a lot!