google-research / kubric

A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
Apache License 2.0

Use the Movi-f dataset to train Raft model #251

Open aqluheng opened 2 years ago

aqluheng commented 2 years ago

Hello, I've attempted to train the RAFT model on the MOVi-f dataset to reproduce the results in the paper, but my results differ from the reported ones. I used the same training code as https://github.com/princeton-vl/RAFT, only replacing FlyingChairs with the MOVi-f dataset. Would you mind sharing the settings you used to train the RAFT model?
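
For reference, a minimal sketch of one way MOVi-f could be wrapped as a drop-in PyTorch dataset for a RAFT-style training loop (the class name, frame pairing, and field decoding below are assumptions for illustration, not necessarily the exact setup used here):

import numpy as np
import tensorflow_datasets as tfds
import torch
from torch.utils.data import IterableDataset

class MoviFFlowDataset(IterableDataset):  # hypothetical wrapper, not the actual setup
    def __init__(self, data_dir="./", split="train"):
        ds = tfds.load("movi_f/512x512", data_dir=data_dir)
        self.ds = tfds.as_numpy(ds[split])

    def __iter__(self):
        for example in self.ds:
            video = example["video"]  # (T, H, W, 3) uint8 frames
            minv, maxv = example["metadata"]["forward_flow_range"]
            # forward_flow is stored as uint16; rescale to real pixel units
            flow = example["forward_flow"] / 65535 * (maxv - minv) + minv
            for t in range(video.shape[0] - 1):
                img1 = torch.from_numpy(video[t].copy()).permute(2, 0, 1).float()
                img2 = torch.from_numpy(video[t + 1].copy()).permute(2, 0, 1).float()
                gt = torch.from_numpy(flow[t].astype(np.float32)).permute(2, 0, 1)
                yield img1, img2, gt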

deqings commented 2 years ago

Thanks for your interest in our work. We performed the experiment using the AutoFlow training pipeline, which differs from the official PyTorch implementation of RAFT, especially regarding data augmentation. Could you please refer to the AutoFlow codebase? https://github.com/google-research/opticalflow-autoflow (Right now the data augmentation code is available; the full training pipeline will be available very soon.)
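
To illustrate why the augmentation pipeline matters, here is a generic sketch of paired geometric augmentation for flow training. This is not the AutoFlow implementation; the scale factor and crop size are placeholder values:

import numpy as np
import cv2

def scale_then_crop(img1, img2, flow, scale=1.2, crop_hw=(368, 496)):
    # Geometric augmentations must be applied identically to both frames
    # and to the flow map; scaling additionally rescales the flow *values*,
    # since flow vectors are measured in pixels.
    h, w = img1.shape[:2]
    nh, nw = int(h * scale), int(w * scale)
    img1 = cv2.resize(img1, (nw, nh), interpolation=cv2.INTER_LINEAR)
    img2 = cv2.resize(img2, (nw, nh), interpolation=cv2.INTER_LINEAR)
    flow = cv2.resize(flow.astype(np.float32), (nw, nh), interpolation=cv2.INTER_LINEAR)
    flow = flow * np.array([nw / w, nh / h], dtype=np.float32)
    # Random crop applied at the same offset to all three tensors.
    ch, cw = crop_hw
    y = np.random.randint(0, nh - ch + 1)
    x = np.random.randint(0, nw - cw + 1)
    sl = np.s_[y:y + ch, x:x + cw]
    return img1[sl], img2[sl], flow[sl]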

DQiaole commented 1 year ago

Hello, sorry to bother you @aqluheng @deqings. I have also been interested in the MOVi-f dataset recently, but after checking the data by using the flow to warp image t+1, I found that some objects are labeled with wrong flow. Could you give any advice about this? The following figure shows the warping results, from left to right: image t, image t+1 warped by the forward flow, image t within the region of interest, warped image t+1 within the region of interest, image t+1. (Please pay attention to the blue shoe.)

[attached image: side-by-side warping comparison]

The code I used is below:

import numpy as np
import tensorflow_datasets as tfds
import torch

def backwarp(tenInput, tenFlow):
    # Backward-warp tenInput with tenFlow using a normalized sampling grid.
    tenHorizontal = torch.linspace(-1.0, 1.0, tenFlow.shape[3]).view(1, 1, 1, tenFlow.shape[3]).expand(tenFlow.shape[0], -1, tenFlow.shape[2], -1)
    tenVertical = torch.linspace(-1.0, 1.0, tenFlow.shape[2]).view(1, 1, tenFlow.shape[2], 1).expand(tenFlow.shape[0], -1, -1, tenFlow.shape[3])

    coord = torch.cat([tenHorizontal, tenVertical], 1)

    # Normalize the flow to the [-1, 1] coordinate range expected by grid_sample.
    tenFlow = torch.cat([tenFlow[:, 0:1, :, :] / ((tenInput.shape[3] - 1.0) / 2.0), tenFlow[:, 1:2, :, :] / ((tenInput.shape[2] - 1.0) / 2.0)], 1)

    return torch.nn.functional.grid_sample(input=tenInput, grid=(coord + tenFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='border', align_corners=True)

ds, ds_info = tfds.load("movi_f/512x512", data_dir="./", with_info=True)
train_iter = iter(tfds.as_numpy(ds["train"]))
example = next(train_iter)
example = next(train_iter)  # the example I provided is the second item from train_iter
minv, maxv = example["metadata"]["forward_flow_range"]
# forward_flow is stored as uint16; rescale it to the real flow range.
forward_flow = example["forward_flow"][0] / 65535 * (maxv - minv) + minv
img0, img1 = example["video"][:2]
# Region of interest: pixels with squared flow magnitude above 1.
mask = np.sum(forward_flow**2, axis=-1)[:, :, None]
warpimg1 = backwarp(torch.tensor(example["video"][1][None, :, :, :]).permute(0, 3, 1, 2).float(),
                    torch.tensor(forward_flow[None, :, :, :]).permute(0, 3, 1, 2).float())
warpimg1 = warpimg1[0].permute(1, 2, 0).numpy()
img = np.concatenate([img0, warpimg1, img0 * (mask > 1), warpimg1 * (mask > 1), img1], axis=1)
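
One thing worth checking before concluding the labels are wrong: backward-warping image t+1 with the forward flow is only valid where a pixel of frame t is still visible in frame t+1, so occluded regions will always look broken in this visualization. Below is a rough sketch of a forward-backward consistency check that masks such pixels, assuming MOVi-f's backward_flow field is decoded like the forward flow (the frame-indexing convention and threshold are my assumptions):

def flow_consistency_mask(forward_flow, backward_flow, thresh=1.5):
    # For non-occluded pixels, fwd(x) + bwd(x + fwd(x)) should be near zero.
    fwd = torch.tensor(forward_flow[None]).permute(0, 3, 1, 2).float()
    bwd = torch.tensor(backward_flow[None]).permute(0, 3, 1, 2).float()
    bwd_at_target = backwarp(bwd, fwd)  # sample bwd at x + fwd(x), reusing backwarp() above
    err = (fwd + bwd_at_target).norm(dim=1)[0].numpy()
    return err < thresh  # True where flow is consistent (likely visible in both frames)

bminv, bmaxv = example["metadata"]["backward_flow_range"]
# Assumed convention: backward_flow[1] maps frame 1 back to frame 0.
backward_flow = example["backward_flow"][1] / 65535 * (bmaxv - bminv) + bminv
visible = flow_consistency_mask(forward_flow, backward_flow)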

By the way, have you succeeded in replicating the results in the paper? @aqluheng

emlcpfx commented 3 months ago

Did anyone pursue this and succeed?