YueWuHKUST / CVPR2020-FutureVideoSynthesis


Instance and Semantic preprocess #2

Closed vmelan closed 3 years ago

vmelan commented 3 years ago

Thank you for the amazing work!

I tried to replicate the results using the published GitHub code, but I ran into several issues because of the complexity of the pipeline described in the paper:

More information on how to preprocess the semantic and instance segmentation inputs would be helpful.

So, to check whether the code runs at all, I set `image_dir=semantic_dir=instance_dir`. PWC-Net runs fine; however, when generating the rigid masks, the call `pred_dynamic = modelG.inference(input_image, input_semantic, input_flow, input_conf, input_instance)` fails with the following error:

    File "test.py", line 199, in <module>
      test()
    File "test.py", line 176, in test
      modelG.inference(input_image, input_semantic, input_flow, input_conf, input_instance)
    File "/home/research/FutureVideoSynthesis/dynamic/models/dynamic_detect.py", line 106, in inference
      = self.netG0.forward(self.loadSize, image_reshaped, semantic_reshaped, flow_reshaped, conf_reshaped, edge_reshaped)
    File "/home/research/FutureVideoSynthesis/dynamic/models/networks.py", line 236, in forward
      down1 = self.model_down_input(input)
    File "/home/anaconda3/envs/futvidsyn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
      result = self.forward(*input, **kwargs)
    File "/home/anaconda3/envs/futvidsyn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 92, in forward
      input = module(input)
    File "/home/anaconda3/envs/futvidsyn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
      result = self.forward(*input, **kwargs)
    File "/home/anaconda3/envs/futvidsyn/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 320, in forward
      self.padding, self.dilation, self.groups)
    RuntimeError: Given groups=1, weight of size [128, 101, 7, 7], expected input[3, 87, 262, 518] to have 101 channels, but got 87 channels instead
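For what it's worth, the mismatch can be caught before the forward pass by summing the channel widths of everything that gets concatenated and comparing against the first conv's `in_channels` (101, from the weight shape in the error). This is only a minimal sketch; the per-modality widths are assumptions, not the repo's actual values:

```python
# The error says the first conv weight is [128, 101, 7, 7], i.e. it
# expects 101 input channels, while the concatenated inputs only
# provided 87 -- presumably because semantic_dir/instance_dir pointed
# at 3-channel RGB images instead of one-hot label tensors.
EXPECTED_IN_CHANNELS = 101

def total_channels(parts):
    """Sum the channel widths of each (name, channels) input part."""
    return sum(width for _, width in parts)

def check_channels(parts, expected=EXPECTED_IN_CHANNELS):
    """Raise early, before the conv layer does, if the counts disagree."""
    got = total_channels(parts)
    if got != expected:
        raise ValueError(
            f"expected {expected} input channels, got {got} "
            f"({', '.join(f'{name}={width}' for name, width in parts)})"
        )
```

Printing the per-modality breakdown this way makes it obvious which input tensor is the wrong width, instead of having to decode it from the conv error.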

YueWuHKUST commented 3 years ago

I added a document about the data preparation: https://github.com/YueWuHKUST/FutureVideoSynthesis/blob/main/doc/Data_preparation.md

The "gray" file represents the single-channel output of semantic segmentation. A semantic segmentation model usually gives a single-channel output; for example, on Cityscapes, a semantic map whose values lie in 1~19 (the number of classes). There is also a visualized semantic map in RGB format.
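To illustrate the two formats: the RGB visualization is just the single-channel class-ID map pushed through a color palette. A small sketch (the palette entries below are placeholders, not necessarily the official Cityscapes colors):

```python
# Hypothetical palette mapping class IDs to RGB triples; a real
# Cityscapes pipeline would use the dataset's fixed color table.
PALETTE = {
    1: (128, 64, 128),   # e.g. road
    2: (70, 70, 70),     # e.g. building
    3: (107, 142, 35),   # e.g. vegetation
}

def colorize(gray):
    """Turn a 2D list of class IDs into a 2D list of RGB triples.
    Unknown IDs fall back to black."""
    return [[PALETTE.get(v, (0, 0, 0)) for v in row] for row in gray]

gray = [[1, 1, 2],
        [3, 3, 2]]
rgb = colorize(gray)
```

The point is that the network consumes the single-channel "gray" map (or a one-hot encoding of it), while the RGB version exists only for visual inspection.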

"Transformed" means resized to 256x832 for KITTI.
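One caveat when doing that resize: label maps should be resized with nearest-neighbor interpolation, since bilinear resizing would blend class IDs into meaningless in-between values. A dependency-free sketch of the idea (the repo itself may do this differently, e.g. with PIL or OpenCV):

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2D list of class IDs. Safe for
    label maps because it copies existing IDs and never interpolates
    (and therefore never invents) new ones."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Target resolution for KITTI in this project: 256 x 832
resized = resize_nearest([[1, 2], [3, 4]], 256, 832)
```

RGB frames, by contrast, can use a smoother filter (bilinear/bicubic); only the semantic and instance maps need the nearest-neighbor treatment.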