MyNiuuu / MOFA-Video

Official PyTorch implementation for MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
https://myniuuu.github.io/MOFA_Video

About train dataset processing #28

TomSuen opened this issue 3 days ago

TomSuen commented 3 days ago

Hi, thank you for such wonderful work! I would like to ask a question about the preparation of the training set. I noticed that you mentioned in the paper: "During training, we randomly sample 14 video frames with a stride of 4. ... with a resolution of 256 × 256. We first train ... and directly taking the first frame together with the estimated optical flow from Unimatch."
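For concreteness, here is a minimal sketch of that frame-sampling protocol as I read it (the function name and the use of `random` are my own illustration, not the authors' data pipeline):

```python
# Hypothetical sketch of the paper's protocol: randomly sample 14
# frames with a temporal stride of 4. Not the repo's actual code.
import random

def sample_clip_indices(num_video_frames, clip_len=14, stride=4):
    """Pick a random window of `clip_len` frame indices spaced `stride` apart."""
    span = (clip_len - 1) * stride + 1   # frames the window covers
    if num_video_frames < span:
        raise ValueError("video too short for this clip_len/stride")
    start = random.randint(0, num_video_frames - span)
    return [start + i * stride for i in range(clip_len)]

# e.g. a 100-frame video yields indices like [23, 27, 31, ..., 75]
print(sample_clip_indices(100))
```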

So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video from just these points. Is this normal?

Btw, I found that one possible reason is that the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample points there, resulting in a lack of guidance for that object in the sparse flow guidance sequence, so the reconstruction quality is not ideal. Is that right?

MyNiuuu commented 3 days ago

> So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video from just these points. Is this normal?

We set nms_ks=15 during training. I think nms_ks=3 may be a bit too small for training the model.
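For intuition, here is a minimal sketch of what the kernel size controls: the NMS keeps only local maxima of the flow magnitude within an `nms_ks`-sized window, so a larger kernel yields fewer, more spread-out points. This uses `scipy.ndimage.maximum_filter` as a stand-in and is not the actual `flow_sampler` implementation:

```python
# Illustrative NMS over flow magnitude; the magnitude threshold of
# 1.0 is an assumed value, not a repo setting.
import numpy as np
from scipy.ndimage import maximum_filter

def nms_sample(flow, nms_ks=15, mag_thresh=1.0):
    """flow: (H, W, 2) optical flow. Returns (N, 2) array of (y, x) points."""
    mag = np.linalg.norm(flow, axis=-1)             # (H, W) flow magnitude
    local_max = maximum_filter(mag, size=nms_ks)    # window maximum
    keep = (mag == local_max) & (mag > mag_thresh)  # peaks above threshold
    return np.argwhere(keep)

flow = np.random.randn(256, 256, 2).astype(np.float32)
print(len(nms_sample(flow, nms_ks=3)))   # dense: many surviving peaks
print(len(nms_sample(flow, nms_ks=15)))  # sparse: far fewer peaks
```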

> Btw, I found that one possible reason is that the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample points there, resulting in a lack of guidance for that object in the sparse flow guidance sequence, so the reconstruction quality is not ideal. Is that right?

Yes, the watershed algorithm is unable to sample points on regions that are stationary in the initial frame's flow. However, this may not significantly impact the model's training, as it is unnecessary and even inadvisable to sample every moving part throughout the video.
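Concretely, a static object never clears the flow-magnitude gate that seed sampling relies on, so no guidance points land on it. A standalone sketch (the threshold of 1.0 is again an assumed value):

```python
# A static background has ~zero flow magnitude, so magnitude-gated
# seed sampling never places points on it. Illustration only.
import numpy as np

flow = np.zeros((256, 256, 2), dtype=np.float32)  # everything at rest
flow[100:150, 100:150] = 3.0                      # one moving patch
mag = np.linalg.norm(flow, axis=-1)
candidates = np.argwhere(mag > 1.0)               # magnitude-gated seeds
# every candidate lies inside the moving patch; the static background
# contributes no sparse-flow guidance at all
assert all(100 <= y < 150 and 100 <= x < 150 for y, x in candidates)
print(len(candidates), "candidate points, all on the moving object")
```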

TomSuen commented 16 minutes ago

> So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video from just these points. Is this normal?
>
> We set nms_ks=15 during training. I think nms_ks=3 may be a bit too small for training the model.
>
> Btw, I found that one possible reason is that the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample points there, resulting in a lack of guidance for that object in the sparse flow guidance sequence, so the reconstruction quality is not ideal. Is that right?
>
> Yes, the watershed algorithm is unable to sample points on regions that are stationary in the initial frame's flow. However, this may not significantly impact the model's training, as it is unnecessary and even inadvisable to sample every moving part throughout the video.

Okay, thank you for your reply. When will you open-source the training code?