Hi, thank you for such wonderful work! I would like to ask a question about the preparation of the training set. I noticed that you mention in the paper:
During training, we randomly sample 14 video frames with a stride of 4. ...with a resolution of 256 × 256. We first train ... and directly taking the first frame together with the estimated optical flow from Unimatch.
So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video using just these points. Is this normal?
We set nms_ks=15 during the training process. I think nms_ks=3 may be a little small for training the model.
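For intuition, here is a minimal sketch of how a kernel-size parameter like nms_ks typically controls sampling density: non-maximum suppression over the flow magnitude via max-pooling, where a larger kernel keeps fewer, better-separated points. This is only an illustration of the mechanism (the function name and threshold are made up), not the repo's actual flow_sampler:

```python
import torch
import torch.nn.functional as F

def nms_sample_points(flow, nms_ks=15, mag_thresh=1.0):
    """Keep local maxima of flow magnitude via max-pool NMS.

    flow: (2, H, W) tensor of (u, v) displacements in pixels.
    A point survives only if it is the maximum within its
    nms_ks x nms_ks neighborhood, so nms_ks=3 keeps almost every
    local bump while nms_ks=15 yields a much sparser set.
    """
    mag = flow.norm(dim=0, keepdim=True)                     # (1, H, W)
    pooled = F.max_pool2d(mag.unsqueeze(0), nms_ks,
                          stride=1, padding=nms_ks // 2)[0]  # same spatial size
    keep = (mag == pooled) & (mag > mag_thresh)
    ys, xs = torch.nonzero(keep[0], as_tuple=True)
    return torch.stack([xs, ys], dim=1)                      # (N, 2) points

flow = torch.randn(2, 256, 256)
print(len(nms_sample_points(flow, nms_ks=3)),   # dense
      len(nms_sample_points(flow, nms_ks=15)))  # much sparser
```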
Btw, I found one possible reason: the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample there, resulting in a lack of guidance for that object in the sparse flow guidance sequence, so the reconstruction is not ideal, right?
Yes, the watershed algorithm is unable to sample points that remain stationary in the initial frame. However, this may not significantly impact the model's training, as it is unnecessary and even inadvisable to sample every moving part throughout the video.
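As a toy illustration of the point above (not the repository's sampler): any sampler driven by the first frame's flow magnitude has zero signal on an object that is still in frame 0, even if that object moves later, so no guidance point can land on it:

```python
import numpy as np

# Toy first-frame flow: one object already moving, one still static.
flow0 = np.zeros((2, 64, 64), dtype=np.float32)
flow0[:, 10:20, 10:20] = 5.0   # object A moves in frame 0
# object B at [40:50, 40:50] starts moving only at a later frame,
# so its first-frame flow (and thus its sampling mask) stays zero.

mag0 = np.linalg.norm(flow0, axis=0)
candidates = mag0 > 0.5        # where a magnitude-driven sampler can fire
print(candidates[15, 15])      # True  -> object A receives guidance points
print(candidates[45, 45])      # False -> object B gets none, leaving its
                               #          later motion unguided in the sparse flow
```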
Okay, thank you for your reply. When will you open-source the training code?
And I have another question: if I resize the (336, 596) video to (256, 256) before predicting the optical flow, versus predicting the optical flow on the original-size video and then resizing it to (256, 256), will there be any difference between the two flow maps? Normally, Unimatch should not be too sensitive to size.
We found that Unimatch is actually (relatively) sensitive to input size. Unimatch produces sharper predictions when the input frames are resized to its training size, [384, 512], and we adopted this setting during the training process.
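One detail worth flagging when comparing the two pipelines: if you resize a flow field after prediction, the displacement values must be rescaled along with the grid, since flow is measured in pixels. A rough sketch of that resize step (the helper name is mine; the repo's actual preprocessing may differ):

```python
import torch
import torch.nn.functional as F

def resize_flow(flow, new_hw):
    """Bilinearly resize a flow field and rescale its values.

    flow: (1, 2, H, W) in pixel units; new_hw: (new_H, new_W).
    The u channel scales with width, v with height; skipping this
    makes the resized flow point to the wrong pixels.
    """
    _, _, H, W = flow.shape
    new_H, new_W = new_hw
    out = F.interpolate(flow, size=new_hw, mode="bilinear",
                        align_corners=False)
    out[:, 0] *= new_W / W   # u component (horizontal)
    out[:, 1] *= new_H / H   # v component (vertical)
    return out

# e.g. predict flow on frames resized to Unimatch's training size
# [384, 512], then bring it down to the 256x256 training resolution:
flow_384x512 = torch.randn(1, 2, 384, 512)   # stand-in for the Unimatch output
flow_256 = resize_flow(flow_384x512, (256, 256))
```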
Thanks again, and forgive me for having so many questions. I noticed that the pre-trained models you provide are all for 25 frames. Can I fine-tune them on 14-frame data?
Yes, you can fine-tune the model on 14 frames of data. I think it will not negatively impact the model's performance.
I am starting to prepare the release of the training code, since our paper has been accepted to ECCV'24. I think the training code will be made available within a week 🤔.
Waooooo, great news😄
Hi, I have other questions.
- For the first frame of optical flow, will the watershed sampler algorithm definitely sample the position of the maximum optical flow value?
- When training, do you want the sparse optical flow obtained after CMP to be exactly the same as the dense optical flow extracted from the original video? I mean, training is just to enable the model to learn the guiding role of any optical flow, not to completely restore the video, right?
Sorry for the late reply, busy days.
For the first frame of optical flow, will the watershed sampler algorithm definitely sample the position of the maximum optical flow value?
Actually, I am not completely sure about this, but I think yes, the watershed sampler should sample the position of the maximum optical flow value. You can check the code now, since I have just released the training code.
When training, do you want the sparse optical flow obtained after cmp to be exactly the same as the dense optical flow extracted from the original video?
No. This is the last thing I want 😂. It would make the model depend on the flow too much and lack sufficient generation ability.
I mean, training is just to enable the model to learn the guiding role of any optical flow, not to completely restore the video, right?
Yes, our goal is to use a rough flow as guidance; that is to say, given an inaccurate optical flow from CMP, the model can still generate semantically meaningful videos that correctly reflect the user's intention.
By the way, I have just released the training code; you can check it for details.
Best regards,
Thank you very much for your kind reply! I feel I have almost fully understood your work.