ChenyangLEI / deep-video-prior

[NeurIPS 2020] Blind Video Temporal Consistency via Deep Video Prior

About the reason why IRT works #7

Closed 07hyx06 closed 3 years ago

07hyx06 commented 3 years ago

Thanks for your great work.

I still cannot understand why IRT works after reading the paper. Can you provide more insight?

ChenyangLEI commented 3 years ago

You can compare IRT with K-means where K=2.

  1. There are two outputs, just like two clusters.
  2. In each iteration, each output is optimized on the pixels that are closer to it, which is just like the assignment step in clustering. Each output is then updated (just like a new cluster center) from the pixels assigned to it.

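
To make the K-means analogy above concrete, here is a minimal sketch, not the repository's code. It assumes scalar "pixels" and two scalar "outputs" standing in for the main and minor network outputs, and it uses a plain mean as the update step, whereas in IRT the outputs come from a network trained by gradient descent. The function name `irt_like_step` is hypothetical.

```python
def irt_like_step(pixels, main, minor):
    """One K-means-style (K=2) iteration over scalar pixel values."""
    # Assignment step: each pixel goes to whichever output is closer.
    to_main = [abs(p - main) <= abs(p - minor) for p in pixels]
    main_pixels = [p for p, m in zip(pixels, to_main) if m]
    minor_pixels = [p for p, m in zip(pixels, to_main) if not m]
    # Update step: each output moves to the mean of its own pixels
    # (an assumption standing in for a gradient step on the network).
    new_main = sum(main_pixels) / len(main_pixels) if main_pixels else main
    new_minor = sum(minor_pixels) / len(minor_pixels) if minor_pixels else minor
    return new_main, new_minor, to_main


# Four pixels near 0.1 (majority mode) and two near 0.9 (minority mode).
pixels = [0.10, 0.12, 0.11, 0.09, 0.90, 0.88]
main, minor = 0.3, 0.6
for _ in range(5):
    main, minor, to_main = irt_like_step(pixels, main, minor)
```

After the first iteration the assignment stops changing: the first four pixels stay with `main` and the last two with `minor`, which mirrors how the two outputs separate into two modes.
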
07hyx06 commented 3 years ago

Thanks for your reply! What about the final state after the whole IRT training process? Would the confidence map become an all-ones mask?

07hyx06 commented 3 years ago

I wonder: if the confidence map is not an all-ones mask in the final state, would the main-mode output contain some pixels that belong to the minor mode?

ChenyangLEI commented 3 years ago

Yes, the confidence map eventually becomes quite stable, but it is not an all-ones mask. The confidence map measures how well the main output agrees with the processed frames; since processed frames usually contain pixels that belong to minor modes, the confidence map will not be an all-ones mask (if the network converges correctly).
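
A toy illustration of why the mask does not become all ones, as a sketch under stated assumptions: the confidence map is taken to be a hard per-pixel assignment based on L1 distance, and the loss supervises each output only on its own pixels. The paper's exact confidence rule may differ (for example, it may use thresholds), and the names `confidence_map` and `irt_style_loss` are hypothetical.

```python
def confidence_map(main_out, minor_out, processed):
    """Per-pixel mask: 1 where the main output is at least as close to the
    processed frame as the minor output, else 0 (hard assignment, assumed)."""
    return [
        [1 if abs(m - p) <= abs(n - p) else 0 for m, n, p in zip(m_row, n_row, p_row)]
        for m_row, n_row, p_row in zip(main_out, minor_out, processed)
    ]


def irt_style_loss(main_out, minor_out, processed):
    """Mean L1 loss where each pixel supervises only the output it is assigned to."""
    conf = confidence_map(main_out, minor_out, processed)
    total, count = 0.0, 0
    for c_row, m_row, n_row, p_row in zip(conf, main_out, minor_out, processed):
        for c, m, n, p in zip(c_row, m_row, n_row, p_row):
            total += c * abs(m - p) + (1 - c) * abs(n - p)
            count += 1
    return total / count, conf


# Toy 2x3 "processed frame": the right column is in a different (minor) mode.
processed = [[0.2, 0.2, 0.8], [0.2, 0.2, 0.8]]
main_out = [[0.2] * 3 for _ in range(2)]
minor_out = [[0.8] * 3 for _ in range(2)]
loss, conf = irt_style_loss(main_out, minor_out, processed)
```

In this toy case the loss is zero, yet the mask keeps zeros in the right column. Those pixels are explained by the minor output, so the main output is never pulled toward them.
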

07hyx06 commented 3 years ago

"In practice, we can use a specific frame (e.g., the first frame) to train the network for the main mode at the beginning of training."

Does this mean training the network on a specific frame for the first few iterations, and then training it on all frames of the video?

Last question: if the main and minor outputs are both very close to the processed frame, the IRT loss should be very low, but the network would fail to resolve the multimodal inconsistency. Is there also a "when to stop training" problem in the IRT process?

ChenyangLEI commented 3 years ago

This is optional and is controlled by the parameter "IRT_initialization". If you want the main mode to be consistent with a specific frame, you can use this strategy.
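
One way such an initialization schedule could look is sketched below. This is an assumption about the general idea, not the behavior of the repository's "IRT_initialization" parameter: for the first `warmup_steps` iterations only one chosen frame is used and only the main output is supervised, and afterwards frames are sampled from the whole video. The function `pick_training_frame` and its arguments are hypothetical.

```python
import random


def pick_training_frame(step, num_frames, warmup_steps, init_frame=0, rng=random):
    """Return (frame_index, main_only) for a given training step."""
    if step < warmup_steps:
        # Warm-up (assumed): train on the chosen frame only and supervise
        # just the main output, so the main mode locks onto that frame.
        return init_frame, True
    # Regular IRT training (assumed): sample any frame of the video.
    return rng.randrange(num_frames), False


# Example: 10-frame video, 5 warm-up steps on the first frame.
schedule = [pick_training_frame(s, 10, 5, rng=random.Random(s)) for s in range(8)]
```

The first five entries of `schedule` all point at frame 0 with `main_only` set, and the remaining entries sample from the full video.
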

If the main and minor outputs are both very close to the processed frames, the case is closer to unimodal inconsistency than to multimodal inconsistency.

There is still a "when to stop training" problem, because slight inconsistency remains within the main mode and within the minor mode.

07hyx06 commented 3 years ago

got it. thx