PeterL1n / RobustVideoMatting

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
https://peterl1n.github.io/RobustVideoMatting/
GNU General Public License v3.0
8.32k stars 1.11k forks

Can GRU be replaced with Conv layers? #258

Open Stephen-K1 opened 7 months ago

Stephen-K1 commented 7 months ago
  1. In the RVM model, the GRU layer accounts for a large share of the computation. A natural question: would it be better to replace the GRU layer with a Conv layer of the same computational cost? A simple 'yes' or 'no' would be greatly appreciated.

  2. Recently I've been trying to implement a matting model with excellent performance. I have read many recently proposed video matting papers and tested their matting performance. Even though RVM was proposed two years ago, it is the best open-source model (including training code) in my tests. Could you share some tips for improving RVM's performance? I believe you have many good ideas worth trying, and any insights you can share here would be greatly appreciated. Thank you very much!

PeterL1n commented 6 months ago
  1. No. The whole point of our research is to replace conv with GRU. The GRU's recurrent architecture lets the model analyze the video sequence with temporal memory. If you replace it with a Conv layer, the model will treat each frame independently, and the output will flicker.

  2. I have not been following matting research lately, but here are some ideas off the top of my head:

    • Use a transformer instead of ConvGRU to model temporal relations.
    • Use a better backbone based on ViT, such as DINOv2.
    • Treat matting as a generative task, e.g. with a diffusion objective.
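
To make the distinction in point 1 concrete, here is a minimal, standard-library-only sketch. It is not RVM's code: it uses a scalar GRU cell as a one-pixel, single-channel stand-in for a ConvGRU, and every name and weight value below (`PARAMS`, `gru_step`, `stateless_step`, the two example sequences) is a hypothetical chosen for illustration. It feeds two sequences that end on the same frame value but have different histories to a stateless per-frame function and to the recurrent cell.

```python
import math

# Hypothetical scalar weights, chosen only for illustration.
# A real ConvGRU learns convolution kernels in these roles.
PARAMS = {"wz": 1.0, "uz": 0.5, "bz": -1.0,   # update gate
          "wr": 1.0, "ur": 0.5, "br": 0.0,    # reset gate
          "wc": 2.0, "uc": 1.0}               # candidate state


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stateless_step(x, w=2.0):
    """Per-frame mapping: the output depends only on the current input."""
    return math.tanh(w * x)


def gru_step(x, h, p=PARAMS):
    """One GRU update: the output depends on the input and the previous state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # how much to overwrite
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # how much history to expose
    c = math.tanh(p["wc"] * x + p["uc"] * (r * h))    # candidate new state
    return (1.0 - z) * h + z * c                      # blend old state and candidate


def run_stateless(frames):
    """Process every frame independently and return the last output."""
    return [stateless_step(x) for x in frames][-1]


def run_gru(frames, h=0.0):
    """Carry the hidden state across frames and return the final state."""
    for x in frames:
        h = gru_step(x, h)
    return h


# Same final frame (0.5), different history.
bright_history = [0.9, 0.9, 0.9, 0.5]
dark_history = [0.1, 0.1, 0.1, 0.5]

print("stateless:", run_stateless(bright_history), run_stateless(dark_history))
print("recurrent:", run_gru(bright_history), run_gru(dark_history))
```

The stateless function returns the same value for both sequences because it never sees earlier frames, while the recurrent cell returns different values because its state carries the history forward. That carried state is what gives a recurrent model the means to keep its output consistent from frame to frame.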