MCG-NJU / EMA-VFI

[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
Apache License 2.0

The mask passed to the unet is the one before sigmoid #14

Open hexiaoyi95 opened 1 year ago

hexiaoyi95 commented 1 year ago

https://github.com/MCG-NJU/EMA-VFI/blob/75b6f6a889e695df875e103374040d47a4cfac7c/model/flow_estimation.py#L137C65-L137C65 Is this a bug or not?

GuozhenZhang1999 commented 1 year ago

Thanks for the question. We think that keeping the unet's input features on the same scale helps the network learn, but we have not verified that. You can try changing it to see the impact on performance.
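
For readers following along, here is a minimal sketch of what "before sigmoid" versus "after sigmoid" means numerically. It is not taken from the repository: the sample values and the names `mask_logits`, `mask_weights`, and `value_range` are hypothetical, and it only shows how the two forms differ in scale, not which one trains better.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def value_range(values: list[float]) -> float:
    """Spread between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical pre-sigmoid mask values (logits) for a handful of pixels.
mask_logits = [-4.0, -1.0, 0.0, 1.0, 4.0]

# The same values after sigmoid, i.e. usable as blending weights.
mask_weights = [sigmoid(v) for v in mask_logits]

logit_range = value_range(mask_logits)
weight_range = value_range(mask_weights)

print("pre-sigmoid range: ", logit_range)
print("post-sigmoid range:", weight_range)
```

The pre-sigmoid values are unbounded, while the post-sigmoid values are squashed into (0, 1). Whether that difference matters for the unet input is exactly the open question above.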

hexiaoyi95 commented 1 year ago

Thanks for your reply. Actually, I'm training your model from scratch on my own dataset, and I find it very hard to get it to converge. Have you run into similar issues? Is there any experience you can share?

GuozhenZhang1999 commented 1 year ago

You can try adjusting the batch size and learning rate, and consider training with Vimeo if your dataset is small.
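
One common starting point when changing the batch size is the linear scaling heuristic, sketched below. This is a general rule of thumb and an assumption here, not something the repository or the maintainers prescribe. The function name `scaled_learning_rate` and the base values (learning rate 2e-4 at batch size 32) are hypothetical placeholders.

```python
def scaled_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Scale the learning rate in proportion to the batch size (linear scaling heuristic)."""
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Hypothetical reference configuration; substitute the values you actually use.
BASE_LR = 2e-4
BASE_BATCH_SIZE = 32

for batch_size in (8, 16, 32):
    lr = scaled_learning_rate(BASE_LR, BASE_BATCH_SIZE, batch_size)
    print(f"batch size {batch_size}: learning rate {lr:.2e}")
```

Treat the result as an initial guess to tune from, since the best setting depends on the dataset and optimizer.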