Open · HaoDot opened this issue 2 years ago
Hi Ginobili-20,
Thanks for your interest in our work.
Sorry for the delayed reply, and thanks for your explanation! However, one question remains; I hope you can reply again.

There are three loss functions in EVDI: the Blurry-event (B-E) loss, the Blurry-sharp (B-S) loss, and the Sharp-event (S-E) loss. From Table 3, it seems that the B-S loss plays the most important role in supervision, while the others alone cannot even help the model converge. Everything above is from the original paper. But the B-S loss can only take effect in the deblurring task, i.e., recovering the latent frames within the exposure time. So the B-S loss cannot help the model converge on the interpolation task, which is designed to recover latent frames outside the exposure time. What's more, the B-E loss has a trivial solution, when E(f, T_i) is equal to B_i and L(f) is equal to 1. And the S-E loss is affected by noise in the event streams. As shown in Table 3, none of these losses works well on its own, so chances are the model cannot converge well. However, EVDI still performs strongly on the interpolation task. I cannot understand how fine-tuning with the B-E and S-E losses achieves such a good result, and I wonder if there are other training strategies I have missed.

To sum up, my remaining question is how EVDI is supervised in the interpolation task. Waiting for your reply, thanks!

P.S. EVDI is still a brilliant work, and it made a strong impression on me!
Thanks for your question.
Losses: As stated in Sec. 5.4 of our paper, the B-S (Blurry-sharp) loss contributes to brightness consistency, while the B-E and S-E losses are designed to handle motion ambiguity. They all play important roles in EVDI, since we aim to recover sharp results (related to motion ambiguity) with correct brightness (related to brightness consistency). Although the B-S loss appears to achieve the best quantitative results in Tab. 3 compared with the B-E and S-E losses, that is because the metrics depend heavily on pixel brightness, so they cannot tell the whole story. For instance, the qualitative results in Fig. 6 show that the B-S loss ensures correct brightness but cannot produce results as sharp as the B-E loss. In fact, models with the B-E and S-E losses also converge, as shown in the figure below, where the x-axis and y-axis indicate the training epoch and the normalized loss value, respectively.

Regarding the trivial solution of the B-E loss: E(f, T_i) = B_i could occur if the LDI networks took blurry frames as inputs and learned an identity mapping. But in our case, for a fixed blurry frame B_i, a different chosen timestamp t leads to different input events to the LDI and thus a different E(f, T_i), which potentially avoids the trivial solution.
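To make the constraint concrete, here is a minimal PyTorch sketch of a blurry-event style loss under the relation B_i ≈ L(f) · E(f, T_i) implied above. The tensor names and the exact loss form are illustrative assumptions, not the repository's actual implementation (see codes/Loss.py for that):

```python
import torch

def blurry_event_loss(latent, ldi_out, blurry):
    """Sketch of a B-E style constraint: the predicted latent frame L(f),
    modulated by the LDI estimate of E(f, T_i), should reproduce the
    observed blurry frame B_i. Note the trivial solution discussed above:
    latent == 1 with ldi_out == blurry would also zero this loss, which
    is avoided by feeding the LDI events sliced relative to a varying
    timestamp rather than the blurry frames themselves."""
    return torch.mean(torch.abs(latent * ldi_out - blurry))

# Toy usage with random (B, C, H, W) tensors:
latent = torch.rand(1, 1, 64, 64)   # predicted sharp frame L(f)
ldi_out = torch.rand(1, 1, 64, 64)  # LDI estimate of E(f, T_i)
blurry = torch.rand(1, 1, 64, 64)   # observed blurry frame B_i
print(blurry_event_loss(latent, ldi_out, blurry).item())
```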
Interpolation: As discussed above, motion ambiguity can be handled in both interpolation and deblurring (via the B-E and S-E losses). For brightness consistency, we train interpolation together with deblurring and use the same EVDI model to fulfill both tasks, so the brightness-consistency constraint also holds for the interpolated frames.
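How one model might be supervised on both tasks can be illustrated with a hypothetical timestamp sampler; the function name, probability, and interval layout below are assumptions for illustration, and EVDI's actual training code may organize this differently:

```python
import random

def sample_latent_timestamp(exp1, exp2, p_deblur=0.5):
    """Hypothetical joint-training sampler. exp1 = (t1_start, t1_end) and
    exp2 = (t2_start, t2_end) are the exposure windows of the two blurry
    frames, with exp1 ending before exp2 starts."""
    if random.random() < p_deblur:
        # Deblurring target: f inside one of the exposure windows.
        lo, hi = random.choice([exp1, exp2])
    else:
        # Interpolation target: f in the gap between the two exposures.
        lo, hi = exp1[1], exp2[0]
    return random.uniform(lo, hi)

# Example: exposures [0.0, 0.4] and [0.6, 1.0] on a normalized timeline.
f = sample_latent_timestamp((0.0, 0.4), (0.6, 1.0))
print(f)
```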
Admittedly, EVDI is not perfect and has some limitations, such as the noise issue in the S-E loss, but we hope EVDI can inspire more exciting work in this field. Thanks.
Thanks for replying in detail again. Now I understand `num_leftB` and `num_rightB`: combining your explanation and the code below, I know how to select the recovered outputs within the exposure time to synthesize the blurry frames.
https://github.com/XiangZ-0/EVDI/blob/a9a22ce4f671aa158bb8d2c6bbcb4325c07016e6/codes/Loss.py#L14
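As a companion to the linked code, here is a simplified sketch of that selection-and-averaging idea. The variable names are illustrative and not taken from Loss.py, and it assumes the recovered latent frames and their timestamps are already available:

```python
import torch

def synthesize_blurry(latents, timestamps, t_start, t_end):
    """Average the recovered latent frames whose timestamps fall inside
    one blurry frame's exposure window [t_start, t_end], approximating
    the physical blur formation B = (1/T) * integral of L(t) dt.
    Assumes at least one timestamp lies inside the window."""
    inside = [L for L, t in zip(latents, timestamps) if t_start <= t <= t_end]
    return torch.stack(inside).mean(dim=0)
```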
Finally, thanks again for your detailed answers!
Hi, nice work on event-based video unfolding! You have considered a practical setting where the duty cycle is not 1. However, I still have a few questions.
1. The number of output images
It seems that the final conv layer in EVDI has a single output channel, which means it synthesizes one image at a time; your paper confirms this too. However, the blurry-sharp loss in EVDI needs M reconstructions, and M has to be large enough. I don't understand how to reconcile these two facts.
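Based on the exchange above (recovering multiple latent frames and averaging those inside the exposure window), one plausible reading is that the single-output network is simply run M times with different target timestamps. A hedged sketch, with `model(events, blurry, t)` as a hypothetical stand-in interface rather than EVDI's actual one:

```python
import torch

def reconstruct_for_bs_loss(model, events, blurry, timestamps):
    """Run a one-image-per-call network once per timestamp (all inside
    the exposure window) and average the M outputs to re-synthesize the
    blurry frame for the blurry-sharp loss."""
    latents = [model(events, blurry, t) for t in timestamps]
    return torch.stack(latents).mean(dim=0)
```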
2. The setting of Sections 5.2 and 5.3
EVDI is designed to fulfill motion deblurring and interpolation at the same time, but you use it in different settings. My understanding is as follows: for the deblurring in Section 5.2, the timestamp f of the latent frame to recover lies inside the exposure time T_1 or T_2; for the interpolation in Section 5.3, EVDI tries to recover the intermediate frames between the exposure times. I don't know whether my understanding is right. Hope you can help me with the issues above. Thanks a lot.