chenkang455 closed this issue 1 month ago.
@chenkang455 Thanks for your interest in our work! In fact, there are significant differences between BeNeRF and E2NeRF, and the frameworks are not the same.
Errors introduced by EDI:
In the related work section, we emphasize that E2NeRF requires a manually preset segmentation of the event stream, followed by using EDI to accumulate each segment into an RGB image. In cases of severe motion blur or incorrect manual parameter settings, EDI cannot generate a sharp RGB image.
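For reference, the EDI relation can be sketched per pixel as follows. This is a minimal numpy sketch of the event double integral model (Pan et al.), not E2NeRF's actual code; the variable names are our own, and the contrast threshold `c` is the manually chosen parameter referred to above:

```python
import numpy as np

# EDI model (per pixel): a blurry value B is the exposure-time average of the
# latent intensity L(t), and events relate L(t) to the sharp value L(f) via
#   L(t) = L(f) * exp(c * E(t)),  so  B = L(f) * mean_t exp(c * E(t)).
# Inverting this recovers the sharp value from B and the event integral E(t).

def edi_deblur(blurry, event_integral, c):
    """Recover the latent sharp intensity L(f) from a blurry pixel value.

    blurry:         B, the time-averaged (blurry) intensity
    event_integral: E(t), running sum of event polarities over the exposure
    c:              contrast threshold (the manual preset)
    """
    return blurry / np.mean(np.exp(c * event_integral))

# Synthetic check: simulate one pixel, then invert.
rng = np.random.default_rng(0)
c_true = 0.2
sharp = 0.8                                   # ground-truth L(f)
events = np.cumsum(rng.integers(-1, 2, 100))  # E(t): running polarity sum
blurry = sharp * np.mean(np.exp(c_true * events))

print(edi_deblur(blurry, events, c=c_true))   # close to 0.8
print(edi_deblur(blurry, events, c=0.4))      # wrong preset c -> wrong result
```

The last line illustrates the failure mode above: with a mis-set threshold `c`, the inversion no longer recovers the sharp intensity.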
Errors introduced by COLMAP:
Consequently, these error-contaminated RGB images are fed into COLMAP to estimate the camera poses, which therefore also contain errors.
E2NeRF cannot refine these inaccurate poses during subsequent training, so this two-stage process prevents the model from learning the correct radiance field.
Joint optimization: BeNeRF does not require any preprocessing (such as EDI or COLMAP) to initialize the camera poses, allowing joint optimization of the neural radiance field and the camera poses and thus avoiding the error accumulation of the two-stage process.
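As a rough illustration of the difference, here is a toy numpy sketch (not BeNeRF's actual trajectory formulation; the 1D "scene" and all names are invented for this example) in which a scene parameter and a pose parameter are updated in the same gradient loop, rather than the pose being fixed by a separate preprocessing stage:

```python
import numpy as np

# Toy 1D "scene": a Gaussian bump with amplitude `a`, observed through a
# camera with an unknown shift `t` (standing in for the camera pose).
# We jointly gradient-descend on (a, t) against the observations, instead of
# first estimating t in a separate stage and then freezing it.

x = np.linspace(-2.0, 2.0, 50)
a_true, t_true = 1.5, 0.3
obs = a_true * np.exp(-(x - t_true) ** 2)   # "measurements"

a, t = 1.0, 0.0                             # crude initialization, no COLMAP
lr = 0.2
for _ in range(5000):
    pred = a * np.exp(-(x - t) ** 2)
    r = pred - obs                          # residuals
    # analytic gradients of the mean-squared error w.r.t. a and t
    grad_a = np.mean(2 * r * np.exp(-(x - t) ** 2))
    grad_t = np.mean(2 * r * pred * 2 * (x - t))
    a -= lr * grad_a
    t -= lr * grad_t

print(a, t)   # both converge jointly toward (1.5, 0.3)
```

A rough pose initialization is enough, because the photometric loss keeps correcting the pose while the scene model improves; in a two-stage pipeline, an error in the first stage is locked in.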
More challenging ill-posed problem: E2NeRF requires more observational information (multi-view images and longer event streams) to address such an ill-posed problem. In contrast, BeNeRF demonstrates that, by correctly adopting a joint optimization of the poses and the NeRF, the problem can be solved with just a single image (an extremely sparse view) and a short event stream.
This is similar to solving a system of equations: instead of gathering more equations to determine the unknowns, we use a better method to solve for them with fewer equations. Our setting is therefore more challenging, and this is one of the key motivations behind BeNeRF.
@ethliup, thanks for your detailed response! I've figured it out.
If you have more questions, feel free to contact us! This issue will be closed within three days.
Hi @ethliup, thanks for your great work. I have a small question regarding the paper's motivation. The paper states that it aims to "recover the neural radiance fields (NeRF) from a single blurry image and its corresponding event stream" and to "reach same performance as E2NeRF [43], which targets for the same problem, but with multi-view images and longer event data."
But I don't see what your contribution is in achieving this goal, since the framework seems largely the same as the previous E2NeRF. Thanks a lot.