CUHK-AIM-Group / EndoGaussian

EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction
https://yifliu3.github.io/EndoGaussian/
MIT License

Trying to train with video frames in steps #26

Open lolwarmaze opened 1 month ago

lolwarmaze commented 1 month ago

Hi there! I have been trying to incorporate the EndoGaussian work into my own project, where I train a scene with left and right image pairs and depth maps. I am developing a stepwise training method: first, say, 150 frames are loaded and trained, then rendered with a custom camera pose in real time. Next, I load the already trained model (trained on the first 150 frames), train it on the next 30 frames, and render those as the next step. The pipeline keeps training and rendering the next 30 frames in steps until all the frames in the data folder have been trained.

Now comes the problem. When the rendered frames are converted to a video, there is a slight inconsistency at the frames where the model changes. The first 150 frames show smooth Gaussian movements, then there is a small abrupt transition to the next frame, after which the next 30 frames are smooth again. So every 30 frames, when the model is swapped for a newer trained one, there is a noticeable jump in Gaussian positions, and for the following 29 frames the positions change smoothly.

So my question is: how can I train the model on new timestamps while keeping the previously trained timestamps unaffected? It seems that when training a newer timestamp, the model also adjusts the Gaussian positions for previous timestamps. Or is there another reason for this? Also, is there any other way to train and render stepwise if the timestamps cannot be made independent? If anyone has ideas or an understanding of this, please help. Any other suggestions would also be appreciated. Thanks!

yifliu3 commented 1 month ago

Hi there! Thanks for your attention. After reading your description, several potential factors come to mind that could cause such an effect:

  1. The timestamps may not be compatible. EndoGaussian normalizes each timestamp by the total video length, so for the first 150 frames, timestamp 1 refers to the 150th frame. If more frames will arrive later, you can normalize against a larger virtual video length, e.g. 300, so that it covers both the first 150 frames and the new ones.
  2. The coordinate systems may not be aligned. If the coordinates of the first 150 frames are not aligned with those of the newly arriving frames, the jump effect can also happen.
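The first point can be sketched as follows; `normalized_timestamps` and `virtual_total` are illustrative names, not EndoGaussian parameters:

```python
# Sketch: normalizing timestamps against a virtual total video length so
# that existing timestamps stay stable when new frames are appended later.
# Names here are assumptions for illustration, not EndoGaussian's API.

def normalized_timestamps(num_frames, virtual_total):
    # EndoGaussian-style normalization: frame index / total length.
    # Using a virtual total (>= the final frame count) keeps the mapping
    # frame -> timestamp fixed across incremental training rounds.
    return [i / virtual_total for i in range(1, num_frames + 1)]

# First round: 150 frames, normalized against a virtual length of 300.
first = normalized_timestamps(150, 300)
# Second round: 180 frames total; the first 150 timestamps are unchanged.
second = normalized_timestamps(180, 300)
assert first == second[:150]
```

With the original `frame / total_frames` scheme, retraining with 180 frames would remap frame 150 from timestamp 1.0 to 150/180, which is exactly the kind of incompatibility described above.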
lolwarmaze commented 1 month ago

> Hi there! Thanks for your attention. After reading your description, several potential factors come to mind that could cause such an effect:
>
>   1. The timestamps may not be compatible. EndoGaussian normalizes each timestamp by the total video length, so for the first 150 frames, timestamp 1 refers to the 150th frame. If more frames will arrive later, you can normalize against a larger virtual video length, e.g. 300, so that it covers both the first 150 frames and the new ones.
>   2. The coordinate systems may not be aligned. If the coordinates of the first 150 frames are not aligned with those of the newly arriving frames, the jump effect can also happen.

Thank you for your reply!

Now regarding the timestamps: while loading the data, I saw that timestamps are assigned as [frame/total_frames], so the first timestamp is 1/150 and so on, until the last timestamp is 1. I changed this implementation to assign the timestamp as [frame_number*0.0001]. This way, my 151st frame gets timestamp 0.0151 while the previous frame has timestamp 0.0150. So the timestamps always increase by an equal increment based on the frame number in the filename. Also, I don't see the deformation model using a maxtime value or a total video length anywhere. If you know where the total-video-length information is used while training the model, please let me know.
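The fixed-increment scheme described above can be sketched like this; the filename pattern, the regex, and `timestamp_from_filename` are assumptions for illustration:

```python
import re

# Sketch of the fixed-increment timestamp scheme described above:
# timestamp = frame_number * 0.0001, with the frame number parsed from
# the filename. The regex and filename pattern are illustrative only.

def timestamp_from_filename(filename, increment=0.0001):
    match = re.search(r"(\d+)", filename)
    frame_number = int(match.group(1))
    return frame_number * increment

# Frame 150 and frame 151 differ by exactly one increment, regardless of
# how many frames have been loaded so far.
t150 = timestamp_from_filename("frame_0150.png")
t151 = timestamp_from_filename("frame_0151.png")
assert abs((t151 - t150) - 0.0001) < 1e-12
```

Unlike `frame / total_frames`, this mapping never changes when more frames arrive, at the cost of the timestamps no longer spanning the full [0, 1] range the model may assume.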

Also, what do you mean by misalignment of coordinates? I simply take the same model, which is first coarse-trained and then fine-trained on 150 frames. After that, I just load the data for the next 30 frames and resume training with the 30 new training cameras. How can I ensure they are well aligned here?

Also, I think this may be what is causing the problem: during training, when a frame is trained in an iteration, it also affects the Gaussian positions for previously trained frames. That is why, once I have trained the 150 frames and render them all without training further, the result is very smooth. However, if I then train on the next 30 frames for a while and render them, those 30 new frames transition smoothly and match the ground truth, but there is a small jump between the last frame of the previous 150 renders and the first frame of the new 30 renders. This is probably because I rendered the first 150 frames with a different model from the one used for the newer 30 frames (since the latter is trained a bit more). When I instead use the latest trained model (trained on 150 + 30 = 180 frames) to render all 180 frames together, they produce a smooth video with no jump.

So I think all the timestamps are affected while training on any single timestamp.

yifliu3 commented 1 month ago

Hi, the timestamps can affect the HexPlane, which uses a fixed resolution along the time dimension.
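A minimal 1-D sketch of why a fixed-resolution time grid couples nearby timestamps: features at time t are linearly interpolated from grid nodes, so a gradient step at one timestamp updates nodes that neighboring timestamps also read from. The grid here is a toy stand-in for a HexPlane feature plane, and all names are illustrative:

```python
import numpy as np

# Toy 1-D stand-in for a HexPlane-style feature plane along time.
time_resolution = 8              # fixed number of time-grid nodes
grid = np.zeros(time_resolution)

def interp(grid, t):
    # t in [0, 1]; linearly interpolate between the two nearest nodes.
    x = t * (len(grid) - 1)
    i = min(int(np.floor(x)), len(grid) - 2)
    w = x - i
    return (1 - w) * grid[i] + w * grid[i + 1]

# "Training" at timestamp t = 0.55 touches nodes 3 and 4 ...
x = 0.55 * (time_resolution - 1)  # 3.85
touched = {int(np.floor(x)), int(np.floor(x)) + 1}

# ... and t = 0.5 (a previously trained timestamp) shares node 3,
# so its interpolated feature changes as well.
grid[3] += 1.0
assert interp(grid, 0.5) != 0.0
```

This is why previously trained timestamps cannot be left fully unaffected: any update near a shared grid node shifts the deformation read out at neighboring times, which matches the jump you observe when mixing checkpoints.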