Some discussion on your paper

Noob-PeNguiN commented 6 days ago

Greetings.

Wonderful work! I've been reading your paper have have a few things I want to discuss.

To start with, I've noticed that in the discussion part in your paper, you mentioned $\mathrm{R}^2$ gaussian would have 'suboptimal extrapolation for other tomography tasks'. I wonder when you say 'other tomography tasks', do you mean like ultrasound tomography, MRI and etc? After reading your paper, I think we need to design rasterization methods that accurately reflect the imaging principles specific to each imaging system. This way, we can truly capture the system's actual volume during training, rather than letting the model 'cheat' to perform well in training dataset and end up overfitting in novel view synthesis during test and inference, I've tried running X-Gaussian on my MRI dataset and that just happened. Do you thing so too? But it maybe just because of the integration bias problem that you solved in your paper. Haven't run MRI dataset on your code yet though, I may try it and let you know the result.

Also, I found that in the extrapolation ability part of your paper you mention 3DGS is more local-oriented and has suboptimal extrapolation ability. It just reminds me of the method of DreamFusion which combines NeRF and diffusion. Basically, it uses a pretrained diffusion model as a scoring mechanism, which takes an image rasterized by a NeRF model (with added noise) as an input, and predict the noise, the idea is that if the image is good, the diffusion model would predict the noise well. After this process, we'll have a loss (we have ground truth of the noise since we added it ourselves) then we can backpropagate the loss to the NeRF model and train it. So I'm thinking we could do something similar here: pretrain a diffusion model as a scoring mechanism using images described with angle and other information (we can generate this dataset with some NeRF-based method, right?) , and have a similar training pipeline as DreamFusion. This may enhance the extrapolation ability of the 3DGS model. Just some thoughts, lol.

Lastly, just can't help to say that your paper is a very solid work, a lot of mathematical derivations and a large number of rigorous experiment. Really impressed by your solid math skills and coding abilities. I've been doing some similar studies and know that it's really hard to reprogram the code when changing things about resterization. I personally am still struggling with programming the backpropagation, and would really appreciate it if you can share some of your experience with me.

Really appreciate your time to read this.

Best.

Ruyi-Zha commented 5 days ago

Hi, thanks for your interest in our work.

Suboptimal extrapolation for other tomography tasks

Here, we're referring mainly to limited-angle CT, which involves estimating unseen regions outside the scanning range.

Rasterization for other imaging (MRI, ultrasound)

We believe our approach could be extended to other volumetric reconstruction tasks, like MRI and ultrasound. However, adapting rasterization for these modalities may be challenging since they do not resemble a pinhole camera model. MRI imaging relies on Fourier transforms, while ultrasound depends on echos, which means we may need to completely redesign the CUDA component.

Fortunately, there is a workaround. Our method includes a fast, differentiable voxelization module. You can extract volumes from Gaussians and then apply existing differentiable imaging modules (like Fourier transforms) to generate 2D measurements.

Accurate imaging model

I completely agree with your point. Gaussians (and NeRFs) can easily overfit 2D images, but this doesn’t guarantee an accurate 3D representation.

Regarding 3DGS and rasterization, here’s my perspective: in the context of X-rays, rasterization does not significantly simplify volume rendering (except the camera-to-ray affine approximation), so integration bias is the main issue. However, for RGB data, rasterization actually simplifies volume rendering quite a bit (see section 4.1 in EWA Splatting, and this arxiv for rasterization vs volume rendering). This causes Gaussian-based representations to deviate from true 3D models but overfit to 2D measurements. Adjusting integration bias alone cannot fully solve this issue (as explained in our paper's rebuttal). Thus, 3DGS is more suited for view synthesis rather than 3D reconstruction. While many reconstruction papers exist, they typically regularize Gaussians to be flat or non-overlapped, which I personally think are not very elegant...

DreamFusion

I consider our work as an early exploration that introduces a novel architecture for volume reconstruction (Gaussians as scene representation, rasterization for rendering, voxelization for reconstruction). Our method does not use any pretrained semantic priors, so it is predictable that it will have artifacts in extreme cases.

Our architecture is quite compatible with nerual networks such as diffusion and transformer. SDS is a good idea and you are welcome to have a try :)

Programming backpropagation

Thanks for your recognition of our work. To be honest, it is the first time that I program on CUDA, and I struggle a lot during coding. Here is my experience.

The first thing is to derive forward rasterization equations.
I follow YouTube videos to read 3DGS codes and mark those parts that may need to be edited.
I then start to edit the forward part. Since I don't know how to set up a debugging environment for CUDA, I just use printf to print out variables for debugging. I also create a one-Gaussian case for visualization (python side).
Derive backward equations. I follow 3DGS to break backward to multiple parts (cov2d->cov3D->rotation,scale...), and derive their equations respectively. Luckily they are simple algebras and most have been done by 3DGS. But I still make a 10-page document to record all formulas...
After that, edit the backward part. Debug with printf and one-Gaussian case. All done!

immortalmin commented 5 days ago

I would like to provide some additional information regarding the programming of backpropagation. You can use Pytorch to implement the same function, using the gradient automatically calculated by Pytorch to verify whether the gradient calculated by your code is correct. It's not bad to use this method when writing voxelizer, but rasterization can be a bit troublesome.

Noob-PeNguiN commented 4 days ago

Greetings.

Many thanks to your timely and detailed reply @Ruyi-Zha! And thanks for your tips @immortalmin!

the workaround

I was thinking about redesign the whole CUDA component for MRI imaging, but it's just very hard, I've been studying this for a few months and still can't do it. So I'm very interested in the workaround you mentioned, but I don't quite understand it yet. I'll study it first and discuss it with you later, as I'm just looking into your code now and I don't want to bother you with some amateur problem. I‘ve studied the code of X-Gaussian which is similar to yours, but there are still many differences. Actually, I wrote some code to "fit" my MRI data into the X-Gaussian for it to train (it's overfitting though :( ), I don't know if my code can still work in $\mathrm{R}^2$ Gaussian? The code I wrote basically do the job of ACUI in X-Gaussian for the MRI data.

3DGS is more suited for view synthesis rather than 3D reconstruction.

I agree with you. My guess is that people use 3DGS for 3D reconstruction rather than view synthesis, because it is a classic problem in CT etc, and there are many comparable methods, so the story telling of the paper might be easier. This is also a problem for me because in the MRI field the situation is the same...

SDS is a good idea and you are welcome to have a try :)

yeah, I plan to try it when I finish the basic model for MRI.

youtube video

The youtube video you recommended is really good, I would say it's one of the best tutorial in 3DGS. To be honest, I watched part of it a few weeks ago but I didn't finish it. I will watch it more seriously.

coding and debugging

Really grateful for your detailed experience!

About debugging the cuda code, I've done some research before. As python is a dynamic language, the python interpreter cannot parse static languages such as cpp and cuda, so it's really tricky to debug the code with debugger. It's possible though, you can refer to this link. But I would say, let's just use printf... By the way, can you share me some detail on how you do the one-Gaussian case for visualization on python side, many many thanks!

But I still make a 10-page document to record all formulas...

I feel you, the mathematical derivations of 3DGS is annoying... (cries from a kid bad at math). By the way, should I spend some time to read the EWA splatting paper seriously? Because it seems that many math derivations of 3DGS is actually in the EWA splatting paper.

Best.

Ruyi-Zha commented 3 days ago

Workaround

I mean use voxelization to extract a volume from Gaussian, and then use python FFT to render measuremnts. Just a thought, not verified.

One gaussian

I’ve updated the code for generating a single Gaussian, which you can find here. Below is the rendered output:

one_gaussian

X-Gaussian

Our work actually quite differs from X-Gaussian, which is designed specifically for view synthesis. I’m uncertain whether fitting MRI data aligns with our method’s capabilities or goals.

Noob-PeNguiN commented 3 days ago

Thank you for your reply!

workaroud

I think I understand what you mean now, I'll give it a try.

One Gaussian

Thanks for the code update!

Our work actually quite differs from X-Gaussian

Yes, I agree. I think we should benefit from the solved integration bias problem and the novel architecture for volume reconstruction including voxelization in your paper, rather than fitting the mri data brutally into your method.

Best.

Ruyi-Zha / r2_gaussian

Some discussion on your paper #13