chewry closed this issue 2 years ago.
@chewry Hi, this is strange; it should provide a significant acceleration. Most of my experiments were also carried out on a V100. Could you give more details, e.g., the iterations per second at test with and without '--cuda_ray'? Which dataset are you using? The OS, PyTorch version, and CUDA version would also be helpful.
[Environment] OS: Linux 18.04, Python: 3.8, PyTorch: 1.10.0, CUDA: 11.3 (cuDNN 8.2). I use a custom dataset with ~900 images of size 760x540, a forward-facing scene. The scene is not perfectly static, but vanilla NeRF covers it successfully.
Frames per second at test (760x540):
- w/o '--cuda_ray': 0.6 FPS (num_steps: 64, upsample_steps: 64)
- with '--cuda_ray': 0.25 FPS (max_steps: 1024); 6~7 FPS (max_steps: 128, significant acceleration)
I didn't notice this 'max_steps' option until just now; I had used the default value (1024) for training with '--cuda_ray'. At inference, images rendered with a smaller max_steps show degraded quality, apparently due to insufficient ray samples.
I think we should be able to get the speedup without adjusting this option, is that right? (The Instant-NGP paper uses a step size of sqrt(3)/1024, which is almost equivalent to using the default max_steps. If I set the same small max_steps (e.g., 128) for both training and test, can I avoid this quality degradation?)
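To make the step-size comparison concrete, here is a quick sanity check of the arithmetic above: with the paper's step size of sqrt(3)/1024, a ray crossing the full diagonal of the unit cube takes exactly 1024 steps, which is why the default max_steps=1024 matches it. (The exact bound torch-ngp uses may differ; this just reproduces the paper's numbers.)

```python
import math

# Instant-NGP marches with a fixed step size of sqrt(3)/1024; sqrt(3) is
# the diagonal length of the [0, 1]^3 unit cube.
diagonal = math.sqrt(3.0)
step_size = diagonal / 1024.0

# Steps needed to traverse the full diagonal at that step size:
steps_across_diagonal = diagonal / step_size
print(steps_across_diagonal)  # 1024.0 -- matches the default max_steps
```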
Please let me know if you need any comments or additional info on my situation.
Yes, max_steps=1024 should still provide enough acceleration: once the occupancy grid is pruned so that most of the space is empty, the average number of sampling steps per ray should be around 100. I doubt it is the dataset that makes this difference; can you observe the acceleration on the LEGO dataset? You can also uncomment this line to see the occupancy rate; a forward-facing scene may converge to 0.1-0.3. I'm afraid a smaller max_steps will degrade quality, but if there is no better solution, you can try it.
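For reference, the occupancy rate mentioned above is just the fraction of grid cells whose density exceeds the pruning threshold. A minimal sketch of that check, assuming a dense grid of per-cell densities and a threshold (placeholder names and values, not torch-ngp's actual grid layout):

```python
import torch

torch.manual_seed(0)

# Placeholder 128^3 density grid with values in [0, 1) and a made-up
# pruning threshold; the real grid and threshold live in the renderer.
density_grid = torch.rand(128, 128, 128)
density_thresh = 0.7

# Occupancy rate = fraction of cells kept (density above threshold).
occupied = (density_grid > density_thresh).sum().item()
occupancy_rate = occupied / density_grid.numel()
print(f"occupancy rate: {occupancy_rate:.3f}")
```

A low rate (e.g., 0.1-0.3 for a forward-facing scene) means most rays skip most cells, which is where the speedup comes from.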
Thank you for your advice! As you expected, a smaller max_steps slightly degrades the performance on my dataset. However, matching max_steps between training and inference gives more reasonable results than reducing max_steps only at inference time.
I'll test on the LEGO dataset and share the results.
I'm sorry for the delay. The occupancy grid works fine on the LEGO dataset (~10x acceleration).
In my experiments on my own dataset, I found that occupancy-grid sampling is sensitive to the scene scale. Within a certain scale range, the grid sampling works and accelerates rendering; outside that range, the acceleration gain disappears, or training fails to converge entirely. (Without occupancy-grid sampling, the model learns the scene at those scales.)
I think this is reasonable: covering the camera-viewed region with a predefined grid can fail more easily than sampling without a grid. With manual scale tuning, I can get the expected acceleration gain.
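As an illustration of the manual scale tuning described above, one common approach is to rescale the camera translations so the scene fits inside the grid's bound. The pose format (Nx4x4 camera-to-world matrices) and the scale factor here are assumptions for the sketch, not torch-ngp's actual preprocessing:

```python
import numpy as np

def rescale_poses(poses: np.ndarray, scale: float) -> np.ndarray:
    """Scale the translation part of Nx4x4 camera-to-world matrices."""
    poses = poses.copy()
    poses[:, :3, 3] *= scale
    return poses

# Three dummy identity poses with translations up to 4 units from origin.
poses = np.eye(4)[None].repeat(3, axis=0)
poses[:, :3, 3] = [[0, 0, 4], [2, 0, 4], [-2, 0, 4]]

# Shrink by 4x so all cameras land within a bound of ~1.
scaled = rescale_poses(poses, 0.25)
print(scaled[:, :3, 3].max())  # 1.0
```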
Thank you for your advice. If you have any further comments, please let me know. I'll close this issue in a week.
Thank you for your great work, again.
In an old version of the README, there were tables showing the speed gain from the cuda_ray option, but in my experiments I failed to achieve such a gain. Specifically, AMP (automatic mixed precision) makes rendering 2x faster, but density_grid, mark_untrained_grid, update_extra_state, and run_cuda in the renderer do *not* accelerate rendering; update_extra_state actually slows the iterations down.
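For clarity, the AMP speedup above comes from running the network forward pass under autocast. A minimal sketch of that pattern, with a placeholder model and input rather than torch-ngp's actual renderer (on GPU the repo uses fp16; here the sketch uses CPU bfloat16 so it runs anywhere):

```python
import torch

# Placeholder MLP standing in for the NeRF network, and fake ray features.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4)
)
rays = torch.randn(1024, 32)

# Autocast runs eligible ops in reduced precision, which is where the
# ~2x rendering speedup comes from on tensor-core GPUs.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(rays)
print(out.shape)  # torch.Size([1024, 4])
```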
The GPU is a V100, which has no RT cores. Could that be the reason for my failure? I also tried another GPU (an RTX 2080), but there was no gain either. Would you give me some advice? Any short hint would be very helpful! Thank you.