Open tb2-sy opened 1 month ago
Hi, training time should increase, but it was definitely not that slow in our experiments. Could you please let me know the resolution, the K value per image, and your batch size?
The resolution is 1920*1080, the batch size is 1, and K is the default 20.
Surprised... but thanks for letting me know, and I will try the code anyway.
Is this because the resolution is too high?
I don't think so, since we ran experiments on the DyNeRF dataset, whose resolution is also 1K or higher.
Okay, this is a bit weird.
Checking the time consumed by each part of the backward pass, we found some problems. The backward pass with the flow loss takes 30+ seconds. The section from https://github.com/Zerg-Overmind/diff-gaussian-rasterization/blob/main/cuda_rasterizer/rasterizer_impl.cu#L419 to line 448 (time for `RasterizeGaussiansBackwardCUDA2`) takes thousands of times longer than the section from https://github.com/Zerg-Overmind/diff-gaussian-rasterization/blob/main/cuda_rasterizer/backward.cu#L756 to line 783 (time for `renderCUDA`). As shown in the figure above, when the flow loss is involved, the CUDA backward is executed twice, and the difference between the two execution times is very large. Why is this?
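One thing worth double-checking when timing CUDA sections from Python: kernel launches are asynchronous, so naive wall-clock timing can attribute a kernel's cost to whichever later call happens to block. A minimal sketch of a sync-aware timer (the `sync` hook and the labels are illustrative, not from the repo; with PyTorch you would pass `torch.cuda.synchronize`):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results, sync=None):
    """Time a code section, synchronizing before and after.

    Without synchronization, asynchronous CUDA kernel launches return
    immediately, and their cost shows up in whichever section blocks
    next. With PyTorch, pass sync=torch.cuda.synchronize.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        results[label] = time.perf_counter() - start

# Hypothetical usage around the backward pass:
#   with timed("backward_with_flow", results, sync=torch.cuda.synchronize):
#       loss.backward()
```

Measuring the two backward calls this way would rule out the possibility that the 30+ seconds is just deferred kernel work being charged to the wrong section.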
A 30x slowdown is crazy, by the way; I think there must be something wrong. Forgive us for not being able to debug on your server; we can only verify everything on our end. Maybe your CUDA/GPU version is very different? A 30x slowdown sounds like the GPU is not being used properly.
Thanks for your nice work! I tried to integrate the flow loss into 4D-GS, but found that the training time increased significantly when executing the `loss.backward()` operation: the estimated training time went from the original one hour to about 100 hours, almost 100 times longer. Is this reasonable?