Open warriorsniu opened 2 months ago
Hi, we noticed that S3DIS contains two very large scenes in Area_2, which lead to a large data loading time. Splitting these two scenes into several smaller parts may be a good solution.
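A minimal sketch of what such a split could look like, assuming each scene is available as a list of `(x, y, z)` points and that cutting along a regular XY grid is acceptable. The function name `split_scene_xy` and the `tile_size` parameter are hypothetical and not part of Pointcept; a real preprocessing step would also need to carry colors, normals, and labels along with the indices.

```python
import math
from collections import defaultdict


def split_scene_xy(points, tile_size):
    """Group point indices into square XY tiles of side `tile_size`.

    `points` is an iterable of (x, y, z) tuples. Returns a dict mapping
    a (tile_x, tile_y) grid cell to the list of point indices inside it.
    Hypothetical helper for illustration only.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    tiles = defaultdict(list)
    for idx, (x, y, _z) in enumerate(points):
        # floor() keeps negative coordinates in their own cells
        key = (math.floor(x / tile_size), math.floor(y / tile_size))
        tiles[key].append(idx)
    return dict(tiles)


# Toy scene: four points, tiles of 1 m x 1 m.
scene = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.0), (0.2, 0.9, 1.0), (-0.1, 0.0, 0.0)]
parts = split_scene_xy(scene, tile_size=1.0)
```

Each entry of `parts` could then be saved as its own, smaller scene so that no single sample dominates a batch.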
Thanks for your reply. However, as shown in the figure, the most time-consuming step is the backward step, which gets stuck for nearly one minute. What could be the reason for that?
I ran the semseg-pt-v3m1-0-rpe experiment on the S3DIS dataset with a batch size of 4 on four 3090 24GB GPUs, with num_worker set to 12. The training batch time is stable and fast at the beginning. However, training then stalls intermittently: sometimes for a few seconds, sometimes for 40 seconds or even more. I also measured the time spent in each step of run_step() and found that the backward step appears to be the bottleneck. Is this common? Could you please give me some suggestions about this problem? Thanks a lot!
![image](https://github.com/Pointcept/PointTransformerV3/assets/58773198/6ddff370-7ab4-4f22-b569-a6ab184f94b7)
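For reference, per-step timing of the kind described above could be collected roughly as follows. This is only a sketch: `StepTimer` is a hypothetical helper, not something from Pointcept. One caveat is that CUDA kernels run asynchronously, so plain wall-clock timing can attribute the cost to whichever step happens to wait for the GPU; passing `torch.cuda.synchronize` as the `sync` callback is one way to reduce that effect.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulate wall-clock time per named section (hypothetical helper)."""

    def __init__(self, sync=None):
        # `sync` is an optional zero-argument callable run before and after
        # each section, e.g. torch.cuda.synchronize on a CUDA machine.
        self.sync = sync
        self.total = defaultdict(float)
        self.worst = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def section(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            elapsed = time.perf_counter() - start
            self.total[name] += elapsed
            self.worst[name] = max(self.worst[name], elapsed)
            self.calls[name] += 1

    def report(self):
        # Mean and worst-case duration per section, in seconds.
        return {
            name: {
                "calls": self.calls[name],
                "mean": self.total[name] / self.calls[name],
                "max": self.worst[name],
            }
            for name in self.total
        }


# Assumed usage inside a training step (placeholders, not Pointcept code):
#   timer = StepTimer(sync=torch.cuda.synchronize)
#   with timer.section("forward"):
#       loss = compute_loss(batch)
#   with timer.section("backward"):
#       loss.backward()
#   print(timer.report())
```

Comparing `mean` with `max` per section helps separate a step that is always slow from one that only stalls occasionally.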