Training speed issues.

Ma-Weijian commented 2 years ago

Hi Autodesk Team!

Really fantastic model and representation for deep learning on CAD. This model really sheds light on new research.

However, when I tried to retrain your model, I found that the training script is slower than expected. The network seems to be just two simple CNNs with a simple GNN added, yet one training iteration takes ~1.2s on a single RTX 3090 with the default settings provided.

I'm not complaining about the total training time of UV-Net; I just wonder why a combination of such simple networks results in this per-iteration time. As I am quite a newbie in CAD deep learning and GNNs, I really have no idea what is actually going on during the training steps.

Any ideas would help. Many thanks.

Anderson Ma

Ma-Weijian commented 2 years ago

Well, it seems that training becomes far faster later on than it is at the beginning.

pradeep-pyro commented 2 years ago

Interesting. We have noticed a significant slowdown on an RTX card before. I don't have access to an RTX card right now to profile, but I believe @JoeLambourne had similar issues in the past. Any thoughts/ideas Joe?

Ma-Weijian commented 2 years ago

In fact, the 1.2s/it only happens during the first several epochs. The training speed later becomes ~20it/s.

Is it in the nature of GNNs that training is slower at the beginning than later on?
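
To check whether the slowdown really is confined to the early iterations, one option is to time each step and average the early and later iterations separately. The following is a minimal sketch using only the Python standard library; `time_iterations` and `fake_step` are hypothetical names for illustration and are not part of UV-Net. When timing real GPU work, `torch.cuda.synchronize()` should be called before reading the clock, since CUDA kernels are launched asynchronously.

```python
import time
from statistics import mean


def time_iterations(step_fn, num_iters, warmup=5):
    """Time each call to step_fn; average warm-up and later iterations separately."""
    durations = []
    for i in range(num_iters):
        start = time.perf_counter()
        step_fn(i)  # for GPU work, synchronize the device before stopping the clock
        durations.append(time.perf_counter() - start)
    warm, steady = durations[:warmup], durations[warmup:]
    return {
        "warmup_s_per_it": mean(warm) if warm else None,
        "steady_s_per_it": mean(steady) if steady else None,
        "durations": durations,
    }


def fake_step(i):
    # Hypothetical stand-in for one training iteration: slow at first, fast later.
    time.sleep(0.02 if i < 5 else 0.001)


report = time_iterations(fake_step, num_iters=20, warmup=5)
print(f"warm-up: {report['warmup_s_per_it']:.4f} s/it")
print(f"steady:  {report['steady_s_per_it']:.4f} s/it")
```

For reference, the figures quoted above (1.2s/it versus ~20it/s, i.e. ~0.05s/it) correspond to roughly a 24x difference between the early and later iterations.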

pradeep-pyro commented 2 years ago

I profiled the code and couldn't find anything in particular that might slow the code down. But my GPU is not an RTX card. Can you try running either the classification or the segmentation script with the --profile advanced --max_epochs 1 arguments and pass the profiler's output to me?
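
If running the script's own profiler is not convenient, a rough stand-in is to wrap a single epoch in Python's built-in `cProfile` and share the top entries. This is only a sketch under that assumption: `profile_call` and `toy_epoch` are hypothetical helpers, not functions from the UV-Net repository, and `cProfile` only sees Python-side time, so asynchronous GPU work may be attributed to whichever call ends up waiting on the device.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn under cProfile; return its result and a text report of the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)  # slowest call paths first
    return result, stream.getvalue()


def toy_epoch(num_batches=50):
    # Hypothetical stand-in for one training epoch.
    total = 0
    for _ in range(num_batches):
        total += sum(x * x for x in range(1000))
    return total


result, report_text = profile_call(toy_epoch, num_batches=50)
print(report_text)
```

The printed report lists functions sorted by cumulative time, which is usually enough to tell whether data loading or the model step dominates.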

pradeep-pyro commented 1 year ago

Closing for now. Please feel free to reopen if you get a chance to run the profiler.

AutodeskAILab / UV-Net

Training speed issues. #10