daerduoCarey / structurenet

StructureNet: Hierarchical Graph Networks for 3D Shape Generation
https://cs.stanford.edu/~kaichun/structurenet/

Long training time + initial increase in KL divergence #6

Closed djr2015 closed 3 years ago

djr2015 commented 4 years ago

Thank you for your work and accompanying codebase!

1) I am able to run scripts/box_vae_chair.sh, but training is taking far longer (~1 h per epoch, so ~8.3 days for 200 epochs) than the 1-2 days your paper mentions for bounding-box inputs.

2) Over my first ~10 epochs of training, the KL divergence has been increasing; is this expected behavior? (KL-divergence plot attached.)

daerduoCarey commented 4 years ago
  1. StructureNet indeed needs a long training time, since it runs mostly on the CPU (the per-node operations are hard to batch). Make sure you have a decent CPU (at least an i7/i9); it should reach reasonable progress, close to convergence, after 1-3 days of training.
  2. The KL divergence will go up first, and then act as a regularizer later in training; see the sketch below.
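
For reference, a minimal sketch of the standard Gaussian-VAE KL term under discussion; variable names here are illustrative, not the actual StructureNet code:

```python
import torch

def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # KL( N(mu, diag(sigma^2)) || N(0, I) ): summed over latent
    # dimensions, then averaged over the batch.
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()

# Schematic VAE objective: the KL term typically grows early on while
# the posterior drifts away from the prior to fit reconstructions, then
# acts as the regularizer pulling the latents back toward N(0, I).
# loss = recon_loss + kl_weight * kl_divergence(mu, logvar)
```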
Warren-swr commented 1 year ago

Hello, I also ran into the problem of training taking too long. What can I do to speed it up? I noticed that torch.set_num_threads() can be set in the training script. If I have a multi-core CPU, say 20 cores, can I speed up training by using more CPU threads? I saw your comment advising against using too many CPU threads, but does that advice also apply to CPUs with more cores?
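
For anyone experimenting with this, a minimal sketch of capping the CPU thread count in a standard PyTorch setup; the specific value is hypothetical and worth benchmarking on your own machine:

```python
import torch

# Cap PyTorch's intra-op CPU parallelism before training starts.
# StructureNet runs many small per-node ops, which can suffer from
# thread oversubscription, so benchmark a few settings (e.g. 4, 8, 16)
# rather than defaulting to all 20 cores.
torch.set_num_threads(8)        # hypothetical value; tune empirically
print(torch.get_num_threads())  # confirm the setting took effect
```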