aicoe-kaggle / diabetic-retinopathy


Optimal PyTorch bucketing value #12

Open TreeinRandomForest opened 2 years ago

TreeinRandomForest commented 2 years ago

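PyTorch DDP overlaps the computation of gradients during the backward pass with the communication (allreduce) to the other nodes of gradients that are already available, i.e. those of layers closer to the output ("more forward" in the network). See: …

The balance between computation and communication is controlled by the bucket_cap_mb argument (see: …). Gradients are grouped into buckets of about bucket_cap_mb megabytes each (default 25), and a bucket is only communicated once all of its gradients are ready. Larger buckets mean fewer, larger allreduce calls but less overlap with computation. Smaller buckets overlap more but pay a fixed overhead per call.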
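To make the trade-off concrete, here is a minimal pure-Python sketch of how gradients might be grouped into buckets. It is a simplified model, not DDP's actual implementation: it assumes gradients become ready in reverse parameter order and ignores DDP's smaller first bucket and its per-dtype/per-device splitting. The function name assign_buckets is hypothetical.

```python
from typing import List


def assign_buckets(grad_sizes_bytes: List[int], bucket_cap_mb: float) -> List[List[int]]:
    """Group parameter indices into communication buckets (simplified model).

    Assumption: gradients become ready in reverse parameter order (layers
    closest to the output first). A bucket is closed as soon as its total
    size reaches the cap, so a bucket can slightly exceed bucket_cap_mb.
    """
    cap_bytes = int(bucket_cap_mb * 1024 * 1024)
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx in reversed(range(len(grad_sizes_bytes))):
        current.append(idx)
        current_bytes += grad_sizes_bytes[idx]
        if current_bytes >= cap_bytes:
            # This bucket's allreduce could start once all its gradients exist.
            buckets.append(current)
            current, current_bytes = [], 0
    if current:
        # Leftover parameters (closest to the input) form the last bucket.
        buckets.append(current)
    return buckets
```

With four 10 MB gradients, a 25 MB cap yields two buckets ([3, 2, 1] and [0]), while a 5 MB cap yields four single-parameter buckets: more allreduce calls, but each one can start earlier in the backward pass.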

Proposal: do a line search over bucket_cap_mb during the first few iterations and keep the value that minimizes wall-clock time per iteration.
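A minimal sketch of the proposed line search, assuming a discrete grid of candidate values and a caller-supplied function that runs one training step with a given cap and returns its duration in seconds. The names line_search_bucket_cap and time_one_step, the warm-up/timed step counts, and the use of the median are all assumptions for illustration.

```python
import statistics
from typing import Callable, Dict, Iterable, Tuple


def line_search_bucket_cap(
    candidates_mb: Iterable[float],
    time_one_step: Callable[[float], float],  # hypothetical: runs 1 step, returns seconds
    warmup_steps: int = 2,
    timed_steps: int = 5,
) -> Tuple[float, Dict[float, float]]:
    """Return the candidate cap with the lowest median step time, plus all timings."""
    results: Dict[float, float] = {}
    for cap in candidates_mb:
        # Discard the first steps for each cap (bucket setup, allocator warm-up).
        for _ in range(warmup_steps):
            time_one_step(cap)
        samples = [time_one_step(cap) for _ in range(timed_steps)]
        # Median is less sensitive than the mean to occasional slow steps.
        results[cap] = statistics.median(samples)
    best = min(results, key=lambda c: results[c])
    return best, results


if __name__ == "__main__":
    # Toy cost model standing in for real timings: fastest at 25 MB.
    fake_step = lambda cap: 1.0 + (cap - 25.0) ** 2 / 1000.0
    best, timings = line_search_bucket_cap([1, 5, 10, 25, 50, 100], fake_step)
    print(best, timings)
```

In a real DDP run, time_one_step would presumably wrap the model with torch.nn.parallel.DistributedDataParallel(model, bucket_cap_mb=cap) (the cap is a constructor argument, so each candidate likely needs a fresh wrapper), call torch.cuda.synchronize() before reading time.perf_counter(), and reduce the per-rank timings (e.g. torch.distributed.all_reduce with ReduceOp.MAX) so that every rank picks the same value.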