diux-dev / cluster

train on AWS

overlapping transfer and computation in PyTorch all_reduce #14

Closed yaroslavvb closed 6 years ago

yaroslavvb commented 6 years ago

@bearpelican The following commit claims to overlap transfer and computation:

https://github.com/pytorch/pytorch/commit/5b7951057d975a7e7ae0ea5e5a651d86b370993d

This is less than a month old, so I'm guessing it's not used in any of our experiments?
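
For context, the idea of overlapping transfer with computation can be sketched with the Python standard library alone. This is a toy simulation under stated assumptions, not the code from the commit above and not PyTorch's API: `fake_backward`, `fake_all_reduce`, and `backward_with_overlap` are hypothetical names, the "workers" all live in one process, and `time.sleep` stands in for compute time and network latency. The point it tries to show is that each layer's gradient reduction is handed to a background thread as soon as that gradient is ready, so the next layer's backward pass does not wait for the transfer.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_backward(layer, worker):
    """Hypothetical stand-in for computing one layer's gradient on one worker."""
    time.sleep(0.01)  # pretend compute time
    return [float(layer + worker), float(layer * worker)]


def fake_all_reduce(grads_per_worker):
    """Hypothetical stand-in for all_reduce: element-wise sum across workers."""
    time.sleep(0.01)  # pretend network latency
    return [sum(values) for values in zip(*grads_per_worker)]


def backward_with_overlap(num_layers, num_workers):
    """Run a simulated backward pass, reducing each layer's gradients in the background."""
    pending = {}
    # A single background thread plays the role of the communication stream.
    with ThreadPoolExecutor(max_workers=1) as comm:
        # Backward runs from the last layer to the first.
        for layer in reversed(range(num_layers)):
            grads = [fake_backward(layer, w) for w in range(num_workers)]
            # Start the reduction and move on to the next layer immediately.
            pending[layer] = comm.submit(fake_all_reduce, grads)
        # Block only once, at the end, before the optimizer step would run.
        return {layer: future.result() for layer, future in pending.items()}


reduced = backward_with_overlap(num_layers=3, num_workers=2)
print(reduced)
```

Without the overlap, each call to `fake_all_reduce` would sit inline in the loop and the compute and transfer times would simply add up; here the reduction of one layer runs while the next layer's gradients are being computed.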

bearpelican commented 6 years ago

Wow, great find! OK, I'll try to build from source again and see if we can use the latest and greatest.

yaroslavvb commented 6 years ago

Both c10d and Apex are in active development, so maybe revisit this in a few weeks.