DifferentiableUniverseInitiative / IDRIS-hackathon

Repository for hosting material and discussions for the 2021 IDRIS GPU hackathon
MIT License

Implementation of Horovod backend in Mesh TensorFlow #3

Open EiffL opened 3 years ago

EiffL commented 3 years ago

This issue tracks the developments needed to finalize and validate the Mesh TensorFlow implementation that uses Horovod as its backend. This overarching goal will encapsulate several smaller issues.

Goal

By the end of the hackweek, submit a Pull Request to https://github.com/tensorflow/mesh with our new implementation for GPU clusters.

Participants

The main participants in this task are:

Tasks

Progress made on these subtasks can be reported here.

EiffL commented 3 years ago

We have identified another issue, which I'm adding to the list of things we need to resolve: https://github.com/DifferentiableUniverseInitiative/mesh/issues/4

EiffL commented 3 years ago

We have mostly solved the first two points of this issue, as follows: