Closed by cfoster0 3 years ago
https://github.com/kingoflolz/mesh-transformer-jax
Use the pod orchestration code from here. Effectively, we should borrow everything and modify the transformer_shard and tfrecord_loader files.
For small (<< 1B parameter) models, it was decided that model parallelism isn't needed. Closing for now.