CERC-AAI / multimodal

An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Apache License 2.0

Config changes #51

Closed: daniel-z-kaplan closed this 1 year ago

daniel-z-kaplan commented 1 year ago

Config changes plus another round of benchmarking; loss = 3.86 for a 12-hour run.