ECP-CANDLE / Foundation

175B Profiling #5

Open azton opened 1 year ago

azton commented 1 year ago

I have a functional DeepSpeed implementation using ZeRO stage 3. The next step is to analyze performance and scalability: do we need pipeline or model parallelism to actually run this?
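As a rough way to frame the question above, here is a back-of-envelope sketch of per-GPU memory for model states under ZeRO stage 3. It assumes mixed-precision Adam (16 bytes per parameter: fp16 weights, fp16 gradients, and fp32 master weights, momentum, and variance) and a hypothetical 4 GPUs per node; neither the optimizer nor the node layout is stated in this issue, so treat both as placeholders.

```python
# Back-of-envelope estimate, not a measurement.
# Assumption: mixed-precision Adam, so each parameter costs
#   2 bytes (fp16 weight) + 2 bytes (fp16 gradient)
#   + 12 bytes (fp32 master weight, momentum, variance).
BYTES_PER_PARAM = 2 + 2 + 12


def zero3_model_state_gb(num_params: float, num_gpus: int) -> float:
    """Per-GPU model-state memory in GB (1e9 bytes) when ZeRO stage 3
    partitions weights, gradients, and optimizer states evenly."""
    return num_params * BYTES_PER_PARAM / num_gpus / 1e9


NUM_PARAMS = 175e9   # 175B parameters, from the issue title
NODES = 32           # from the first trial described in this issue
GPUS_PER_NODE = 4    # ASSUMPTION: not stated in the issue

per_gpu_gb = zero3_model_state_gb(NUM_PARAMS, NODES * GPUS_PER_NODE)
print(f"model states per GPU: {per_gpu_gb:.1f} GB")
```

Under these assumptions the model states alone come to roughly 22 GB per GPU. Activations, communication buffers, and fragmentation come on top of that and grow with batch size, which is what the profiling runs would need to measure.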

azton commented 1 year ago

First trial: 32 nodes, increasing the batch size until CUDA OOM failures occur. Each run takes roughly 1-2 hours to complete 3-5 steps for profiling.
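The batch-size sweep described above could be sketched as follows. This is a minimal illustration, not the script used for the trial: `find_max_batch_size` and `run_steps` are hypothetical names, the doubling schedule is an assumption, and a real run would likely pass PyTorch's `torch.cuda.OutOfMemoryError` as the exception to catch instead of the built-in `MemoryError` used here to keep the sketch dependency-free.

```python
from typing import Callable, Optional, Tuple, Type


def find_max_batch_size(
    run_steps: Callable[[int], None],
    start: int = 1,
    limit: int = 1024,
    oom_exceptions: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Optional[int]:
    """Double the batch size until run_steps raises an OOM-style error.

    run_steps(batch_size) is a hypothetical callable that runs a few
    profiling steps at the given batch size. Returns the largest batch
    size that completed, or None if even `start` failed.
    """
    largest_ok = None
    batch_size = start
    while batch_size <= limit:
        try:
            run_steps(batch_size)
        except oom_exceptions:
            break  # first failing size; stop the sweep
        largest_ok = batch_size
        batch_size *= 2
    return largest_ok


# Usage with a stand-in workload that "runs out of memory" above 8.
def fake_run(batch_size: int) -> None:
    if batch_size > 8:
        raise MemoryError("simulated OOM")


print(find_max_batch_size(fake_run))
```

Doubling rather than incrementing keeps the number of trials small, which matters when each one takes 1-2 hours.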