ECP-CANDLE / Foundation


Native DeepSpeed implementation #2

Closed. azton closed this issue 1 year ago.

azton commented 1 year ago

As usual, Lightning is nice but limited. It would be great to have a full PyTorch version with native DeepSpeed so that we can experiment with 3D parallelism. Early experiments suggest that GPT-3 doesn't scale much past 32 nodes (Polaris) using plain PyTorch Lightning + ZeRO-3 or FSDP. Scaling past that point probably requires pipeline/model parallelism or tensor parallelism, neither of which is available through the Lightning interface.
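For reference, a minimal sketch of what the native path could look like, assuming a plain `torch.nn.Module` GPT model and a map-style dataset; `build_gpt_model` and `build_dataset` are placeholder helpers, not part of this repo, and the ZeRO-3 config values are illustrative only. Pipeline/tensor parallelism would then be layered on top of an engine like this (e.g. via `deepspeed.pipe.PipelineModule`) rather than going through Lightning:

```python
# Hypothetical sketch of a native DeepSpeed training loop (ZeRO-3),
# bypassing PyTorch Lightning. Helper functions are placeholders.
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model = build_gpt_model()     # placeholder: returns a torch.nn.Module
train_data = build_dataset()  # placeholder: returns a torch Dataset

# deepspeed.initialize wraps the model and builds the optimizer and
# distributed data loader from the config above.
model_engine, optimizer, train_loader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=train_data,
    config=ds_config,
)

for step, batch in enumerate(train_loader):
    batch = batch.to(model_engine.device)
    loss = model_engine(batch)   # assumes forward() returns the loss
    model_engine.backward(loss)  # DeepSpeed handles loss scaling/accumulation
    model_engine.step()
```

Launched with the `deepspeed` launcher (or the site's MPI/PBS wrapper on Polaris) rather than Lightning's Trainer.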

azton commented 1 year ago

Cobbled together a native DeepSpeed version; testing is in progress. Will open new issues if scaling is poor or other problems turn up.