hyunwoongko opened 2 years ago
@sdtblck I saw you posted an issue regarding OSLO PP. Is there anything in PP you would like to improve?
Given that we have diverged quite far from the mainline DeepSpeed repo, would #1 involve a lot of unnecessary labor compared to doing it after we get back onto the main version of DeepSpeed?
@StellaAthena
In addition, if there are any other parts you would like to improve or experiment with, even if they have nothing to do with OSLO, please feel free to assign some tasks to me. I am more than happy to help the NeoX project.
@hyunwoongko Ah I think I misread your comments about #1 :) In that case I would certainly be interested in experimenting with it :)
Honestly, far and away the most helpful thing you could do is figure out how to bring us back in line with the main DeepSpeed branch. I know that’s a big ask though, so no worries if it’s a bit daunting.
In terms of building out the library, the other most important things on the horizon are #479 and #215. There are also some outstanding, abandoned PRs with optimizers like Shampoo that would be nice to have cleaned up and finished. In terms of general library maintenance, #469 and various documentation improvements such as #506 #484 and #458 would all be quite helpful.
We could also always use help designing and orchestrating experiments. We can happily provide the compute for anyone willing to do the work… DM me on Slack if you’re interested.
@hyunwoongko -- Would you like to restart this effort?
@Quentin-Anthony sounds great.
AOTAutograd is a new engine provided by functorch that can trace and fuse the forward and backward passes of a neural network. I added it to OSLO recently, and it makes training much faster. I would like to add this to GPT-NeoX as well; what do you think? It would also be nice to implement it on the DeeperSpeed side.
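For context, here is a minimal sketch of how AOTAutograd captures the forward and backward graphs via functorch. The `print_compile_fn` compiler and the toy function below are just for illustration; a real backend such as nvFuser would return a fused callable instead.

```python
import torch
from functorch.compile import aot_function

def print_compile_fn(fx_module, example_inputs):
    # AOTAutograd hands the captured FX graph (forward or backward) to this
    # "compiler"; a fusing backend would lower it, here we just print and reuse it.
    print(fx_module.code)
    return fx_module

def fn(x, w):
    # Toy computation standing in for a transformer block
    return torch.nn.functional.gelu(x @ w)

aot_fn = aot_function(fn, fw_compiler=print_compile_fn, bw_compiler=print_compile_fn)

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 8, requires_grad=True)
aot_fn(x, w).sum().backward()  # triggers compilation of both graphs
```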
OSLO changed Megatron's MPU to support embedding sizes that do not divide evenly across the tensor-parallel ranks. This removes the need to add meaningless padding tokens, which can improve memory efficiency, and it also let me implement the TP automerging feature. Note that automerging can merge 70+ Transformers architectures without checkpoint conversion scripts.
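To illustrate the partitioning idea (this is a hypothetical helper, not the actual OSLO code): when the vocab size does not divide evenly by the tensor-parallel degree, the first `vocab_size % world_size` ranks can simply take one extra row each, so no padding tokens are needed.

```python
def uneven_vocab_partition(vocab_size: int, world_size: int, rank: int):
    # Hypothetical helper: split an embedding table whose vocab size is NOT
    # divisible by the TP degree. The first `remainder` ranks get one extra row.
    base, remainder = divmod(vocab_size, world_size)
    start = rank * base + min(rank, remainder)
    end = start + base + (1 if rank < remainder else 0)
    return start, end

# Example: GPT-2's 50257-token vocab split across 4 tensor-parallel ranks
print([uneven_vocab_partition(50257, 4, r) for r in range(4)])
# [(0, 12565), (12565, 25129), (25129, 37693), (37693, 50257)]
```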
FusedRMSNorm was recently added to Apex, and support for it has been merged into OSLO. NeoX 20B doesn't seem to use RMSNorm, but this might still be helpful.
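For reference, here is a rough sketch of what RMSNorm computes, using Apex's fused kernel when it is available; the import path is my assumption and may differ between Apex versions.

```python
import torch

try:
    # Recent Apex builds ship a fused CUDA kernel for RMSNorm
    from apex.normalization import FusedRMSNorm as RMSNorm
except ImportError:
    class RMSNorm(torch.nn.Module):
        # Eager fallback: scale by the root mean square, no mean subtraction or bias
        def __init__(self, normalized_shape, eps=1e-6):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.ones(normalized_shape))
            self.eps = eps

        def forward(self, x):
            rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
            return x * rms * self.weight

device = "cuda" if torch.cuda.is_available() else "cpu"
hidden = torch.randn(2, 16, 1024, device=device)
out = RMSNorm(1024).to(device)(hidden)
```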
I will continue to write up the parts that I think I can improve.