Closed drunkcoding closed 5 months ago
Looks quite good to me, thanks.
For the class name, I am wondering if we could name it `LLM` instead? The current name implies we only support MoE models.
Is there a use case for our project with GPT-like models (simply using our better system implementation for offloading)?
Description
We propose a class `MoE` as the entry point. It loads a (potentially sharded) checkpoint into a model, sending weights to a given device as they are loaded, and adds the various hooks that make the model run properly (even if split across devices). The class has an additional `generate` member function that overrides the default `generate` and adds tracing capability. It otherwise has the same behaviour as HuggingFace's `model.generate`.

Usage examples
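A minimal sketch of how the proposed class and its `generate` override might look. The constructor signature, the `trace` attribute, and the stand-in model are assumptions for illustration; real use would wrap a HuggingFace model loaded from a (possibly sharded) checkpoint.

```python
class MoE:
    """Sketch of the proposed entry-point class (names are assumptions).

    Wraps a loaded model and overrides generate() to add tracing,
    while delegating to the underlying model's generate().
    """

    def __init__(self, model):
        self.model = model   # the underlying HuggingFace-style model
        self.trace = []      # hypothetical: records each generate() call

    def generate(self, *args, **kwargs):
        # Same behaviour as model.generate, plus tracing of the call.
        self.trace.append({"args": args, "kwargs": kwargs})
        return self.model.generate(*args, **kwargs)


# Stand-in model for demonstration only:
class DummyModel:
    def generate(self, prompt, max_new_tokens=8):
        return prompt + " ..."


moe = MoE(DummyModel())
out = moe.generate("Hello", max_new_tokens=4)
print(out)             # Hello ...
print(len(moe.trace))  # 1
```

The key design point is that `generate` stays a drop-in replacement: callers pass the same arguments they would pass to `model.generate`, and tracing happens transparently.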