SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Can we make it run on other models? #83

Open YLSnowy opened 8 months ago

YLSnowy commented 8 months ago

Can we make it run on other models? Can the offload code be made public?

hodlen commented 8 months ago

Thank you for your interest! We are actively working to expand our range of supported models. However, there are certain limitations, as detailed in our FAQs. Please keep an eye out for updates on new model integrations!

Regarding the "offload code," could you please clarify which aspect you're referring to? If it's about neuron offloading, our implementation is fully open-sourced and available in this repository for your review and use.
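For readers unfamiliar with the term, here is a minimal sketch of the hot/cold neuron split that neuron offloading refers to; PowerInfer's actual logic lives in its C++ sources, and the names and heuristic below are illustrative assumptions only:

```python
import torch

# Illustrative sketch of a hot/cold neuron split, NOT PowerInfer's real code.
# Assumes `act_freq` holds per-neuron activation counts from profiling and
# `gpu_budget` is how many neurons fit in GPU memory.
def split_hot_cold(act_freq: torch.Tensor, gpu_budget: int):
    order = torch.argsort(act_freq, descending=True)
    hot = order[:gpu_budget]   # frequently activated neurons stay on the GPU
    cold = order[gpu_budget:]  # rarely activated neurons are served from CPU
    return hot, cold
```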

YLSnowy commented 8 months ago

Thank you for your answer. So how do we profile a model? I haven't seen any source code for that.

hodlen commented 8 months ago

We are still organizing the code for profiling and for training the predictor, and we will release it once it is ready. Please stay tuned for our progress in #93.

For now, you can check the reference implementation in Deja Vu and refer to these related issues: #84, #54.
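As a rough illustration of what such profiling involves, here is a hedged sketch in the spirit of Deja Vu, not the official PowerInfer profiler. It assumes `model` is a Hugging Face LLaMA-style model, `layers` names the MLP activation modules to instrument, and `loader` yields tokenized calibration batches:

```python
import torch

# Sketch only: count how often each FFN neuron "fires" over a calibration set.
# All names here (model, loader, layers) are assumed inputs, not repo APIs.
@torch.no_grad()
def profile_activation_freq(model, loader, layers):
    counts = {name: None for name in layers}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            # A neuron counts as active when its output is positive.
            fired = (output > 0).float().flatten(0, -2).sum(dim=0)
            counts[name] = fired if counts[name] is None else counts[name] + fired
        return hook

    for name, module in model.named_modules():
        if name in layers:
            hooks.append(module.register_forward_hook(make_hook(name)))
    for batch in loader:
        model(batch)
    for h in hooks:
        h.remove()
    return counts
```

The resulting per-neuron frequencies are what a predictor would be trained on, and what a hot/cold split (as sketched earlier) would consume.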

drewskidang commented 8 months ago

Question: can we convert our own fine-tuned LLaMA models?

hodlen commented 8 months ago

> Question: can we convert our own fine-tuned LLaMA models?

Please kindly refer to #17, #34, #56, #82, #94 and more discussion in previous issues.

linkerlin commented 8 months ago

I believe a statistical method could be employed here: set all outputs of a non-ReLU activation function that fall below, for instance, the 30th percentile to zero. This would yield sparsity guarantees similar to those provided by ReLU.
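A minimal sketch of that idea, assuming a per-token percentile threshold; the 30th percentile is arbitrary, and whether such post-hoc thresholding preserves model quality would still need to be validated empirically:

```python
import torch

# Sketch of percentile-based sparsification of activation outputs, as an
# untested proposal rather than an implemented PowerInfer feature.
def percentile_sparsify(x: torch.Tensor, q: float = 0.30) -> torch.Tensor:
    # Threshold each token's activations at the q-th quantile of its own values.
    thresh = torch.quantile(x, q, dim=-1, keepdim=True)
    return torch.where(x > thresh, x, torch.zeros_like(x))
```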