deep-diver / LLM-As-Chatbot

LLM as a Chatbot Service
Apache License 2.0

[Backlog] Add sparse models to options #52

Open claysauruswrecks opened 1 year ago

claysauruswrecks commented 1 year ago

I don't know of any right now; this is a placeholder for people to fill in if they are aware of such options.

Here is an example of the performance increase that pruning can deliver: https://github.com/mlcommons/inference_results_v3.0/tree/main/open/NeuralMagic

deep-diver commented 1 year ago

Can you elaborate?

claysauruswrecks commented 1 year ago

Sure. Pruning involves removing nodes and connections from the network while minimizing accuracy loss. The pruned model also gains inference performance: it runs faster and has lower hardware requirements.
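To make the idea concrete, here is a minimal sketch of one common approach, magnitude pruning, in plain Python. It is only an illustration: the function names (`magnitude_prune`, `sparsity_of`) and the toy weights are made up for this example, and this is not how DeepSparse or any particular framework implements pruning. The assumption is that connections with the smallest absolute weights contribute least, so they are set to zero.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` (a list of rows) with the smallest-magnitude
    fraction `sparsity` of its entries set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    flat = [w for row in weights for w in row]
    n_prune = int(len(flat) * sparsity)

    # Flat indices of the n_prune weights closest to zero.
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    drop = set(order[:n_prune])

    pruned, idx = [], 0
    for row in weights:
        new_row = []
        for w in row:
            new_row.append(0.0 if idx in drop else w)
            idx += 1
        pruned.append(new_row)
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0) / len(flat)


# Hypothetical 2x3 layer, pruned to 50% sparsity.
layer = [[0.8, -0.05, 0.3],
         [0.01, -0.6, 0.02]]
pruned_layer = magnitude_prune(layer, 0.5)
print(pruned_layer)
print(sparsity_of(pruned_layer))
```

Note that the zeros by themselves do not make inference faster. The speed and memory gains come from a runtime that stores and skips the zeroed weights efficiently, and real pipelines usually fine-tune after pruning to recover accuracy.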

Here is one such framework for pruning models, which resulted in the benchmark mentioned above: https://github.com/neuralmagic/deepsparse

Someone is bound to prune the LLaMA derivatives, so I opened this issue for others to track and to add any sparse models they come across.