JohnnyOpcode opened 1 year ago
The process for converting a model to a SparseML-compatible model doesn't seem all that complicated, and sparsity has a lot of benefits to offer for inference. As I understand it, quantizing a model to the GGML format reduces its size and precision, whereas making a model sparse involves both quantizing it and pruning away the weights that contribute little to the output?
Here is a good explanation if anyone is interested.
https://neuralmagic.com/blog/sparsegpt-remove-100-billion-parameters-for-free/
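To make the distinction concrete, here is a toy sketch in plain NumPy (not the actual SparseML or SparseGPT code) of what pruning plus quantization does to a single weight matrix: magnitude pruning zeroes out the smallest-magnitude weights, and a symmetric int8 quantization maps what's left to low precision. SparseGPT is much smarter about choosing which weights to drop, but the end result is the same kind of sparse, low-precision tensor.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

w_sparse = magnitude_prune(w, sparsity=0.5)   # half the weights become exactly zero
q, scale = quantize_int8(w_sparse)            # the rest are stored as int8 + one scale

print("fraction of zeros:", np.mean(w_sparse == 0))
print("max reconstruction error:", np.abs(w_sparse - q.astype(np.float32) * scale).max())
```

The point for CPU inference is that the zeroed weights never need to be multiplied at all, on top of the bandwidth savings from the int8 storage.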
Great work going on with GGML. Bravo to so many contributors. You are champions!
Maybe more performance (on CPU) can be had by bringing sparsity into the workflow. Here is one of the many efforts out there at the moment (a rough usage sketch follows below).
https://github.com/neuralmagic/deepsparse
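For context, running a sparse-quantized model with DeepSparse looks roughly like the snippet below. This is paraphrased from their README rather than tested here, and the SparseZoo model stub is just a placeholder, so check the repo for exact task names and stubs.

```python
# Rough sketch of DeepSparse pipeline usage; treat the model stub as a placeholder
# and consult the DeepSparse README / SparseZoo for real, current stubs.
from deepsparse import Pipeline

model_stub = "zoo:..."  # placeholder for a pruned+quantized model stub from SparseZoo

pipeline = Pipeline.create(
    task="sentiment-analysis",  # task name as used in the DeepSparse docs
    model_path=model_stub,
)

print(pipeline("GGML plus sparsity could be a great combination for CPU inference."))
```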
What are people's thoughts on this?