JohnnyOpcode opened 1 year ago
The process for converting a model to a SparseML-compatible model doesn't seem all that complicated, and sparsity has a lot of benefits to offer for inference. As I understand it, quantizing a model to the GGML format reduces its size and precision, whereas making a model sparse involves both quantizing it and pruning away the weights that contribute little to the output?
Here is a good explanation if anyone is interested.
https://neuralmagic.com/blog/sparsegpt-remove-100-billion-parameters-for-free/
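To make the distinction concrete, here is a toy sketch in plain NumPy (not the actual SparseML or SparseGPT code) of what pruning plus quantization does to a single weight matrix: magnitude pruning zeroes out the smallest-magnitude weights, and a symmetric int8 quantization maps what's left to low precision. SparseGPT is much smarter about choosing which weights to drop, but the end result is the same kind of sparse, low-precision tensor.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

w_sparse = magnitude_prune(w, sparsity=0.5)   # half the weights become exactly zero
q, scale = quantize_int8(w_sparse)            # the rest are stored as int8 + one scale

print("fraction of zeros:", np.mean(w_sparse == 0))
print("max reconstruction error:", np.abs(w_sparse - q.astype(np.float32) * scale).max())
```

The point for CPU inference is that the zeroed weights never need to be multiplied at all, on top of the bandwidth savings from the int8 storage.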
Great work going on with GGML. Bravo to so many contributors. You are champions!
Maybe more performance (on CPU) can be had by bringing sparsity into the workflow. Here is one of the many efforts out there at the moment (a rough usage sketch follows below).
https://github.com/neuralmagic/deepsparse
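For context, running a sparse-quantized model with DeepSparse looks roughly like the snippet below. This is paraphrased from their README rather than tested here, and the SparseZoo model stub is just a placeholder, so check the repo for exact task names and stubs.

```python
# Rough sketch of DeepSparse pipeline usage; treat the model stub as a placeholder
# and consult the DeepSparse README / SparseZoo for real, current stubs.
from deepsparse import Pipeline

model_stub = "zoo:..."  # placeholder for a pruned+quantized model stub from SparseZoo

pipeline = Pipeline.create(
    task="sentiment-analysis",  # task name as used in the DeepSparse docs
    model_path=model_stub,
)

print(pipeline("GGML plus sparsity could be a great combination for CPU inference."))
```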
What are people's thoughts on this?