From the linked repo:
unstructured gradual pruning, quantization-aware training, and structural distillation
I think the model layout would be very different and, further, not directly comparable to LLaMA. But it's definitely interesting; a rough sketch of what that first technique looks like is below.
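For context, here is a minimal sketch of what unstructured gradual magnitude pruning typically looks like using PyTorch's built-in `torch.nn.utils.prune` utilities. The toy model and the sparsity schedule are made-up assumptions for illustration, not Neural Magic's actual recipe:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for the real model.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 512))
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

# Each call zeroes out `amount` of the weights that are still unpruned, so the
# overall sparsity ratchets up step by step ("gradual" pruning). A real recipe
# would fine-tune between steps so the network can recover accuracy.
for amount in (0.3, 0.3, 0.3):  # compounds to roughly 66% overall sparsity
    prune.global_unstructured(
        to_prune, pruning_method=prune.L1Unstructured, amount=amount
    )
    # ... fine-tune for a few epochs here before the next pruning step ...

# Fold the binary masks into the weight tensors permanently.
for module, name in to_prune:
    prune.remove(module, name)

zeros = sum(int((m.weight == 0).sum()) for m, _ in to_prune)
total = sum(m.weight.numel() for m, _ in to_prune)
print(f"final sparsity: {zeros / total:.1%}")
```

Quantization-aware training and distillation would then be layered on top of a sparse checkpoint like this; the sparsity itself doesn't change the model layout, which is why the structural techniques are the part that makes it hard to compare against stock LLaMA.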
This may be interesting: https://github.com/horseee/LLaMA-Pruning
Pruning: The following script globally removes 50% of the dimensions of the LLaMA-7B model, resulting in a lightweight model with 1.72B parameters.
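The roughly 4x reduction checks out: most transformer weights are d-by-d matrices, so halving every dimension roughly quarters the parameter count (LLaMA-7B's ~6.7B / 4 ≈ 1.7B). As a toy illustration of what structurally removing output dimensions from a single `Linear` layer means, here is a hypothetical helper (not LLaMA-Pruning's actual code) that keeps the half of the output channels with the largest L2 norm:

```python
import torch
import torch.nn as nn

def prune_out_features(layer: nn.Linear, keep_ratio: float = 0.5) -> nn.Linear:
    # Rank output channels (rows of the weight matrix) by L2 norm.
    norms = layer.weight.norm(p=2, dim=1)
    n_keep = int(layer.out_features * keep_ratio)
    keep = torch.topk(norms, n_keep).indices.sort().values

    # Rebuild a physically smaller layer containing only the kept rows.
    new_layer = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        new_layer.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            new_layer.bias.copy_(layer.bias[keep])
    return new_layer

layer = nn.Linear(4096, 4096)
small = prune_out_features(layer, keep_ratio=0.5)
print(layer.weight.numel(), "->", small.weight.numel())  # ~16.8M -> ~8.4M
```

Unlike the unstructured masks above, this actually shrinks the tensors (and any layer consuming the pruned outputs has to shrink its inputs to match), which is why the resulting model ends up with a genuinely smaller parameter count rather than just more zeros.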
This issue was closed because it has been inactive for 14 days since being marked as stale.
Just saw this and it seems pretty crazy. I don't know exactly where to put it, but figured it's worth discussing. They claim significant performance gains and impressive model compression capabilities. A lot of the interesting information is right on the README page I linked.
Neural Magic Repo Link
https://github.com/mlcommons/inference_results_v3.0/blob/main/open/NeuralMagic/README.md