Open ndrean opened 2 months ago
This seems like a really great feature, and looks especially useful when deploying to fly.io
So Sean Moriarity responded "be patient":
What I am saying is you cannot currently save the quantized model and then reload it. The current way of quantizing a model in Axon uses a custom Axon struct that would take some extra work to serialize to safetensors and then some custom code to deserialize from a safetensors file. This will be better when we have full quantization support in Nx, because then we can support loading of generic quantized types from safetensors and other pretrained HF models
Note that the "right" channel for these questions is the EEF Slack (Erlang Ecosystem Foundation), not the Elixir Forum nor the Elixir Slack.
Does someone know if this can be used for our models? It seems the coefficients can be turned into integers. Could we quantize once, save the result, and use that smaller model in the codebase, potentially significantly reducing size, memory footprint, and thus loading time?
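For intuition on what "turning the coefficients into integers" means, here is a minimal conceptual sketch of post-training affine quantization in plain Python. This is not Axon's or Nx's actual API (the function names and the int8 scheme are my own assumptions); it just shows how floats map to small integers plus `(scale, zero_point)` metadata, and why reloading needs a dequantization step, which is exactly the serialization problem described above.

```python
# Hypothetical sketch of affine quantization, NOT Axon/Nx code:
# store weights as uint8 plus (scale, zero_point), dequantize on load.

def quantize(weights, num_bits=8):
    """Map a list of floats to integers plus (scale, zero_point) metadata."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against constant weights
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.8, -0.1, 0.0, 0.4, 1.2]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# each restored value is within scale/2 of the original, stored in 1 byte instead of 4
```

The size win comes from storing one byte per coefficient instead of four, at the cost of a small reconstruction error bounded by half the scale. The catch, as noted in the quoted answer, is that the saved file must also carry the `scale`/`zero_point` metadata, which is why generic quantized-type support in the serialization format matters.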
Extract from the blog: