Open ndrean opened 2 months ago
This seems like a really great feature, and looks especially useful when deploying to fly.io
So Sean Moriarity responded "be patient":
What I am saying is you cannot currently save the quantized model and then reload it. The current way of quantizing a model in Axon uses a custom Axon struct that would take some extra work to serialize to safetensors and then some custom code to deserialize from a safetensors file. This will be better when we have full quantization support in Nx, because then we can support loading of generic quantized types from safetensors and other pretrained HF models
Note that the "right" channel for these questions is the EEF Slack (Erlang Ecosystem Foundation), not the Elixir Forum nor the Elixir Slack.
Does someone know if this can be used for our models? It seems the coefficients can be turned into integers. Could we quantize once, save the result, and use that smaller model in the codebase, potentially significantly reducing size, memory footprint, and thus loading time?
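For intuition on what "turning the coefficients into integers" means, here is a minimal conceptual sketch of post-training affine quantization in plain Python. This is not Axon's or Nx's actual API (the function names and the int8 scheme are my own assumptions); it just shows how floats map to small integers plus `(scale, zero_point)` metadata, and why reloading needs a dequantization step, which is exactly the serialization problem described above.

```python
# Hypothetical sketch of affine quantization, NOT Axon/Nx code:
# store weights as uint8 plus (scale, zero_point), dequantize on load.

def quantize(weights, num_bits=8):
    """Map a list of floats to integers plus (scale, zero_point) metadata."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against constant weights
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.8, -0.1, 0.0, 0.4, 1.2]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# each restored value is within scale/2 of the original, stored in 1 byte instead of 4
```

The size win comes from storing one byte per coefficient instead of four, at the cost of a small reconstruction error bounded by half the scale. The catch, as noted in the quoted answer, is that the saved file must also carry the `scale`/`zero_point` metadata, which is why generic quantized-type support in the serialization format matters.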
Extract from the blog: