budgetdevv opened 6 days ago
What is the purpose of this API? Do I need to use it when running a quantized GGUF model? Thanks

Hey, the short answer is: there is no real use for this, don't use it.

The longer version is that this method loads the model's weights quantized on the fly into the given format, without saving them in that format. This was necessary early in stable-diffusion.cpp's development, when saving quantized models wasn't possible yet. But given how long it takes to load a model this way, I don't really see any use case where it beats simply converting the model beforehand.

Hey, thanks for the prompt response! I was wondering why it took so long.
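For reference, the pre-conversion recommended in the answer can be done once with stable-diffusion.cpp's convert mode. The exact flags and file names below are illustrative and may differ between versions of the `sd` CLI, so treat this as a sketch rather than the definitive invocation:

```shell
# One-time conversion: quantize the weights and save them as GGUF
# (flag names and model paths are examples; check sd --help for your build).
./bin/sd -M convert -m models/v1-5-pruned-emaonly.safetensors \
    -o models/v1-5-pruned-emaonly.q8_0.gguf --type q8_0

# Subsequent runs load the already-quantized file directly, which avoids
# re-quantizing the full-precision weights on every load:
./bin/sd -m models/v1-5-pruned-emaonly.q8_0.gguf -p "a photo of a cat"
```

The tradeoff is simple: quantize-on-load pays the quantization cost at every startup, while a one-time conversion pays it once and produces a smaller file to load from then on.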