Inference part 1/n (statistic saving)

Vahe1994 / SpQR

Apache License 2.0

Inference part 1/n (statistic saving) #32

Closed Vahe1994 closed 1 year ago

Vahe1994 commented 1 year ago

Saving quantized statistics and weights for inference
Loading the model from the saved quantized statistics and weights

Quantized weights and statistics are saved as int8 as a temporary measure (because of torch).
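As a rough illustration of what storing low-bit codes in int8 implies, the sketch below writes one code per signed byte and reads it back. It uses only the Python standard library as a stand-in for torch tensors; the helper names `save_codes` and `load_codes`, the file name, and the sample 4-bit codes are all hypothetical and are not taken from the SpQR code.

```python
import os
import tempfile
from array import array


def save_codes(codes, path):
    """Write each low-bit code into its own signed 8-bit slot (hypothetical helper)."""
    buf = array("b", codes)  # 'b' = signed char, one byte per item
    with open(path, "wb") as f:
        buf.tofile(f)
    return buf.itemsize * len(buf)  # number of bytes written


def load_codes(path):
    """Read the int8 codes back as a list of Python ints (hypothetical helper)."""
    buf = array("b")
    with open(path, "rb") as f:
        buf.frombytes(f.read())
    return list(buf)


codes = [0, 7, 15, 3, 9, 12]  # made-up 4-bit codes, each in the range 0..15

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "codes.bin")
    n_bytes = save_codes(codes, path)
    restored = load_codes(path)

# Size the same codes would need if two 4-bit codes shared one byte.
packed_bytes = (len(codes) * 4 + 7) // 8
```

With one code per byte, the six sample codes take six bytes on disk, whereas packing two 4-bit codes per byte would take three. That gap is one plausible reason to treat int8 storage as temporary.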

poedator commented 1 year ago

Instead of --save_quantization --save_quantization_pt "f7_4bit_quantization/", can we use a single argument for saving: --save_quantization "f7_4bit_quantization/"? Why do we need two?
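A minimal sketch of the single-argument form, assuming an `argparse`-based command line: the path doubles as the on/off switch, so saving is enabled only when a directory is given. The parser, the `should_save` helper, and the help text are hypothetical and do not reflect the actual SpQR CLI; only the flag name and the example path come from the comment above.

```python
import argparse


def build_parser():
    """Build a parser with one optional path argument (hypothetical, not the SpQR CLI)."""
    parser = argparse.ArgumentParser(description="single-argument saving sketch")
    parser.add_argument(
        "--save_quantization",
        type=str,
        default=None,  # None means "do not save"
        metavar="DIR",
        help="If set, save quantized weights and statistics to this directory.",
    )
    return parser


def should_save(args):
    """Saving is requested exactly when a directory was passed."""
    return args.save_quantization is not None


# Flag given: the directory is available and saving is enabled.
args = build_parser().parse_args(["--save_quantization", "f7_4bit_quantization/"])

# Flag omitted: the default None disables saving.
no_save = build_parser().parse_args([])
```

Defaulting the path to `None` removes the need for a separate boolean flag, at the cost of not being able to enable saving with an implicit default directory.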

poedator commented 1 year ago

We need to update README.md to give our users an example of the saving/loading commands. I also suggest creating an example .py file that just loads a quantized model, for use in our users' projects.