Closed Vahe1994 closed 1 year ago
Instead of --save_quantization --save_quantization_pt "f7_4bit_quantization/", can we make one argument for saving: --save_quantization "f7_4bit_quantization/"? Why do we need two?
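A minimal sketch of what the single-argument form could look like with argparse. Only the flag name --save_quantization and the example path come from this thread; the parser setup, the `build_parser` helper, and the "omitted means do not save" behavior are assumptions for illustration, not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: one optional path argument replaces the flag + path pair."""
    parser = argparse.ArgumentParser(description="Quantization script (illustrative only)")
    # Assumption: omitting the argument means "do not save";
    # passing it both enables saving and names the target directory.
    parser.add_argument(
        "--save_quantization",
        type=str,
        default=None,
        metavar="DIR",
        help="If set, save the quantized model to this directory.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.save_quantization is not None:
        print(f"Would save quantized model to {args.save_quantization}")
    else:
        print("Saving disabled (no --save_quantization given)")
```

With this shape, the presence of the path is the switch, so a separate boolean flag is not needed.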
We need to update README.md to give our users an example of the saving/loading commands. I also suggest creating an example .py file that just loads the quantized model, for use in our users' projects.
Quantized weights and statistics are saved in int8 as a temporary measure (because of torch).