Closed · DRXD1000 closed this issue 6 months ago
The following command quantizes the model and saves the real-quantized weights:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/MODELS \
--epochs 20 --output_dir /PATH/TO/LOGS \
--eval_ppl --wbits 3 --abits 16 --group_size 128 --lwc \
--real_quant --save_dir /PATH/TO/SAVE
Note that the code only supports real quantization for weight-only quantization. For weight-activation quantization, it uses fake (simulated) quantization instead.
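To illustrate the distinction, here is a minimal sketch (not the repository's actual implementation) of a symmetric 3-bit weight quantizer. Fake quantization rounds the weights and immediately dequantizes them, so the tensors stay in floating point and only the quantization error is simulated; real quantization stores the low-bit integers plus a scale, which is what actually shrinks the saved checkpoint. All function names below are hypothetical.

```python
# Hypothetical sketch contrasting fake vs. real quantization
# (symmetric, per-tensor, signed 3-bit). Not the repo's code.

def quant_params(weights, bits=3):
    # Symmetric scale: map the largest |w| onto the top integer level.
    qmax = 2 ** (bits - 1) - 1          # e.g. 3 for signed 3-bit
    scale = max(abs(w) for w in weights) / qmax
    return scale, qmax

def fake_quantize(weights, bits=3):
    # Quantize then immediately dequantize: output stays float,
    # so memory is NOT reduced -- only the rounding error is kept.
    scale, qmax = quant_params(weights, bits)
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale
            for w in weights]

def real_quantize(weights, bits=3):
    # Keep low-bit integers plus one scale: this is what a
    # real-quantized checkpoint would actually store.
    scale, qmax = quant_params(weights, bits)
    ints = [max(-qmax - 1, min(qmax, round(w / scale)))
            for w in weights]
    return ints, scale

w = [0.9, -0.31, 0.05, -0.74]
print(fake_quantize(w))      # floats, same shape/dtype as input
print(real_quantize(w))      # small ints + one float scale
```

Dequantizing the real-quantized integers (`int * scale`) reproduces the fake-quantized values, which is why evaluation results under the two modes match while only real quantization saves memory.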
Hi, I would be very grateful if there were a tutorial on how to perform weight and activation quantization on the Llama-2 chat models and save the resulting models. The code I have used so far does not seem to work, and I cannot find an explanation of how to replicate the results.