Vahe1994 / SpQR


Rollback model loading to match the code from the paper #23

Closed justheuristic closed 1 year ago

justheuristic commented 1 year ago

This PR fixes the bad perplexity that was found with the following config:

    CUDA_VISIBLE_DEVICES=3 OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 python main.py decapoda-research/llama-7b-hf custom --custom_data_path data/red_pajama_n=1024.pth --nsamples 128 --wbits 3 --perchannel --percdamp 1.0 --groupsize 16 --qq_scale_bits 3 --qq_zero_bits 3 --qq_groupsize 64 --outlier_threshold=0.7 --permutation_order act_order

...and with all dependency versions pinned by requirements.txt.

P.S. Kind thanks to the authors (esp. @Godofnothing, @Vahe1994) for helping me figure out what was causing the problem.

justheuristic commented 1 year ago

@Vahe1994 I'm re-running the main config now; results will be available in 40-ish minutes.

Would you like me to run any additional tests to make sure this PR does not introduce more bugs?

Vahe1994 commented 1 year ago

> @Vahe1994 I'm re-running the main config now; results will be available in 40-ish minutes.
>
> Would you like me to run any additional tests to make sure this PR does not introduce more bugs?

I think your experiments are sufficient.

poedator commented 1 year ago

I tried to reproduce the problem fixed here. It appears it was caused by the omission of this code:

    if dtype == "auto":
        dtype = AutoConfig.from_pretrained(model_path).torch_dtype or "auto"  # force transformers 4.29.2 to follow the same rules as 4.30.x

which is still necessary to keep as long as we are testing the code with transformers==4.29.2.
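
For context, here is a minimal sketch of where that resolution step sits in the loading path, assuming the model is loaded through AutoModelForCausalLM; the load_model name and signature are illustrative, not the repository's actual function:

    from transformers import AutoConfig, AutoModelForCausalLM

    def load_model(model_path, dtype="auto"):
        # Resolve "auto" explicitly so transformers 4.29.2 picks up the checkpoint's
        # torch_dtype the same way 4.30.x does (this is the line whose omission
        # caused the bad perplexity discussed above).
        if dtype == "auto":
            dtype = AutoConfig.from_pretrained(model_path).torch_dtype or "auto"
        return AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype)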