Decoding Compressed Trust

Codebase for Decoding Compressed Trust.

Model Preparation

We provide compressed models on Hugging Face. Details for compressing the models are provided below.
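
A compressed checkpoint can be loaded with transformers like any other causal LM. A minimal sketch, assuming a hypothetical repo id (substitute the actual checkpoint name from our Hugging Face page):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "compressed-llm/llama-2-13b-chat_mag_2to4"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")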

Pruning

Our pruning code is based on the wanda repository (git@github.com:locuslab/wanda.git):

cd compression
git clone git@github.com:locuslab/wanda.git

Prune with Magnitude/SparseGPT/Wanda under semi-structured (2:4) sparsity:

cd wanda
CUDA_VISIBLE_DEVICES=0 python main.py --model meta-llama/Llama-2-13b-chat-hf --prune_method magnitude --sparsity_type 2:4 --sparsity_ratio 0.5 --save=output/llama-2-13b-chat_mag_2to4
CUDA_VISIBLE_DEVICES=0 python main.py --model meta-llama/Llama-2-13b-chat-hf --prune_method sparsegpt --sparsity_type 2:4 --sparsity_ratio 0.5 --save=output/llama-2-13b-chat_sparsegpt_2to4
CUDA_VISIBLE_DEVICES=0 python main.py --model meta-llama/Llama-2-13b-chat-hf --prune_method wanda --sparsity_type 2:4 --sparsity_ratio 0.5 --save=output/llama-2-13b-chat_wanda_2to4

Replace meta-llama/Llama-2-13b-chat-hf with other model names as needed.
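
For intuition, 2:4 semi-structured sparsity keeps the two largest-magnitude weights in every contiguous group of four. A minimal PyTorch sketch of magnitude pruning under this pattern (illustrative only, not the wanda implementation):

import torch

def magnitude_prune_2to4(weight: torch.Tensor) -> torch.Tensor:
    # Keep the 2 largest-magnitude weights in every contiguous group of 4
    # along the input dimension; zero out the other 2.
    assert weight.shape[-1] % 4 == 0, "input dim must be divisible by 4"
    groups = weight.reshape(-1, 4)
    _, drop = groups.abs().topk(2, dim=1, largest=False)  # 2 smallest |w| per group
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop, 0.0)
    return (groups * mask).reshape(weight.shape)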

Quantization

GPTQ:

pip install auto-gptq
cd compression/gptq

CUDA_VISIBLE_DEVICES=0 python gptq.py --pretrained_model_dir meta-llama/Llama-2-13b-chat-hf --quantized_model_dir ./output --bits 4 --save_and_reload --desc_act --seed 0 --num_samples 128 --calibration-template llama-2
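
The script wraps GPTQ weight-only quantization. For reference, the equivalent operation in plain auto-gptq looks roughly like the sketch below (a toy one-sentence calibration set, and group_size=128 is an assumption; the command above uses 128 real calibration samples):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights with activation-order quantization (desc_act), as in the command above
quant_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quant_config)

# auto-gptq expects a list of tokenized calibration examples
examples = [tokenizer("Decoding compressed trust.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("./output")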

AWQ:

cd compression
git clone https://github.com/mit-han-lab/llm-awq
cd llm-awq

mkdir -p awq_cache  # directory for the AWQ search results dumped below
CUDA_VISIBLE_DEVICES=1 python -m awq.entry --model_path meta-llama/Llama-2-13b-chat-hf --seed 0 --w_bit 4 --q_group_size 128 --run_awq --dump_awq awq_cache/llama-2-13b-chat-bit4-seed0.pt
CUDA_VISIBLE_DEVICES=1 python -m awq.entry --model_path meta-llama/Llama-2-13b-chat-hf --tasks wikitext --w_bit 4 --q_group_size 128 --load_awq awq_cache/llama-2-13b-chat-bit4-seed0.pt --q_backend fake --dump_awq_weights_to_hf ./llm-awq-main/llama-2-13b-chat-bit4-seed0
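
With --q_backend fake, the 4-bit weights are dequantized back to floating point, so quantization error is simulated without low-bit kernels. Conceptually, per-group asymmetric fake quantization does the following (a sketch for intuition, not the llm-awq code):

import torch

def fake_quantize(weight: torch.Tensor, n_bits: int = 4, group_size: int = 128) -> torch.Tensor:
    # Quantize each group of `group_size` weights to n_bits integers,
    # then immediately dequantize back to float ("fake" quantization).
    assert weight.shape[-1] % group_size == 0
    w = weight.reshape(-1, group_size)
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-8) / qmax  # per-group scale
    zero = (-w_min / scale).round()                 # per-group zero point
    q = (w / scale + zero).round().clamp(0, qmax)   # integer codes in [0, qmax]
    return ((q - zero) * scale).reshape(weight.shape)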

Running Experiments

Install the modified DecodingTrust following this link.

Due to the large volume of experiments, we recommend running them with the Slurm job system. We provide an example Slurm config file. For each model, a config file is provided under configs/model_config.

Note that these files are tuned for the VITA ACES servers and may not work on other clusters.

Important files

Setup

# Find the GPU types available on the cluster
scontrol show node | grep Gres
# Install DecodingTrust with Slurm support
cd DecodingTrust
pip install -e ".[slurm]"

Modify dt/configs/model_configs/vicuna-13b-v1.3-mag_2to4.yaml for your model, add vicuna-13b-v1.3-mag_2to4 to multi-run.sh, and launch:

bash scripts/multi-run.sh

Aggregating Results

Upload results to GitHub:

git pull
python gather_result_files.py --result_dir=<path-to-DT-result-folder> -p=<perspective_name>
# Example
# python gather_result_files.py -p=adv-glue
git add results/
git commit -m "Update results"
git push

Extract the results to a CSV file (data/num_sheet.csv), which will be used for visualization, by running python extract_csv.py.
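
The sheet can then be inspected with pandas before plotting (the column schema is whatever extract_csv.py writes; nothing below assumes specific column names):

import pandas as pd

df = pd.read_csv("data/num_sheet.csv")
print(df.head())  # inspect the aggregated scores before visualization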