[Closed] BinFuPKU closed this issue 2 months ago
Only AWQ supports multi-GPU quantization, and a single GPU is enough. Additionally, please use the following eval config:

```yaml
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: eval data path
    bs: 1
    inference_per_block: True
    seq_len: 2048
```
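The `inference_per_block: True` flag is what makes a single GPU sufficient: instead of moving the whole model to the device, evaluation can stream one transformer block at a time through the GPU. Below is a minimal sketch of that idea in plain PyTorch (the `forward_per_block` helper and `blocks` list are hypothetical; llmc's actual implementation differs):

```python
import torch
import torch.nn as nn


def forward_per_block(blocks, hidden, device="cuda"):
    """Run a sequence of blocks while keeping only one on the GPU at a time.

    Illustrative sketch only -- this is not llmc's real code path.
    """
    for block in blocks:
        block.to(device)                # move just this block onto the device
        hidden = block(hidden.to(device))
        block.to("cpu")                 # free device memory before the next block
    return hidden.cpu()
```

Peak device memory is then roughly one block plus activations, rather than the full model.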
Nice, it works well, but it takes too much time to quantize Mistral-Large-2 (123B).
I think the evaluation is what costs so much time; you can remove `eval_pos` from your config to speed things up.
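For example, assuming llmc accepts an empty `eval_pos` list (otherwise just delete the whole `eval` section), the config above would become:

```yaml
eval:
    eval_pos: []            # skip the pretrain / transformed / fake_quant PPL passes
    name: wikitext2
    download: False
    path: eval data path
    bs: 1
    inference_per_block: True
    seq_len: 2048
```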
```
2024-08-14 18:22:56.727 | INFO | llmc.eval.eval_ppl:__init__:14 - eval_cfg : {'eval_pos': ['pretrain', 'transformed', 'fake_quant'], 'name': 'wikitext2', 'download': False, 'path': '/home/xiaoi/dq/fubin/alignment/quantization/data/evaluation/wikitext2', 'bs': 1, 'seq_len': 2048}
rank0: Traceback (most recent call last):
rank0:   File "/home/xiaoi/dq/fubin/alignment/quantization/llmc-main/llmc/main.py", line 160, in <module>
rank0:   File "/home/xiaoi/dq/fubin/alignment/quantization/llmc-main/llmc/main.py", line 50, in main
rank0:     ppl = ppl_eval.eval(model)
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
rank0:     return func(*args, **kwargs)
rank0:   File "/home/xiaoi/dq/fubin/alignment/quantization/llmc-main/llmc/eval/eval_ppl.py", line 74, in eval
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2694, in cuda
rank0:     return super().cuda(*args, **kwargs)
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 915, in cuda
rank0:     return self._apply(lambda t: t.cuda(device))
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 779, in _apply
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 779, in _apply
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 779, in _apply
rank0:   [Previous line repeated 2 more times]
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 804, in _apply
rank0:     param_applied = fn(param)
rank0:   File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 915, in <lambda>
rank0:     return self._apply(lambda t: t.cuda(device))
rank0: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 672.00 MiB. GPU
E0814 18:23:17.575959 140250494482240 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 1291815) of binary: /opt/nlp/anaconda3/envs/dq_env_h100_llmc/bin/python
Traceback (most recent call last):
  File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/nlp/anaconda3/envs/dq_env_h100_llmc/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
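The OOM in the traceback is expected whenever the evaluation path calls `.cuda()` on the whole model: a 123B-parameter model cannot fit on a single GPU. A back-of-the-envelope estimate (assuming fp16/bf16 weights, 2 bytes per parameter; numbers are illustrative):

```python
# Why moving Mistral-Large-2 (123B params) onto one GPU runs out of memory.
# Assumption: fp16/bf16 weights at 2 bytes per parameter, weights only
# (activations and optimizer state would add more).
n_params = 123e9
bytes_per_param = 2
weights_gib = n_params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of weights alone")  # ~229 GiB, far above one 80 GiB H100
```

Hence the advice above: either evaluate block-by-block (`inference_per_block: True`) or skip evaluation entirely.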