Open LWShowTime opened 1 year ago
My launch command: `--version=Mylocalpath/LISA/ --vision_tower=Mylocalpath/CLIP-vit-large-patch14/ --precision=fp16 --load_in_4bit`
I am experiencing a similar bug. Does anyone know how to fix it?
```
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zicheng/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0781853199005127 seconds
Traceback (most recent call last):
  File "/home/zicheng/Projects/LISA/chat.py", line 263, in <module>
    main(sys.argv[1:])
  File "/home/zicheng/Projects/LISA/chat.py", line 125, in main
    model_engine = deepspeed.init_inference(
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/__init__.py", line 342, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 141, in __init__
    self._apply_injection_policy(config)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 378, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 313, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 556, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 631, in _replace_module
    _, layer_id = _replace_module(child,
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 631, in _replace_module
    _, layer_id = _replace_module(child,
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 631, in _replace_module
    _, layer_id = _replace_module(child,
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 606, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 290, in replace_fn
    new_module = replace_with_policy(child,
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 253, in replace_with_policy
    _container.apply_tensor_parallelism(mp_replace)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/containers/features/hybrid_engine.py", line 94, in apply_tensor_parallelism
    self.mlp_output_mp(mp_replace, reversed_dim=reversed_dim)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/containers/base.py", line 257, in mlp_output_mp
    self.module.mlp.output_w = mp_replace.copy(self.module.mlp.output_w, self._4hh_w, int8=reversed_dim)
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/auto_tp.py", line 97, in copy
    self.merge_assert(src_shape[inner_dim], dst_shape[self.in_dim])
  File "/home/zicheng/miniconda3/envs/dl/lib/python3.9/site-packages/deepspeed/module_inject/auto_tp.py", line 31, in merge_assert
    assert dim1 > dim2, \
AssertionError: Merging tensors is not allowed here! Please use deepspeed load_checkpoint for merging your checkpoints before replacing the transformer layer with inference-kernels
```
Actually, this is a bug in DeepSpeed. I think you can avoid it by not using fp16; try bf16 or fp32 instead. @ZichengDuan
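For example, keeping the same launch line as in the original post but swapping only the precision flag (the paths are the poster's local placeholders, and I'm assuming `bf16` is an accepted value for LISA's `--precision` option):

```shell
# Same launch as above, but with bf16 instead of fp16.
# Paths are placeholders copied from the original post.
python chat.py \
  --version=Mylocalpath/LISA/ \
  --vision_tower=Mylocalpath/CLIP-vit-large-patch14/ \
  --precision=bf16
```

Note that bf16 and fp32 need more VRAM than fp16 with `--load_in_4bit`, so this may trade the assertion error for an out-of-memory error on smaller GPUs.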
I only have a single 4090, and it reports out of memory when using bf16 or fp32. I tried several CUDA, PyTorch, and driver versions with fp16 precision, and it's always either an "assert dim1 > dim2" error or an out-of-memory error. It never works :(
@ZichengDuan Same situation as you, so I switched to a GPU with 32G of VRAM, and everything works fine LOL. But the dim1 > dim2 problem when launching with 4-bit still exists.
Has this problem been solved? I'm running into the same issue.
@shell-nlp It can be resolved by using a GPU with 32G of VRAM.
Something is wrong with the dimension check in DeepSpeed's auto_tp.py:
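For context, the failing check quoted in the traceback reduces to roughly this sketch (the function name and error message are taken from the traceback; the surrounding tensor-parallel copy logic is omitted, and the example shapes below are made up):

```python
# Rough sketch of the check in deepspeed/module_inject/auto_tp.py that
# raises the AssertionError in the traceback above. In DeepSpeed this
# is a method on the tensor-parallel replace helper; it is shown here
# as a free function for illustration.
def merge_assert(dim1, dim2):
    # DeepSpeed's kernel injection only supports *splitting* a weight
    # across ranks, i.e. the source dimension must be strictly larger
    # than the destination dimension. Equal or smaller dims would mean
    # merging shards, which this code path refuses to do.
    assert dim1 > dim2, (
        "Merging tensors is not allowed here! Please use deepspeed "
        "load_checkpoint for merging your checkpoints before replacing "
        "the transformer layer with inference-kernels"
    )

# On a single GPU the MLP output weight's source and destination shapes
# can end up equal, in which case dim1 > dim2 is False and the
# assertion fires, matching the error reported in this thread.
```

So the assertion is not really about corrupted weights; it fires whenever the source dim is not strictly larger than the destination dim, which is why changing precision or hardware can make it appear or disappear.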