Jikai0Wang / OPT-Tree

OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

Does not support multiple GPUs yet? #1

Closed chenwenyan closed 3 months ago

chenwenyan commented 4 months ago

Hi, I found that the README says this system supports multiple GPUs. However, when I run the script with

```shell
export CUDA_VISIBLE_DEVICES=2,3
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path meta-llama/Llama-2-7b-chat-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 10 \
    --threshold 0.5 \
    --max_depth 5
```

I get the following error output:

```
Output to ./mt_classic_opt.jsonl
target model: meta-llama/Llama-2-7b-chat-hf
temperature: 0.0
Loading checkpoint shards: 100%|████████████████████████| 2/2 [00:02<00:00, 1.47s/it]
Check model training state: False
CUDA VISIBLE DEVICES: 2,3
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 489, in <module>
    run_eval(
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 138, in run_eval
    get_answers_func(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 227, in get_model_answers
    output_ids, new_token, idx, accept_length = spforward(
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 44, in spforward
    draft_input_ids, draft_position_ids, tree_attention_mask, last_token, parent = model(input_ids,
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/OPT-Tree/opt_classic/model.py", line 106, in forward
    input_ids, position_ids, tree_attention_mask,parent=self.draft(input_ids,nodes,threshold,max_depth)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/OPT-Tree/opt_classic/model.py", line 138, in draft
    draft_outputs = self.draft_model.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/OPT-Tree/opt_classic/modeling_llama_kv.py", line 1006, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

Do you have any suggestions about this?

yisunlp commented 4 months ago

The auto device map produced by our code may vary across machines, so you may need to manually move `input_ids` to the right device. For example, in `modeling_llama_kv.py` you could change

```python
inputs_embeds = self.embed_tokens(input_ids)
```

to

```python
inputs_embeds = self.embed_tokens(input_ids.to(self.embed_tokens.weight.device))
```
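A minimal sketch of the device-alignment idea suggested above, using a standalone `nn.Embedding` as a stand-in for the draft model's `embed_tokens` layer (on a real multi-GPU run the layer and the indices can land on different `cuda` devices; here everything is on CPU just to illustrate the pattern):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the draft model's embedding layer. Under an
# auto device map this layer may live on a different GPU than input_ids.
embed_tokens = nn.Embedding(num_embeddings=100, embedding_dim=8)

input_ids = torch.tensor([[1, 2, 3]])  # may arrive from another device

# Resolve the device the embedding weights actually live on, then move
# the indices there before the lookup. This avoids the cross-device
# index_select that raises the "Expected all tensors to be on the same
# device" RuntimeError.
target_device = embed_tokens.weight.device
inputs_embeds = embed_tokens(input_ids.to(target_device))
print(inputs_embeds.shape)  # torch.Size([1, 3, 8])
```

The same `tensor.to(weight.device)` pattern applies at any boundary where the model is split across devices, not only at the embedding lookup.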

yisunlp commented 4 months ago

This is mainly because there are some omissions in the device conversion checks in our code. We will re-examine our code and will release an optimized version of the code later.

chenwenyan commented 3 months ago

> This is mainly because there are some omissions in the device conversion checks in our code. We will re-examine our code and will release an optimized version of the code later.

Thanks a lot for your kind reply. I have solved this with:

```python
inputs_embeds = self.embed_tokens(input_ids.to(next(iter(self.embed_tokens.parameters())).device))
```

However, some other errors have come up; I will debug them later. Thank you!
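For anyone hitting the same issue: the fix above can be factored into a small helper that works for any `nn.Module`, since iterating `parameters()` avoids assuming the module exposes a `.weight` attribute. A hedged sketch (the helper name `module_device` is illustrative, not part of the OPT-Tree codebase):

```python
import torch
import torch.nn as nn

def module_device(module: nn.Module) -> torch.device:
    """Return the device of the module's first parameter."""
    return next(iter(module.parameters())).device

# Example usage with a toy embedding layer; in the repo this would be
# the draft model's embed_tokens under an auto device map.
embed_tokens = nn.Embedding(100, 8)
ids = torch.tensor([[4, 5]])
out = embed_tokens(ids.to(module_device(embed_tokens)))
print(out.shape)  # torch.Size([1, 2, 8])
```

Note that this assumes the module itself is not split across devices; for a module sharded internally, the inputs must be aligned per-submodule instead.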