Closed: chenwenyan closed this issue 3 months ago
The auto_device_map of our code may vary on different machines. You may need to manually move input_ids to the right device. For example, you could try changing inputs_embeds = self.embed_tokens(input_ids) to inputs_embeds = self.embed_tokens(input_ids.to(self.embed_tokens.weight.device)).
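A minimal sketch of that device-alignment idea, assuming embed_tokens is a plain torch.nn.Embedding as in the traceback later in this thread; the helper name is illustrative and not part of the OPT-Tree code:

```python
import torch
import torch.nn as nn

def embed_on_embedding_device(embed_tokens: nn.Embedding, input_ids: torch.Tensor) -> torch.Tensor:
    # With an auto device map, the embedding weights may sit on a different GPU
    # than the incoming input_ids, so move the ids to the weights' device first.
    target = embed_tokens.weight.device
    if input_ids.device != target:
        input_ids = input_ids.to(target)
    return embed_tokens(input_ids)
```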
This is mainly because there are some omissions in the device-conversion checks in our code. We will re-examine the code and release an optimized version later.
Thanks a lot for your kind reply. I solved this with inputs_embeds = self.embed_tokens(input_ids.to(next(iter(self.embed_tokens.parameters())).device)).
But some other errors still occur; I will debug them later. Thank you!
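The same workaround can be written as a small standalone helper; this is only a sketch of the pattern discussed above, and the helper name is illustrative rather than part of the OPT-Tree code:

```python
import torch
import torch.nn as nn

def to_module_device(tensor: torch.Tensor, module: nn.Module) -> torch.Tensor:
    # Move a tensor to the device of the module's first parameter (no-op if already there).
    param = next(module.parameters(), None)
    if param is None or tensor.device == param.device:
        return tensor
    return tensor.to(param.device)

# e.g. inside the draft model's forward:
#   inputs_embeds = self.embed_tokens(to_module_device(input_ids, self.embed_tokens))
```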
Hi, I see the README says that this system can support multiple GPUs. However, when I use
export CUDA_VISIBLE_DEVICES=2,3
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path meta-llama/Llama-2-7b-chat-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 10 \
    --threshold 0.5 \
    --max_depth 5
to run the script, I get the following error output:

Output to ./mt_classic_opt.jsonl
target model: meta-llama/Llama-2-7b-chat-hf
temperature: 0.0
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.47s/it]
Check model training state: False
CUDA VISIBLE DEVICES: 2,3
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 489, in <module>
    run_eval(
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 138, in run_eval
    get_answers_func(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 227, in get_model_answers
    output_ids, new_token, idx, accept_length = spforward(
  File "/root/workspace/OPT-Tree/evaluation/eval_opt_classic.py", line 44, in spforward
    draft_input_ids, draft_position_ids, tree_attention_mask, last_token, parent = model(input_ids,
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/OPT-Tree/opt_classic/model.py", line 106, in forward
    input_ids, position_ids, tree_attention_mask,parent=self.draft(input_ids,nodes,threshold,max_depth)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/OPT-Tree/opt_classic/model.py", line 138, in draft
    draft_outputs = self.draft_model.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/OPT-Tree/opt_classic/modeling_llama_kv.py", line 1006, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Do you have any suggestions about this?
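For debugging errors like this, it can help to print the device map that transformers builds when a model is loaded with device_map="auto", since it shows which submodules landed on cuda:0 versus cuda:1 (with CUDA_VISIBLE_DEVICES=2,3, physical GPUs 2 and 3 appear as cuda:0 and cuda:1). A minimal sketch, assuming the models are loaded through transformers' from_pretrained with device_map="auto"; the actual loading code in OPT-Tree may differ:

```python
from transformers import AutoModelForCausalLM

# Loading with device_map="auto" shards the layers across the visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    torch_dtype="auto",
)

# hf_device_map records where each submodule was placed,
# e.g. {'model.embed_tokens': 0, 'model.layers.0': 0, ..., 'lm_head': 1}
print(model.hf_device_map)
```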