Qessia closed this issue 7 months ago.
Hello! Thank you for reporting this! We will resolve the issue quickly.
Hello!
I'm observing the same problem. I have tried to diagnose the issue a bit myself.
As far as I understand (in case you haven't found it already), the problem is in how the block is constructed when calculating its size and parameters: the layer_idx mentioned above is passed in load_pretrained_block, but it is not passed when calculating block_size or when measuring RPS in throughput.
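To make this concrete, here is a minimal sketch (plain transformers only, not Petals code; the tiny MixtralConfig values are arbitrary) showing that the decoder layer wrapped by WrappedMixtralBlock cannot be built without a layer_idx, which is why any code path that constructs a block for measurement also has to pass one:

```python
# Minimal sketch, plain transformers only (not Petals code); config values are arbitrary.
from transformers import MixtralConfig
from transformers.models.mixtral.modeling_mixtral import MixtralDecoderLayer

config = MixtralConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_local_experts=4,
)

# MixtralDecoderLayer(config)  # TypeError: missing 1 required positional argument: 'layer_idx'
block = MixtralDecoderLayer(config, layer_idx=0)  # constructing with layer_idx works
```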
Really looking forward to a fix.
We resolved this issue in a recent master update. Just pull the latest changes. Thank you for noticing the issue and waiting for the fix.
Thank you for your quick response!
Hi! The original error from this issue doesn't appear anymore, but I get another error when I try to launch a private swarm with Mixtral (on GPU; CPU is fine). It also doesn't appear when I do the same with StableBeluga2:
python3 -m petals.cli.run_server SanjiWatsuki/TinyMixtral-32x248M --new_swarm
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in <module>
main()
File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
server = Server(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/server.py", line 237, in __init__
throughput_info = get_server_throughput(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
cache[cache_key] = measure_throughput_info(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
"inference_rps": measure_compute_rps(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
cache = step(cache)
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 215, in step
outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache_ if inference else None)
File "/home/qessia/.local/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/petals/models/mixtral/block.py", line 74, in forward
outputs = super().forward(
File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 934, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 356, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/transformers/cache_utils.py", line 131, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
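For reference, here is a minimal sketch of the device mismatch behind this traceback (plain PyTorch, not Petals code; the tensor shapes are arbitrary): the KV cache used during the throughput measurement stays on the CPU while the block produces key states on cuda:0, so torch.cat refuses to concatenate them.

```python
# Minimal sketch of the device mismatch (plain PyTorch; shapes are arbitrary).
import torch

if torch.cuda.is_available():
    cpu_cache = torch.zeros(1, 8, 2, 64)                  # KV cache left on the CPU
    new_keys = torch.randn(1, 8, 4, 64, device="cuda:0")  # block output on the GPU
    try:
        torch.cat([cpu_cache, new_keys], dim=-2)
    except RuntimeError as e:
        print(e)  # "Expected all tensors to be on the same device ..."
    # The fix pattern: move the cache to the block's device before updating it.
    merged = torch.cat([cpu_cache.to(new_keys.device), new_keys], dim=-2)
```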
Hello! This is a strange error. Can you also provide your transformers version?
4.38.2
Thank you for the information. It seems the only change required is this one: https://github.com/bigscience-workshop/petals/pull/574. We will merge it into main soon.
Hi! How is work on the fix going, is everything alright? We are really looking forward to the merge.
I had the same error on master as well and have a ticket open for it: https://github.com/bigscience-workshop/petals/issues/575
Sorry for taking so long; the fix is merged into master.
I was able to get the branch mentioned above running and rebased my Docker work on it.
I now have TinyMixtral running locally on the GPU: https://github.com/meta-introspector/petals
Thank you for the fixes! It works.
Reproduce:
python3 -m petals.cli.run_server mistralai/Mixtral-8x7B-v0.1 --new_swarm
or python3 -m petals.cli.run_server SanjiWatsuki/TinyMixtral-32x248M --new_swarm
Got:
TypeError: WrappedMixtralBlock.__init__() missing 1 required positional argument: layer_idx
System: