bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev

Error while hosting as provider #533

Open · filopedraz opened this issue 8 months ago

filopedraz commented 8 months ago

Description

While hosting a model as a provider, the server hits the following error (the runtime then attempts to recover):

Oct 31 07:53:33.251 [ERROR] [hivemind.moe.server.runtime.run:101] Caught 3736, attempting to recover
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/hivemind/moe/server/runtime.py", line 91, in run
    outputs, batch_size = self.process_batch(pool, batch_index, *batch)
  File "/opt/conda/lib/python3.10/site-packages/hivemind/moe/server/runtime.py", line 110, in process_batch
    outputs = pool.process_func(*batch)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/petals/src/petals/server/backend.py", line 232, in __call__
    (hidden_states,) = self.backends[inference_info.uid].inference_step(hidden_states, hypo_ids, inference_info)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/petals/src/petals/server/backend.py", line 119, in inference_step
    with self.memory_cache.use_cache(
  File "/opt/conda/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/petals/src/petals/server/memory_cache.py", line 221, in use_cache
    yield tuple(self._allocated_tensors[handle] for handle in handles)
  File "/home/petals/src/petals/server/memory_cache.py", line 221, in <genexpr>
    yield tuple(self._allocated_tensors[handle] for handle in handles)
KeyError: 3736
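
For context on where this comes from: inference_step enters memory_cache.use_cache(...) with one or more integer handles, and the generator expression at memory_cache.py:221 looks each handle up in self._allocated_tensors. The KeyError: 3736 means handle 3736 was not (or no longer) in that mapping when the step ran, e.g. because the corresponding allocation had already been released. Below is a minimal, self-contained sketch of that failure pattern; the ToyMemoryCache class and its methods are illustrative assumptions, not petals' actual implementation.

```python
import contextlib

import torch


class ToyMemoryCache:
    """Illustrative sketch only; not petals' real MemoryCache."""

    def __init__(self):
        self._allocated_tensors = {}  # handle (int) -> cached tensor
        self._next_handle = 0

    def allocate(self, shape):
        # Hand out an integer handle for a newly allocated cache tensor.
        handle = self._next_handle
        self._next_handle += 1
        self._allocated_tensors[handle] = torch.zeros(shape)
        return handle

    def free(self, handle):
        # Release the allocation (e.g. when a session ends or times out).
        del self._allocated_tensors[handle]

    @contextlib.contextmanager
    def use_cache(self, *handles):
        # Same pattern as memory_cache.py:221 in the traceback above:
        # every handle must still be present, otherwise the lookup raises KeyError.
        yield tuple(self._allocated_tensors[handle] for handle in handles)


cache = ToyMemoryCache()
handle = cache.allocate((1, 16))
cache.free(handle)  # allocation is gone before the inference step tries to use it

try:
    with cache.use_cache(handle):
        pass
except KeyError as e:
    # Mirrors "Caught 3736, attempting to recover" from the hivemind runtime log.
    print(f"Caught {e}, attempting to recover")
```

In other words, the inference request references a cache allocation the server no longer holds, which is why the hivemind runtime catches the exception and keeps serving instead of crashing.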