bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License
8.89k stars, 489 forks

Store (start_block, end_block) in each DHT record for reliability #510

Closed by borzunov 9 months ago

borzunov commented 9 months ago

This PR fixes gaps in the DHT server info caused by unavailable DHT keys (see below). Now, a single DHT key is enough to get info about all blocks hosted by a server, so the server's info stays visible until all of its keys are unavailable.

Also, this PR refactors the petals.client.routing and petals.server.block_selection modules to use the common compute_spans() function (defined in petals.utils.dht) and the RemoteSpanInfo class (defined in petals.data_structures).
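
As a rough illustration of why one key is enough, here is a minimal, self-contained sketch. It does not use the Petals API: SpanRecord and spans_from_records() are hypothetical names, and the record layout (a peer ID plus a start_block and an exclusive end_block) is an assumption made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical DHT record value: the full block range hosted by one server."""
    peer_id: str
    start_block: int
    end_block: int  # assumed exclusive in this sketch


def spans_from_records(
    records: Dict[int, Optional[List[SpanRecord]]],
) -> Dict[str, Tuple[int, int]]:
    """Rebuild each server's span from whichever per-block keys could be read.

    `records` maps a block index to the records stored under that block's key,
    or to None if the key was unavailable.
    """
    spans: Dict[str, Tuple[int, int]] = {}
    for entries in records.values():
        if entries is None:
            continue  # this key is unavailable; other keys may still cover the server
        for rec in entries:
            # Every record carries the whole span, so any one of them is sufficient.
            spans[rec.peer_id] = (rec.start_block, rec.end_block)
    return spans


# peer_a hosts blocks 0..2, but only the key for block 1 is readable.
records = {
    0: None,
    1: [SpanRecord("peer_a", 0, 3)],
    2: None,
    3: [SpanRecord("peer_b", 3, 5)],
    4: [SpanRecord("peer_b", 3, 5)],
}
spans = spans_from_records(records)
print(spans)
```

If the records only said "this peer hosts this one block", peer_a would appear to host block 1 alone, which is the kind of gap described above.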

[Screenshot 2023-09-15 at 18 02 04]

poedator commented 9 months ago

1) Do we need to update the health.petals.dev code then?

2) If info about some individual blocks was lost from the DHT in the past, do we now face the same risk of losing the info for a server's whole set of blocks for the same reason?

edugamerplay1228 commented 9 months ago

Pls help

Traceback (most recent call last):
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/workers/gthread.py", line 95, in init_process
    super().init_process()
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/home/edu/.local/lib/python3.10/site-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/mnt/d/SystemFolders/Desktop/health.petals.dev/app.py", line 9, in <module>
    from state_updater import StateUpdaterThread
  File "/mnt/d/SystemFolders/Desktop/health.petals.dev/state_updater.py", line 12, in <module>
    from health import fetch_health_state
  File "/mnt/d/SystemFolders/Desktop/health.petals.dev/health.py", line 12, in <module>
    from petals.utils.dht import compute_spans, get_remote_module_infos
ImportError: cannot import name 'compute_spans' from 'petals.utils.dht' (/home/edu/.local/lib/python3.10/site-packages/petals/utils/dht.py)

borzunov commented 9 months ago

Hi @edugamerplay1228,

Please upgrade Petals to the latest version to get rid of this error:

pip install --upgrade git+https://github.com/bigscience-workshop/petals

Alternatively, you can downgrade the health service so that it doesn't rely on the petals.utils.dht.compute_spans() function added in this PR:

cd health.petals.dev
git checkout 0bafc79f10ed809f13972fef0d3a08d436321805
flask run --host=0.0.0.0 --port=5000
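
To tell which of the two options applies, one could check whether the installed Petals already exposes compute_spans(). The snippet below is only a sketch: has_symbol() is a hypothetical helper written for this example, and it relies on nothing beyond the standard library's importlib.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` can be imported and defines `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its dependencies) is not installed.
        return False
    return hasattr(module, symbol)


if has_symbol("petals.utils.dht", "compute_spans"):
    print("compute_spans() is available; the current health service should import fine.")
else:
    print("compute_spans() is missing; upgrade Petals or check out the older health service commit.")
```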