0cc4m / KoboldAI

GNU Affero General Public License v3.0

[Regression] Can't participate in horde with `exllama` branch, stopping sharing breaks processing #73

Open InconsolableCellist opened 1 year ago

Summary

Probably because of the switch from KoboldAI-Horde-Worker to AI-Horde-Worker, I can no longer participate in the Horde. The console outputs a stream of "no valid generations for us to do" messages (see Observed Results).

Environment

  * Linux
  * Any model loaded with ExLlama (split across two GPUs in my case)

Steps to Reproduce

  1. git clone https://github.com/0cc4m/KoboldAI.git
  2. cd KoboldAI
  3. git checkout exllama
  4. ./play.sh --host
  5. In the UI go to settings, name the Horde Worker and set the Horde API Key
  6. Go to Home, Load Model, select the model from a directory (airoboros-l2-70b-gpt4-2.0 in my case)
  7. Pick ExLlama from the loader dropdown, split 35/45, context 4096, click Load
  8. When loading is complete, check "Share with Horde" in the slider

Observed Results

The console outputs:

INFO       | 2023-09-01 06:22:23.578472 | worker.jobs.poppers:report_skipped_info:85 - Server https://horde.koboldai.net has no valid generations for us to do.
INFO       | 2023-09-01 06:22:25.999475 | worker.jobs.poppers:report_skipped_info:85 - Server https://horde.koboldai.net has no valid generations for us to do.
INFO       | 2023-09-01 06:22:28.183659 | worker.jobs.poppers:report_skipped_info:85 - Server https://horde.koboldai.net has no valid generations for us to do.
INFO       | 2023-09-01 06:22:29.786495 | worker.jobs.poppers:report_skipped_info:85 - Server https://horde.koboldai.net has no valid generations for us to do.

Going to lite.koboldai.net, manually selecting the worker, and submitting a job does not get the job processed. Previously, I would receive jobs within one second of clicking "share".

Additionally, unchecking "share with horde" in the UI now results in a red error popup that says:

Error at koboldai.js:3236
Uncaught TypeError: Cannot use 'in' operator to search for 'status' in undefined
--
Please report this error to the developers.

The console prints out:

  File "/home/user/Programs/KoboldAI/AI-Horde-Worker/worker/workers/framework.py", line 58, in stop
    self.ui_class.stop()
    │    └ None
    └ <AI-Horde-Worker.worker.workers.scribe.ScribeWorker object at 0x7f9c6e4d1100>

AttributeError: 'NoneType' object has no attribute 'stop'

At this point the model can't be shared again, and a new model can't be loaded either. The backend seems to be locked and has to be restarted.
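
For illustration only, here is a minimal sketch of the kind of guard that would avoid the AttributeError above. It assumes the crash happens simply because `ui_class` is `None` when `stop()` is called, as the traceback suggests. `WorkerSketch` and `should_stop` are hypothetical names for this example and are not the real classes or attributes in AI-Horde-Worker; this is not a tested patch for `framework.py`.

```python
class WorkerSketch:
    """Hypothetical stand-in for the worker; not the real AI-Horde-Worker class."""

    def __init__(self, ui_class=None):
        # In the traceback above, ui_class is None when stop() runs.
        self.ui_class = ui_class
        # Hypothetical flag so the rest of the worker can see the stop request.
        self.should_stop = False

    def stop(self):
        self.should_stop = True
        # Only forward the stop call when a UI object is actually attached.
        if self.ui_class is not None:
            self.ui_class.stop()


if __name__ == "__main__":
    worker = WorkerSketch()  # no UI attached, like the failing case
    worker.stop()            # does not raise
    print(worker.should_stop)
```

A guard like this would only address the crash when sharing is stopped. It would not explain why the worker receives no jobs in the first place.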

Expected Results

The model is shared; it participates in the horde with jobs being processed and sent off; sharing can be stopped; and a new model can be loaded and started.