Haidra-Org / AI-Horde

A crowdsourced distributed cluster for AI art and text generation
GNU Affero General Public License v3.0

Bug: Inconsistent kudos calculation for text workers #242

Closed sjaak31367 closed 1 year ago

sjaak31367 commented 1 year ago

I've used lite.koboldai.net for the past 1~2 weeks, and have also been running a worker off and on via KoboldAI (version: 0cc4m/koboldai : latestgptq) on a GTX 1070 (8GB).

In that time I have noticed some strange results in kudos costs/rewards:

1. Some models/workers cost a consistent 4 kudos, no matter the request size.
2. Some workers cost a consistent 4 kudos, while other workers running the same model cost the normal amount.
3. Some generations (with the same model, request, and worker) sometimes cost only 4 kudos, while at other times they cost the normal amount.

Data:

1: Example of models/workers with a consistent 4 kudos cost:

| Model | Kudos | Worker | Size | Stats |
| ----- | ----- | ------ | ---- | ----- |
| Wizard-Vicuna-30B-Uncensored-GPTQ | 4 | ??? | 512 / 2048 | ??? |
| manticore-13b-chat-pyg-GPTQ | 4 | MarsupialDisco | 200 / 2048 | ETA 10s, queue 124, speed 11.3, qty 1 |

2: Example of workers with a consistent 4 kudos cost:

| Model | Kudos | Worker | Size | Stats |
| ----- | ----- | ------ | ---- | ----- |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | 4 | Sjaak's GTX1070 (13.8T) | 80 / 1024 | ETA 6s, queue 170, speed 13, qty 2 |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | 13 | Bird Up! 0 (14.3T) | 80 / 1024 | ETA 6s, queue 170, speed 13, qty 2 |

3: Example of inconsistent costs:

| Model | Kudos | Worker | Time | Size | Stats |
| ----- | ----- | ------ | ---- | ---- | ----- |
| facebook/opt-2.7b::RoFo#24043 | 2 | RoFo Scribe#1 | 14.0s | 80 / 1024 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 13 | RoFo Scribe#1 | 10.0s | 80 / 1024 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 13 | RoFo Scribe#1 | 11.8s | 80 / 1024 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 2 | RoFo Scribe#1 | 9.9s | 256 / 2048 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 13 | RoFo Scribe#1 | 11.2s | 256 / 2048 | ETA 0s, queue 0, speed 8, qty 1 |

VicUnlocked-alpaca-65b-4bit-128g, though I have no formatted data, can cost 4 kudos or 30~130, seemingly without pattern.

Further data:

Own worker: the model loaded, and the kudos received per request when running on the horde.

| Model | Size | Kudos / request (16/512 ~ 512/2048) |
| ----- | ---- | ----------------------------------- |
| KoboldAI_OPT-2.7B-Erebus | 2.7b full | 2.06 ~ 65.83 |
| mayaeary_pygmalion-6b-4bit-128g | 6.0b 4bit | 1.0 |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | 7.0b 4bit | 1.0 |
| KoboldAI_GPT-Neo-2.7B-Horni | 2.7b full | 2.06 ~ 65.83 |
| TheBloke_WizardLM-7B-uncensored-GPTQ | 7.0b 4bit | 1.0 |
| PygmalionAI_pygmalion-1.3b | 1.3b full | 0.99 ~ 31.70 |

Horde: kudos spent per generation for models accessed via the horde.

Instruct, 256 / 512:

| Model | Kudos / request | Other |
| ----- | --------------- | ----- |
| 13B-HyperMantis/GPTQ_4bit-128g | 34 | 11.6s queue(22s, 1908, 14, 6) |
| asmodeus | 34 | 14.1s queue(0s, 0, 71.8, 5) |
| concedo/koboldcpp | 4 | 9.8s queue(0s, 0, 4.2, 1) |
| koboldcpp/Manticore-13B-Chat-Pyg.ggmlv3.q4_1 | 17 | 83.0s queue(526s, 474, 0.9, 1) |

Chat, 80 / 1024:

| Model | Kudos / request | Time elapsed | Other |
| ----- | --------------- | ------------ | ----- |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | | | queue(27s, 772, 14.2, 2) |
| | 13 | 30.0s | |
| | 4 | 29.1s | |
| Pygmalion-7b-4bit-GPTQ-Safetensors | | | queue(70s, 747, 5.3, 2) |
| | 13 | 28.4s | |
| | 2 | 23.6s | |
| manticore-13b-chat-pyg-GPTQ | | | queue(17s, 194, 11.1, 1) |
| | 4 | 68.3s | |
| manticore-13b-chat-pyg | | | queue(153s, 236+, 8.2, 1) |
| | 13 | 141.6s | |
| | 2 | 79.8s | |

My guesses at possible causes are varied:

1. My first thought was that quantized models are not in the cost database, but some quantized models do cost 4+ kudos, so that doesn't seem to be the case.
2. My second thought was improperly formatted model names (e.g. not following "{repo} / {name} {size} {variant}"), but models like bittensor and asmodeus still cost 4+ kudos.
3. Time? Perhaps a generation is sometimes just too slow, and the horde considers it unworthy of full payment.

TL;DR: Sometimes a generation costs 4 kudos, sometimes more. Some workers get paid less for the same model.

db0 commented 1 year ago

It has to do with unknown models. If a model is not in https://github.com/db0/AI-Horde-text-model-reference, then untrusted workers only get 1 kudos for it, while for trusted workers it is treated as a 2.7B model.
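In rough pseudocode, the rule described here would look something like the following (a hypothetical sketch for illustration only; the function and variable names are made up and this is not the actual AI-Horde code):

# Hypothetical sketch of the reward rule described above, not the real implementation.
def text_model_multiplier(model_name, worker_is_trusted, known_models):
    """Return the kudos multiplier (parameters in billions) for a text model."""
    if model_name in known_models:
        # Known models are rewarded according to their listed parameter count.
        return known_models[model_name]["parameters"] / 1_000_000_000
    if worker_is_trusted:
        # Unknown model on a trusted worker: treated as if it were a 2.7B model.
        return 2.7
    # Unknown model on an untrusted worker: flat minimal reward.
    return 1

# e.g. an unknown quantized model on an untrusted worker:
# text_model_multiplier("Pygmalion-7b-4bit-32g-GPTQ-Safetensors", False, {}) -> 1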

sjaak31367 commented 1 year ago

Then a follow-up question: How does one become a trusted worker?

And/or, is there a chance "{model} {GPTQ / GGML / 4bit / 8bit}" might become an acceptable name alteration (in the same manner "{model}::{name}" is allowed under https://github.com/Haidra-Org/AI-Horde/blob/main/horde/model_reference.py#L60 )?

db0 commented 1 year ago

How does one become a trusted worker?

A 7-day-old account plus a good amount of generated kudos. The exact amount is deliberately kept secret.

And/or, is there a chance "{model} {GPTQ / GGML / 4bit / 8bit}" might become an acceptable name alteration (in the same manner "{model}::{name}" is allowed under

No, a model name with :: is still shown as a different model from the others; it just prevents the model being run by another user. But any name alteration is OK. The problem is just adding them to the known models list.

The problem is that there's no clear naming convention for models to signify the bits etc. If everyone followed the same naming convention, we could do something smart like that.
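For illustration, if a convention like "{repo}/{name}-{size}-{variant}" were actually followed everywhere, the lookup could parse names mechanically. A hypothetical sketch (the pattern, field names, and example are assumptions, not an agreed convention):

import re

# Hypothetical: assumes names like "TheBloke/WizardLM-7B-GPTQ" with a trailing
# size ("7B") and an optional quantization variant ("GPTQ", "GGML", "4bit", "8bit").
_NAME_PATTERN = re.compile(
    r"^(?P<repo>[^/]+)/(?P<name>.+?)-(?P<size>\d+(?:\.\d+)?[bB])(?:-(?P<variant>GPTQ|GGML|4bit|8bit).*)?$"
)

def parse_model_name(model_name):
    """Split a hypothetically standardized model name into repo, name, size, variant."""
    match = _NAME_PATTERN.match(model_name)
    if match is None:
        return None
    return match.groupdict()

# parse_model_name("TheBloke/WizardLM-7B-GPTQ")
# -> {'repo': 'TheBloke', 'name': 'WizardLM', 'size': '7B', 'variant': 'GPTQ'}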

sjaak31367 commented 1 year ago

7 days and X generated kudos, that makes sense. Off I go waiting then haha.

There indeed does not seem to be a standardized naming convention, which is a problem. A band-aid way of handling it might be something like:

# If the model name carries a quantization marker, return its parameter count in
# billions, scaled down by a penalty factor for being quantized.
if model_name.count("GPTQ") + model_name.count("GGML") + model_name.count("4bit") != 0:
    return int(self.text_reference[model_name]["parameters"]) / 1_000_000_000 * quant_penalty

where quant_penalty would be a value between 0 and 1. It's not the cleanest way of doing it, but it would make a lot of workers' output measurably better rewarded. It would also incentivize people to run large quantized models (which usually outperform small full-precision ones), rather than small full-precision models (which currently earn more kudos, at least until you are trusted).

quant_penalty could be 0.25 (the float16 -> 4-bit memory ratio), as high as ~0.95 when judged by linguistic performance stats, or somewhere in between as a compromise between the different factors (RAM, ops, scores).
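As a quick worked example of what those candidate penalty values would mean for a 13B 4-bit model (illustrative numbers only, assuming the band-aid above):

# Illustrative only: kudos multiplier a 13B 4-bit model would earn under the
# proposed band-aid, for a few candidate quant_penalty values.
parameters = 13_000_000_000
base_multiplier = parameters / 1_000_000_000  # 13.0, same as the full-precision model

for quant_penalty in (0.25, 0.6, 0.95):
    print(f"quant_penalty={quant_penalty}: multiplier={base_multiplier * quant_penalty:.2f}")

# quant_penalty=0.25: multiplier=3.25
# quant_penalty=0.6: multiplier=7.80
# quant_penalty=0.95: multiplier=12.35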

sjaak31367 commented 1 year ago
def is_known_text_model(self, model_name):
    # If it's a named model, we check if we can find it without the username
    usermodel = model_name.split("::")
    if len(usermodel) == 2:
        model_name = usermodel[0]
    # If it's a quantized model, check if we can find the non-quantized model in the database
    if model_name.count("GPTQ") + model_name.count("GGML") + model_name.count("4bit") != 0:
        is_known = False
        is_known = is_known or model_name.split("GPTQ")[0].rstrip("-_./") in self.get_text_model_names()
        is_known = is_known or model_name.split("GGML")[0].rstrip("-_./") in self.get_text_model_names()
        is_known = is_known or model_name.split("4bit")[0].rstrip("-_./") in self.get_text_model_names()
        if is_known:
            return is_known
    return model_name in self.get_text_model_names()

Should work with model names such as:

- koboldcpp/Manticore-13B-Chat-Pyg.ggmlv3.q4_1
- manticore-13b-chat-pyg_GPTQ
- gozfarb/pygmalion-7b-4bit-128g-cuda
- 13B-HyperMantis/GPTQ_4bit-128g

when making two assumptions (see the sketch below):

1. Model names are not case-sensitive.
2. The model name is what matters, not the repository (e.g. FooBar/Pygmalion-7b and Pygmalion-7b are both valid descendants of PygmalionAI/pygmalion-7b).
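A rough standalone sketch of what those two assumptions mean in practice, normalizing a name before checking the reference (KNOWN_BASE_MODELS is just a stand-in for self.get_text_model_names(), and base_name is a hypothetical helper):

# Rough sketch of the two assumptions: compare case-insensitively, ignore the
# repository prefix, and strip any quantization suffix before the lookup.
KNOWN_BASE_MODELS = {"manticore-13b-chat-pyg", "pygmalion-7b"}  # stand-in for the reference list

def base_name(model_name):
    """Strip the repository prefix and any quantization suffix, lowercased."""
    name = model_name.split("/")[-1].lower()
    for marker in ("gptq", "ggml", "4bit", "8bit"):
        name = name.split(marker)[0]
    return name.rstrip("-_./")

for candidate in (
    "manticore-13b-chat-pyg_GPTQ",
    "gozfarb/pygmalion-7b-4bit-128g-cuda",
):
    print(candidate, "->", base_name(candidate), base_name(candidate) in KNOWN_BASE_MODELS)

# manticore-13b-chat-pyg_GPTQ -> manticore-13b-chat-pyg True
# gozfarb/pygmalion-7b-4bit-128g-cuda -> pygmalion-7b True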