Closed sjaak31367 closed 1 year ago
It has to do with unknown models. If a model is not in https://github.com/db0/AI-Horde-text-model-reference, then untrusted workers only get 1 kudos, while trusted workers have it treated as a 2.7B model.
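For illustration, that rule could be sketched roughly like this (hypothetical function and names, not the Horde's actual code):

```python
# Hypothetical sketch of the rule described above; not the Horde's real code.
def effective_model_billions(model_name, reference, worker_trusted):
    """Return the parameter count (in billions) used for the kudos reward,
    or None when the job falls back to the flat 1-kudos minimum."""
    if model_name in reference:
        return reference[model_name]
    # Unknown model: trusted workers have it treated as a 2.7B model,
    # untrusted workers only get the 1-kudos minimum.
    return 2.7 if worker_trusted else None
```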
Then a follow-up question: How does one become a trusted worker?
And/or, is there a chance "{model} {GPTQ / GGML / 4bit / 8bit}" might become an acceptable name alteration (in the same manner "{model}::{name}" is allowed under https://github.com/Haidra-Org/AI-Horde/blob/main/horde/model_reference.py#L60 )?
> How does one become a trusted worker?
A 7-day-old account plus a good amount of generated kudos. The amount is explicitly left secret.
> And/or, is there a chance "{model} {GPTQ / GGML / 4bit / 8bit}" might become an acceptable name alteration (in the same manner "{model}::{name}" is allowed under
No, a model name with `::` is still shown as a different model than the others; it just prevents the model being run by another user. Any name alteration is OK. The problem is just adding them to the known models list.
The problem is that there's no clear naming convention for models to signify the bits etc. If everyone followed the same naming convention, we could do something smart like that.
7 days and X generated kudos, that makes sense. Off I go waiting then haha.
There indeed does not seem to be a standardized naming convention, which is a problem. A band-aid way of doing it might be something like:

```python
if model_name.count("GPTQ") + model_name.count("GGML") + model_name.count("4bit") != 0:
    return int(self.text_reference[model_name]["parameters"]) / 1000000000 * quant_penalty
```

where `quant_penalty` would be a value between 0 and 1.
It's not the cleanest way of doing it, but it would make a lot of workers measurably more productive. It would also incentivize people to run large quantized models (which usually outperform small full-precision ones), rather than small full models (which currently earn more kudos, at least until you are trusted).
It could be 0.25 to match the float16 -> 4-bit size reduction, as high as ~0.95 when looking at linguistic performance stats, or somewhere in between as a compromise between factors (RAM, ops, scores).
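As a minimal sketch of how such a penalty could slot in (`QUANT_MARKERS` and the `QUANT_PENALTY` value are assumptions for illustration, not Horde constants):

```python
# Hypothetical sketch: scale the parameter-based size down by a flat
# quant_penalty whenever the model name carries a quantization marker.
QUANT_MARKERS = ("GPTQ", "GGML", "4bit", "8bit")
QUANT_PENALTY = 0.75  # assumed value between the 0.25 and ~0.95 extremes

def adjusted_billions(model_name: str, parameters: int) -> float:
    """Model size in billions of parameters, discounted for quantized models."""
    billions = parameters / 1_000_000_000
    if any(marker in model_name for marker in QUANT_MARKERS):
        billions *= QUANT_PENALTY
    return billions
```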
```python
def is_known_text_model(self, model_name):
    # If it's a named model, we check if we can find it without the username
    usermodel = model_name.split("::")
    if len(usermodel) == 2:
        model_name = usermodel[0]
    # If it's a quantized model, check if we can find the non-quantized model in the database
    if model_name.count("GPTQ") + model_name.count("GGML") + model_name.count("4bit") != 0:
        is_known = False
        is_known = is_known or model_name.split("GPTQ")[0].rstrip("-_./") in self.get_text_model_names()
        is_known = is_known or model_name.split("GGML")[0].rstrip("-_./") in self.get_text_model_names()
        is_known = is_known or model_name.split("4bit")[0].rstrip("-_./") in self.get_text_model_names()
        if is_known:
            return is_known
    return model_name in self.get_text_model_names()
```
Should work with models with names such as:

- `koboldcpp/Manticore-13B-Chat-Pyg.ggmlv3.q4_1`
- `manticore-13b-chat-pyg_GPTQ`
- `gozfarb/pygmalion-7b-4bit-128g-cuda`
- `13B-HyperMantis/GPTQ_4bit-128g`
when making 2 assumptions:

1. Model names are not case-sensitive.
2. Model names are important, not the repository (e.g. `FooBar/Pygmalion-7b` and `Pygmalion-7b` are both valid descendants of `PygmalionAI/pygmalion-7b`).
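That stripping logic can be exercised standalone; here is a sketch, assuming a stubbed reference list in place of `self.get_text_model_names()` and lowercasing per assumption 1 (repository stripping from assumption 2 is left out, as in the function above):

```python
# Standalone sketch of the quantized-name fallback, with a stubbed
# reference list standing in for self.get_text_model_names().
KNOWN_MODELS = {"manticore-13b-chat-pyg"}  # assumed reference entry

def base_name_is_known(model_name: str) -> bool:
    name = model_name.split("::")[0].lower()  # drop any ::username suffix
    for marker in ("gptq", "ggml", "4bit"):
        head, sep, _tail = name.partition(marker)
        if sep and head.rstrip("-_./") in KNOWN_MODELS:
            return True
    return name in KNOWN_MODELS
```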
I've used lite.koboldai.net for the past 1~2 weeks, as well as running a worker off and on via KoboldAI (version: 0cc4m/koboldai : latestgptq; hardware: GTX 1070, 8GB), and have noticed some strange results in kudos costs/rewards:

1. Some models/workers cost a consistent 4 kudos, no matter the request size.
2. Some workers cost a consistent 4 kudos, while other workers with the same model cost the normal amount.
3. Some generations (with the same model, request, and worker) sometimes cost only 4 kudos, while at other times they cost the normal amount.
Data:
1: Example of models/workers with a consistent 4-kudos cost:

| Model | Kudos | Worker | Size | Stats |
| ----- | ----- | ------ | ---- | ----- |
| Wizard-Vicuna-30B-Uncensored-GPTQ | 4 | ??? | 512 / 2048 | ??? |
| manticore-13b-chat-pyg-GPTQ | 4 | MarsupialDisco | 200 / 2048 | ETA 10s, queue 124, speed 11.3, qty 1 |

2: Example of workers with a consistent 4 kudos:

| Model | Kudos | Worker | Size | Stats |
| ----- | ----- | ------ | ---- | ----- |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | 4 | Sjaak's GTX1070 (13.8T) | 80 / 1024 | ETA 6s, queue 170, speed 13, qty 2 |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | 13 | Bird Up! 0 (14.3T) | 80 / 1024 | ETA 6s, queue 170, speed 13, qty 2 |

3: Example of inconsistent costs:

| Model | Kudos | Worker | Time | Size | Stats |
| ----- | ----- | ------ | ---- | ---- | ----- |
| facebook/opt-2.7b::RoFo#24043 | 2 | RoFo Scribe#1 | 14.0s | 80 / 1024 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 13 | RoFo Scribe#1 | 10.0s | 80 / 1024 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 13 | RoFo Scribe#1 | 11.8s | 80 / 1024 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 2 | RoFo Scribe#1 | 9.9s | 256 / 2048 | ETA 0s, queue 0, speed 8, qty 1 |
| facebook/opt-2.7b::RoFo#24043 | 13 | RoFo Scribe#1 | 11.2s | 256 / 2048 | ETA 0s, queue 0, speed 8, qty 1 |

VicUnlocked-alpaca-65b-4bit-128g, though I have no formatted data, can cost 4 kudos or 30~130, seemingly without pattern.

Further data:

Own worker: model loaded, and the kudos received per request when running on the horde.

| Model | Size | Kudos / request (16/512 ~ 512/2048) |
| -------------------------------------- | --------- | ----------------------------------- |
| KoboldAI_OPT-2.7B-Erebus | 2.7b full | 2.06 ~ 65.83 |
| mayaeary_pygmalion-6b-4bit-128g | 6.0b 4bit | 1.0 |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | 7.0b 4bit | 1.0 |
| KoboldAI_GPT-Neo-2.7B-Horni | 2.7b full | 2.06 ~ 65.83 |
| TheBloke_WizardLM-7B-uncensored-GPTQ | 7.0b 4bit | 1.0 |
| PygmalionAI_pygmalion-1.3b | 1.3b full | 0.99 ~ 31.70 |

Horde: kudos spent per generation for models accessed via the horde.

Instruct, 256 / 512:

| Model | Kudos / request | Other |
| -------------------------------------------- | --------------- | ------------------------------ |
| 13B-HyperMantis/GPTQ_4bit-128g | 34 | 11.6s queue(22s, 1908, 14, 6) |
| asmodeus | 34 | 14.1s queue(0s, 0, 71.8, 5) |
| concedo/koboldcpp | 4 | 9.8s queue(0s, 0, 4.2, 1) |
| koboldcpp/Manticore-13B-Chat-Pyg.ggmlv3.q4_1 | 17 | 83.0s queue(526s, 474, 0.9, 1) |

Chat, 80 / 1024:

| Model | Kudos / request | Time elapsed | Other |
| -------------------------------------- | --------------- | ------------ | ------------------------- |
| Pygmalion-7b-4bit-32g-GPTQ-Safetensors | | | queue(27s, 772, 14.2, 2) |
| | 13 | 30.0s | |
| | 4 | 29.1s | |
| Pygmalion-7b-4bit-GPTQ-Safetensors | | | queue(70s, 747, 5.3, 2) |
| | 13 | 28.4s | |
| | 2 | 23.6s | |
| manticore-13b-chat-pyg-GPTQ | | | queue(17s, 194, 11.1, 1) |
| | 4 | 68.3s | |
| manticore-13b-chat-pyg | | | queue(153s, 236+, 8.2, 1) |
| | 13 | 141.6s | |
| | 2 | 79.8s | |

My guesses at possible causes are varied:

1. My first thought was that quantized models were not in the cost database, but some quantized models do have a cost of 4+, so that doesn't seem to be the case.
2. My second thought was improperly formatted model names (e.g. not following "{repo} / {name} {size} {variant}"), but models like bittensor and asmodeus still earn 4+ kudos.
3. Time? Perhaps sometimes a generation is just too slow, and the horde considers it unworthy of full payment.
TL;DR: Sometimes a generation costs 4 kudos, sometimes more. Some workers get paid less for the same model.