explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

[E892] Unknown function registry: 'llm_backends' #12987

Closed: rkatriel closed this issue 11 months ago

rkatriel commented 1 year ago

How to reproduce the behaviour

I'm getting an "Unknown function registry: 'llm_backends'" error (see the traceback below) when running the example provided in Matthew Honnibal's blog post "Against LLM maximalism" (https://explosion.ai/blog/against-llm-maximalism).

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "backend": {
            "@llm_backends": "spacy.REST.v1",
            "api": "OpenAI",
            "config": {"model": "text-davinci-003"},
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

Here is the full traceback:

File "/Users/ron.katriel/PycharmProjects/NLP/spacy-llm-example.py", line 5, in nlp.add_pipe( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 786, in add_pipe pipe_component = self.create_pipe( File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 679, in createpipe resolved = registry.resolve(cfg, validate=validate) File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 756, in resolve resolved, = cls._make( File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 805, in make filled, , resolved = cls._fill( File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 860, in _fill filled[key], validation[v_key], final[key] = cls._fill( File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 859, in _fill promise_schema = cls.make_promise_schema(value, resolve=resolve) File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 1051, in make_promise_schema func = cls.get(reg_name, func_name) File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 128, in get raise RegistryError(Errors.E892.format(name=registry_name, available=names)) catalogue.RegistryError: [E892] Unknown function registry: 'llm_backends'.

Available names: architectures, augmenters, batchers, callbacks, cli, datasets, displacy_colors, factories, initializers, languages, layers, lemmatizers, llm_misc, llm_models, llm_queries, llm_tasks, loggers, lookups, losses, misc, models, ops, optimizers, readers, schedules, scorers, tokenizers
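
As an aside, the listing in the error message can be reproduced directly, which is a quick way to check what a given spacy-llm version registers. A minimal sketch (importing spacy_llm is what populates the llm_* registries):

import spacy_llm  # noqa: F401 - importing spacy-llm registers the llm_* registries
from spacy.util import registry

# This is the same listing the E892 message is built from:
print(", ".join(sorted(registry.get_registry_names())))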


rmitsch commented 11 months ago

Sorry for not getting back to you earlier; this one fell through the cracks! The example in the blog is outdated, and the API looks a bit different now. We'll update the blog soon. The correct way to initialize this with spacy-llm >= 0.4.0 looks like this:

nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"@llm_models": "spacy.Davinci.v2"},
    },
)
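
One more thing to double-check: the OpenAI-backed models read the API key from the environment, so it has to be set before the pipeline is built. A minimal sketch (OPENAI_API_KEY is the variable name used in the spacy-llm docs; the value below is a placeholder):

import os

# Placeholder value; spacy-llm's OpenAI models pick the key up from the environment.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")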
rkatriel commented 11 months ago

@rmitsch Hi Raphael,

I tried your suggestion - after upgrading spacy and spacy-llm to the latest versions (3.7.2 and 0.6.2, respectively) - but now I'm getting a Config validation error. See the console trace below.

Thanks, Ron

    nlp.add_pipe(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 821, in add_pipe
    pipe_component = self.create_pipe(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 709, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 756, in resolve
    resolved, _ = cls._make(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 805, in _make
    filled, _, resolved = cls._fill(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
    filled[key], validation[v_key], final[key] = cls._fill(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
    filled[key], validation[v_key], final[key] = cls._fill(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 926, in _fill
    raise ConfigValidationError(
confection.ConfigValidationError:

Config validation error
llm.model -> llm_models   extra fields not permitted
{'llm_models': 'spacy.Davinci.v2', '@llm_models': 'spacy.GPT-3-5.v2', 'strict': True}

rmitsch commented 11 months ago

Can you share the config you're using?

rkatriel commented 11 months ago

I have no config file. Below is the code I'm running. The parameters are passed in the code as recommended.

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"llm_models": "spacy.Davinci.v2"},
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)
rmitsch commented 11 months ago

Ah, I forgot an "@" in the example I gave above. Try again with this:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"@llm_models": "spacy.Davinci.v2"},
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)
rkatriel commented 11 months ago

Thanks, that did the trick! But now I'm getting a connection error:

ConnectionError: API could not be reached after 34.596 seconds in total and attempting to connect 5 times. Check your network connection and the API's availability. 429 Too Many Requests

This is likely from OpenAI because my account is not a paid one.

Is there an open-source (e.g., Hugging Face) model that works with this setup? I tried running the script with 'spacy.OpenLLaMA.v1' and got the following error:

Config validation error llm.model -> name field required {'@llm_models': 'spacy.OpenLLaMA.v1'}

rmitsch commented 11 months ago

The ConnectionError usually means OpenAI is rate-limiting you, yes. You could also increase the time between retries, but that's somewhat unsatisfying.
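
A sketch of that retry tuning, assuming the REST-backed models accept the max_tries / interval / max_request_time settings described in the spacy-llm docs (the values below are arbitrary):

nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.Davinci.v2",
            "max_tries": 10,           # attempts before giving up
            "interval": 5.0,           # seconds to wait between attempts
            "max_request_time": 120.0  # overall time budget for a request
        },
    },
)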

Open-source models work the same way. Hugging Face models also come in several variants, and we don't select one by default (maybe we should). Anyway, have a look at the documentation to see which ones are available. You could go with the 3B one, for example, and do:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMa.v2",
            "name": "open_llama_3b"
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

Note: OpenLLaMa is an older model, and the 3B variant is small, so you probably won't get amazing results out of it.

rkatriel commented 11 months ago

Thanks Raphael, but that doesn't work. I'm getting the following catalogue/registry error:

catalogue.RegistryError: [E893] Could not find function 'spacy.OpenLLaMa.v2' in function registry 'llm_models'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

Changing to 'spacy.OpenLLaMa.v1', as implied in the rest of the error message below, does not help.

Available names: langchain.AI21.v1, langchain.AlephAlpha.v1, langchain.Anthropic.v1, langchain.Anyscale.v1, langchain.Aviary.v1, langchain.AzureOpenAI.v1, langchain.Banana.v1, langchain.Beam.v1, langchain.CTransformers.v1, langchain.CerebriumAI.v1, langchain.Cohere.v1, langchain.Databricks.v1, langchain.DeepInfra.v1, langchain.FakeListLLM.v1, langchain.ForefrontAI.v1, langchain.GPT4All.v1, langchain.GooglePalm.v1, langchain.GooseAI.v1, langchain.HuggingFaceEndpoint.v1, langchain.HuggingFaceHub.v1, langchain.HuggingFacePipeline.v1, langchain.HuggingFaceTextGenInference.v1, langchain.HumanInputLLM.v1, langchain.LlamaCpp.v1, langchain.Modal.v1, langchain.MosaicML.v1, langchain.NLPCloud.v1, langchain.OpenAI.v1, langchain.OpenLM.v1, langchain.Petals.v1, langchain.PipelineAI.v1, langchain.RWKV.v1, langchain.Replicate.v1, langchain.SagemakerEndpoint.v1, langchain.SelfHostedHuggingFaceLLM.v1, langchain.SelfHostedPipeline.v1, langchain.StochasticAI.v1, langchain.VertexAI.v1, langchain.Writer.v1, spacy.Ada.v1, spacy.Ada.v2, spacy.Azure.v1, spacy.Babbage.v1, spacy.Babbage.v2, spacy.Claude-1-0.v1, spacy.Claude-1-2.v1, spacy.Claude-1-3.v1, spacy.Claude-1.v1, spacy.Claude-2.v1, spacy.Claude-instant-1-1.v1, spacy.Claude-instant-1.v1, spacy.Code-Davinci.v1, spacy.Code-Davinci.v2, spacy.Command.v1, spacy.Curie.v1, spacy.Curie.v2, spacy.Davinci.v1, spacy.Davinci.v2, spacy.Dolly.v1, spacy.Falcon.v1, spacy.GPT-3-5.v1, spacy.GPT-3-5.v2, spacy.GPT-4.v1, spacy.GPT-4.v2, spacy.Llama2.v1, spacy.Mistral.v1, spacy.NoOp.v1, spacy.OpenLLaMA.v1, spacy.PaLM.v1, spacy.StableLM.v1, spacy.Text-Ada.v1, spacy.Text-Ada.v2, spacy.Text-Babbage.v1, spacy.Text-Babbage.v2, spacy.Text-Curie.v1, spacy.Text-Curie.v2, spacy.Text-Davinci.v1, spacy.Text-Davinci.v2

rmitsch commented 11 months ago

A typo on my end: use spacy.OpenLLaMa.v1 instead of spacy.OpenLLaMa.v2.

rkatriel commented 11 months ago

Already tried that, as mentioned above. Same type of error:

catalogue.RegistryError: [E893] Could not find function 'spacy.OpenLLaMa.v1' in function registry 'llm_models'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

rmitsch commented 11 months ago

Argh, these different Llama casings always get me. So the correct spelling is spacy.OpenLLaMA.v1, not spacy.OpenLLaMa.v1 (notice that the last "a" is uppercase). Apologies for not double-checking.
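
Since these registry names are case-sensitive, a quick lookup can confirm the exact spelling before rerunning. A minimal sketch using spaCy's registry.has helper:

import spacy_llm  # noqa: F401 - importing spacy-llm populates the llm_models registry
from spacy.util import registry

for name in ("spacy.OpenLLaMa.v1", "spacy.OpenLLaMA.v1"):
    print(name, "->", registry.has("llm_models", name))
# Only the spelling with the uppercase final "A" prints True.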

rkatriel commented 11 months ago

Thanks, Raphael! That did the trick, though after fixing it I got a new error:

Tokenizer class LlamaTokenizer does not exist or is not currently imported.

It turns out this is a known issue and is solved by uninstalling/reinstalling the transformers library.

So now we're past the loading of the model but not out of the woods. I'm getting the following error when calling spaCy's nlp() function with the query shown in the code above (see the full traceback below):

RuntimeError: Placeholder storage has not been allocated on MPS device!

(I thought this could be an issue with Intel vs. Apple silicon but I'm getting the same error on a MacBook with the M2 chip)

Any thoughts on how to resolve this?

Ron

Traceback (most recent call last):
  File "/Users/ron.katriel/PycharmProjects/Transformer/test-spacy-llm.py", line 19, in <module>
    doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1054, in __call__
    error_handler(name, proc, [doc], e)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1049, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 156, in __call__
    docs = self._process_docs([doc])
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 210, in _process_docs
    responses_iters = tee(self._model(prompts_iters[0]), n_iters)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 55, in __call__
    return [
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 57, in <listcomp>
    self._model.generate(input_ids=tii, **self._config_run)[
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
    return self.greedy_search(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
    outputs = self(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
    outputs = self.model(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 875, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
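
One workaround worth trying here is to keep the model off the MPS device entirely and run it on CPU. A sketch, under the assumption that the spacy-llm Hugging Face models forward a config_init dict to transformers' from_pretrained (device_map needs the accelerate package installed, and generation on CPU will be slow):

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMA.v1",
            "name": "open_llama_3b",
            # Assumption: loading the whole model on CPU avoids the
            # mixed-device embedding lookup behind the MPS error.
            "config_init": {"device_map": "cpu"},
        },
    },
)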
rmitsch commented 11 months ago

Huh, that's odd. You're getting this error when running exactly this snippet?

rkatriel commented 11 months ago

Correct, except with spacy.OpenLLaMA.v1 instead of spacy.OpenLLaMa.v1, as you suggested above.

rmitsch commented 11 months ago

Which machine are you running this on? We'd like to try replicating this.

rmitsch commented 11 months ago

Also, I'd appreciate it if you opened a new issue for this problem. It might be useful for other users :pray:

rkatriel commented 11 months ago

Done! The new issue is "Spacy-LLM fails with storage not allocated on MPS device" https://github.com/explosion/spaCy/issues/13096

github-actions[bot] commented 10 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.