Closed rkatriel closed 11 months ago
Sorry for not getting back to you earlier; this one fell through the cracks! The example in the blog is outdated, and the API looks a bit different now. We'll update the blog soon. The correct way to initialize this with spacy-llm >= 0.4.0 looks like this:
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"@llm_models": "spacy.Davinci.v2"},
    },
)
@rmitsch Hi Raphael,
I tried your suggestion - after upgrading spacy and spacy-llm to the latest versions (3.7.2 and 0.6.2, respectively) - but now I'm getting a Config validation error. See the console trace below.
Thanks, Ron
nlp.add_pipe(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 821, in add_pipe
pipe_component = self.create_pipe(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 709, in create_pipe
resolved = registry.resolve(cfg, validate=validate)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 756, in resolve
resolved, _ = cls._make(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 805, in _make
filled, _, resolved = cls._fill(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
filled[key], validation[v_key], final[key] = cls._fill(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
filled[key], validation[v_key], final[key] = cls._fill(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 926, in _fill
raise ConfigValidationError(
confection.ConfigValidationError:
Config validation error
llm.model -> llm_models extra fields not permitted
{'llm_models': 'spacy.Davinci.v2', '@llm_models': 'spacy.GPT-3-5.v2', 'strict': True}
Can you share the config you're using?
I have no config file. Below is the code I'm running. The parameters are passed in the code as recommended.
import spacy
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"llm_models": "spacy.Davinci.v2"},
    },
)
doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)
Ah, I forgot an "@" in the example I gave above. Try again with this:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"@llm_models": "spacy.Davinci.v2"},
    },
)
doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)
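The missing "@" above is easy to overlook, so a quick pre-check of the config dict can help before it is passed to nlp.add_pipe. The following is only a minimal sketch and not part of spaCy or spacy-llm: the helper name find_registry_key_problems is hypothetical, and the set of registry names is copied from the "Available names" list in the E892 error message quoted in this thread.

```python
from typing import Any, Dict, List

# Registry names taken from the "Available names" list in the E892 error
# message in this thread (assumption: only the llm_* ones matter here).
KNOWN_REGISTRIES = {"llm_misc", "llm_models", "llm_queries", "llm_tasks"}


def find_registry_key_problems(config: Dict[str, Any], path: str = "") -> List[str]:
    """Hypothetical helper: report suspicious registry keys in a nested config."""
    problems: List[str] = []
    for key, value in config.items():
        here = f"{path}.{key}" if path else key
        if key in KNOWN_REGISTRIES:
            # A registry name used as a plain key, e.g. "llm_models" without "@".
            problems.append(f"{here}: missing '@' prefix, did you mean '@{key}'?")
        elif key.startswith("@llm_") and key[1:] not in KNOWN_REGISTRIES:
            # A reference to a registry that is not in the known list.
            problems.append(f"{here}: unknown registry '{key[1:]}'")
        if isinstance(value, dict):
            # Recurse into nested blocks such as "task" and "model".
            problems.extend(find_registry_key_problems(value, here))
    return problems


config = {
    "task": {"@llm_tasks": "spacy.NER.v1", "labels": "SAAS_PLATFORM"},
    "model": {"llm_models": "spacy.Davinci.v2"},  # "@" is missing on purpose
}
for problem in find_registry_key_problems(config):
    print(problem)
```

The check is purely textual, so it does not need spaCy installed; it would also flag a reference to an outdated registry such as 'llm_backends'.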
Thanks, that did the trick! But now I'm getting a connection error:
ConnectionError: API could not be reached after 34.596 seconds in total and attempting to connect 5 times. Check your network connection and the API's availability. 429 Too Many Requests
This is likely from OpenAI because my account is not a paid one.
Is there an open source (e.g., Huggingface) model that works with this setup? I tried running the script with 'spacy.OpenLLaMA.v1' and got the following error
Config validation error llm.model -> name field required {'@llm_models': 'spacy.OpenLLaMA.v1'}
The ConnectionError usually comes from OpenAI rate-limiting you, yes. You could also increase the time between tries, but that's also unsatisfying.
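For illustration, "increasing the time between tries" can be sketched as a generic retry wrapper with exponential backoff. This is a standalone sketch, not spacy-llm's own retry logic: the function name retry_with_backoff and its parameters are hypothetical, and it assumes the failure surfaces as Python's built-in ConnectionError, as the message above suggests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_tries: int = 5,
    base_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry `call`, doubling the wait after each failure."""
    for attempt in range(max_tries):
        try:
            return call()
        except ConnectionError:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            # Wait base_interval, then 2x, 4x, ... before the next try.
            sleep(base_interval * 2 ** attempt)
    raise ValueError("max_tries must be at least 1")
```

Under these assumptions, a call such as retry_with_backoff(lambda: nlp(text)) would space out the requests. It will not help if the account's quota is exhausted rather than briefly throttled.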
Open-source models work the same way. Hugging Face models also come in several variants, and we don't select one by default (maybe we should). Anyway, have a look at the documentation to see which ones are available. You could go with the 3B one, for example, and do:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMa.v2",
            "name": "open_llama_3b"
        },
    },
)
doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)
Note: OpenLLaMa is an older model, and the 3B model is small. You probably won't get amazing results out of it.
Thanks Raphael, but that doesn't work. I'm getting the following catalogue/registry error:
catalogue.RegistryError: [E893] Could not find function 'spacy.OpenLLaMa.v2' in function registry 'llm_models'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Changing to 'spacy.OpenLLaMa.v1', as implied in the rest of the error message below, does not help.
Available names: langchain.AI21.v1, langchain.AlephAlpha.v1, langchain.Anthropic.v1, langchain.Anyscale.v1, langchain.Aviary.v1, langchain.AzureOpenAI.v1, langchain.Banana.v1, langchain.Beam.v1, langchain.CTransformers.v1, langchain.CerebriumAI.v1, langchain.Cohere.v1, langchain.Databricks.v1, langchain.DeepInfra.v1, langchain.FakeListLLM.v1, langchain.ForefrontAI.v1, langchain.GPT4All.v1, langchain.GooglePalm.v1, langchain.GooseAI.v1, langchain.HuggingFaceEndpoint.v1, langchain.HuggingFaceHub.v1, langchain.HuggingFacePipeline.v1, langchain.HuggingFaceTextGenInference.v1, langchain.HumanInputLLM.v1, langchain.LlamaCpp.v1, langchain.Modal.v1, langchain.MosaicML.v1, langchain.NLPCloud.v1, langchain.OpenAI.v1, langchain.OpenLM.v1, langchain.Petals.v1, langchain.PipelineAI.v1, langchain.RWKV.v1, langchain.Replicate.v1, langchain.SagemakerEndpoint.v1, langchain.SelfHostedHuggingFaceLLM.v1, langchain.SelfHostedPipeline.v1, langchain.StochasticAI.v1, langchain.VertexAI.v1, langchain.Writer.v1, spacy.Ada.v1, spacy.Ada.v2, spacy.Azure.v1, spacy.Babbage.v1, spacy.Babbage.v2, spacy.Claude-1-0.v1, spacy.Claude-1-2.v1, spacy.Claude-1-3.v1, spacy.Claude-1.v1, spacy.Claude-2.v1, spacy.Claude-instant-1-1.v1, spacy.Claude-instant-1.v1, spacy.Code-Davinci.v1, spacy.Code-Davinci.v2, spacy.Command.v1, spacy.Curie.v1, spacy.Curie.v2, spacy.Davinci.v1, spacy.Davinci.v2, spacy.Dolly.v1, spacy.Falcon.v1, spacy.GPT-3-5.v1, spacy.GPT-3-5.v2, spacy.GPT-4.v1, spacy.GPT-4.v2, spacy.Llama2.v1, spacy.Mistral.v1, spacy.NoOp.v1, spacy.OpenLLaMA.v1, spacy.PaLM.v1, spacy.StableLM.v1, spacy.Text-Ada.v1, spacy.Text-Ada.v2, spacy.Text-Babbage.v1, spacy.Text-Babbage.v2, spacy.Text-Curie.v1, spacy.Text-Curie.v2, spacy.Text-Davinci.v1, spacy.Text-Davinci.v2
A typo on my end: use spacy.OpenLLaMa.v1 instead of spacy.OpenLLaMa.v2.
Already tried that, as mentioned above. Same type of error
catalogue.RegistryError: [E893] Could not find function 'spacy.OpenLLaMa.v1' in function registry 'llm_models'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Argh, these different Llama casings always get me. So the correct spelling is spacy.OpenLLaMA.v1, not spacy.OpenLLaMa.v1 (notice that the last "a" is uppercase). Apologies for not double-checking.
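Since the E893 message already prints every registered name, a casing slip like this one can be caught mechanically by comparing the requested name against that list. The sketch below uses only the standard library; suggest_name is a hypothetical helper, and AVAILABLE holds just a handful of the names from the error message above.

```python
import difflib
from typing import List, Optional

# A few of the names listed in the E893 error message in this thread.
AVAILABLE = [
    "spacy.Davinci.v2",
    "spacy.GPT-3-5.v2",
    "spacy.Llama2.v1",
    "spacy.OpenLLaMA.v1",
    "spacy.StableLM.v1",
]


def suggest_name(requested: str, available: List[str]) -> Optional[str]:
    """Hypothetical helper: return the registered name closest to `requested`."""
    # An exact match ignoring case catches slips such as "LLaMa" vs "LLaMA".
    by_lower = {name.lower(): name for name in available}
    if requested.lower() in by_lower:
        return by_lower[requested.lower()]
    # Otherwise fall back to fuzzy matching on the lowercased names.
    close = difflib.get_close_matches(
        requested.lower(), list(by_lower), n=1, cutoff=0.8
    )
    return by_lower[close[0]] if close else None


print(suggest_name("spacy.OpenLLaMa.v1", AVAILABLE))
```

The 0.8 cutoff is an arbitrary choice for this sketch; a lower value would suggest more distant names, a higher one would only accept near-identical spellings.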
Thanks, Raphael! That did the trick, though after fixing it I got a new error:
Tokenizer class LlamaTokenizer does not exist or is not currently imported.
It turns out this is a known issue and is solved by uninstalling/reinstalling the transformers library.
So now we're past the loading of the model but not out of the woods. I'm getting the following error when calling spaCy's nlp() function with the query shown in the code above (see the full traceback below):
RuntimeError: Placeholder storage has not been allocated on MPS device!
(I thought this could be an issue with Intel vs. Apple silicon but I'm getting the same error on a MacBook with the M2 chip)
Any thoughts on how to resolve this?
Ron
Traceback (most recent call last):
File "/Users/ron.katriel/PycharmProjects/Transformer/test-spacy-llm.py", line 19, in <module>
doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1054, in __call__
error_handler(name, proc, [doc], e)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
raise e
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1049, in __call__
doc = proc(doc, **component_cfg.get(name, {})) # type: ignore[call-arg]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 156, in __call__
docs = self._process_docs([doc])
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 210, in _process_docs
responses_iters = tee(self._model(prompts_iters[0]), n_iters)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 55, in __call__
return [
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 57, in <listcomp>
self._model.generate(input_ids=tii, **self._config_run)[
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
return self.greedy_search(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
outputs = self(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
outputs = self.model(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 875, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
Huh, that's odd. You're getting this error when running exactly this snippet?
Correct - except spacy.OpenLLaMA.v1 instead of spacy.OpenLLaMa.v1, as you suggested above.
Which machine are you running this on? We'd like to try replicating this.
Also, I'd appreciate it if you opened a new issue for this problem. Might be useful for other users :pray:
Done! The new issue is "Spacy-LLM fails with storage not allocated on MPS device" https://github.com/explosion/spaCy/issues/13096
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
I'm getting an "Unknown function registry: 'llm_backends'" error (see the traceback below) when running the example provided in Matthew Honnibal's blog "Against LLM maximalism" (https://explosion.ai/blog/against-llm-maximalism)
Here is the full traceback:
File "/Users/ron.katriel/PycharmProjects/NLP/spacy-llm-example.py", line 5, in <module>
nlp.add_pipe(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 786, in add_pipe
pipe_component = self.create_pipe(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 679, in create_pipe
resolved = registry.resolve(cfg, validate=validate)
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/__init__.py", line 756, in resolve
resolved, _ = cls._make(
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/__init__.py", line 805, in _make
filled, _, resolved = cls._fill(
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
filled[key], validation[v_key], final[key] = cls._fill(
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/__init__.py", line 859, in _fill
promise_schema = cls.make_promise_schema(value, resolve=resolve)
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/__init__.py", line 1051, in make_promise_schema
func = cls.get(reg_name, func_name)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 128, in get
raise RegistryError(Errors.E892.format(name=registry_name, available=names))
catalogue.RegistryError: [E892] Unknown function registry: 'llm_backends'.
Available names: architectures, augmenters, batchers, callbacks, cli, datasets, displacy_colors, factories, initializers, languages, layers, lemmatizers, llm_misc, llm_models, llm_queries, llm_tasks, loggers, lookups, losses, misc, models, ops, optimizers, readers, schedules, scorers, tokenizers
Your Environment