davidberenstein1957 / concise-concepts

This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity scoring.
MIT License

Loading a local NER model but has no embeddings #3

Closed: marcossilva closed this issue 2 years ago

marcossilva commented 2 years ago

Hi! I have a locally trained model that has only NER in its pipeline. As soon as I try to add the concise-concepts data, it raises:

Exception: Choose a model with internal embeddings i.e. md or lg.

How can I train my model so that it has the embeddings concise-concepts needs?

davidberenstein1957 commented 2 years ago

It could be that your spaCy model doesn't contain embeddings. Try concise-concepts==0.3.0, which supports custom Gensim embeddings.
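One way to confirm this diagnosis is to check whether the pipeline's vocabulary holds any static vectors. The snippet below is a minimal sketch, not the check concise-concepts itself performs: the helper name `has_static_vectors` is hypothetical, and `spacy.blank("en")` stands in for the locally trained NER-only model, which would normally be loaded with `spacy.load` from its own path.

```python
import spacy


def has_static_vectors(nlp) -> bool:
    """Return True if the pipeline's vocab carries static word vectors.

    Hypothetical helper for illustration; it only inspects the width of
    the vectors table, which is 0 when no vectors are present.
    """
    return nlp.vocab.vectors_length > 0


# A blank pipeline stands in for an NER-only model trained without vectors.
nlp = spacy.blank("en")
print(has_static_vectors(nlp))
print(nlp.vocab.vectors.shape)  # (number of rows, vector width)
```

Running the same check on an md or lg model should report vectors, which matches the wording of the exception.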

davidberenstein1957 commented 2 years ago

https://github.com/Pandora-Intelligence/concise-concepts#use-gensimword2vec-model-from-pre-trained-gensim-or-custom-model-path

marcossilva commented 2 years ago

Yeah, my model doesn't have embeddings, but I would like to train it to have them. A Gensim model wouldn't really work for my use case: most of my data comes from e-commerce data rather than natural text, so many of my tokens would be OOV. Is there a step in the spaCy pipeline that is required to make embeddings available in my model?

davidberenstein1957 commented 2 years ago

@marcossilva I recommend looking at the spaCy guide on embeddings or training. Alternatively, with Gensim you could use a FastText model to deal with OOV words in your specific dataset or, even better, just train a custom word2vec model on your data.