NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks
MIT License

Support for loading any pre-trained model inside the 'Model Annotator' #66

Closed Akshay0799 closed 2 years ago

Akshay0799 commented 2 years ago

I was wondering whether it is possible to load any pretrained model (for example a Hugging Face transformers model, or a model from another library) inside the Model Annotator function. If so, could someone explain how to use them?

plison commented 2 years ago

We haven't (yet) made a custom labelling function for running fine-tuned HuggingFace models, no, but it should be relatively straightforward to implement, I think. The only complication is that the tokens from the HuggingFace tokenizer will typically not correspond to the tokens in the spaCy doc, so you would need to construct a mapping between the two. But that's just a few lines of code.

Note, however, that spaCy already includes a few pretrained models with NER (see for instance en_core_web_trf), in case you are interested in extracting named entities.
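
The token mapping mentioned in this comment can be sketched with character offsets, which both tokenizations share. This is a minimal illustration and not skweak code: the helper name `char_span_to_token_span` is hypothetical, and the token offsets are written out by hand. It assumes the entity predictions are already available as character spans over the original text, and that the target tokens are given as sorted, non-overlapping `(start_char, end_char)` pairs.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple


def char_span_to_token_span(
    token_offsets: List[Tuple[int, int]], start_char: int, end_char: int
) -> Optional[Tuple[int, int]]:
    """Map a character span onto the run of tokens that overlap it.

    token_offsets holds (start_char, end_char) for each target token, in
    document order. Returns (first_token, last_token_exclusive), or None
    if the span does not overlap any token (e.g. it covers only whitespace).
    """
    starts = [s for s, _ in token_offsets]
    ends = [e for _, e in token_offsets]
    # Index of the first token that ends after the span starts.
    first = bisect_right(ends, start_char)
    # One past the last token that starts before the span ends.
    last = bisect_left(starts, end_char)
    if first >= last:
        return None
    return first, last


text = "Pierre Vinken lives in New York."
# Hand-written offsets for illustration. With a spaCy Doc they could be
# computed as [(t.idx, t.idx + len(t)) for t in doc].
token_offsets = [(0, 6), (7, 13), (14, 19), (20, 22), (23, 26), (27, 31), (31, 32)]

# Assumed model output: "New York" predicted at characters 23..31.
print(char_span_to_token_span(token_offsets, 23, 31))  # (4, 6)
```

A span that only partially covers a token is expanded to include the whole token. spaCy offers comparable behaviour out of the box through `Doc.char_span(start, end, label=..., alignment_mode="expand")`, which returns a `Span` directly and may be simpler than maintaining a custom helper.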

Akshay0799 commented 2 years ago

Thank you. I've been trying to map the Hugging Face tokens to the spaCy tokens so that I can load fine-tuned models and predict entities directly with spaCy, but I'm not really sure how to do it. Could you help me with that?
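
One possible first step for the mapping asked about here is to turn the model's per-subword labels into character spans over the original text. The sketch below is an illustration under assumptions, not an answer from the skweak maintainers: the function name `bio_to_char_spans` is hypothetical, the labels are assumed to follow the BIO scheme, and the offsets are assumed to come from a Hugging Face fast tokenizer called with `return_offsets_mapping=True`. The example offsets and labels are written by hand.

```python
from typing import List, Tuple


def bio_to_char_spans(
    offsets: List[Tuple[int, int]], labels: List[str]
) -> List[Tuple[int, int, str]]:
    """Merge per-subword BIO labels into (start_char, end_char, label) spans."""
    spans = []
    current = None  # [start_char, end_char, label] of the entity being built
    for (start, end), tag in zip(offsets, labels):
        if start == end:
            # Special tokens such as [CLS] and [SEP] have empty offsets.
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[2] == label:
            current[1] = end  # extend the entity over this subword
            continue
        if current is not None:
            spans.append(tuple(current))
            current = None
        if prefix in ("B", "I"):
            # A stray "I-" tag with no open entity starts a new one.
            current = [start, end, label]
    if current is not None:
        spans.append(tuple(current))
    return spans


# Hand-written example for "Pierre Vinken lives in New York."
offsets = [(0, 0), (0, 6), (7, 10), (10, 13), (14, 19), (20, 22),
           (23, 26), (27, 31), (31, 32), (0, 0)]
labels = ["O", "B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O", "O"]

print(bio_to_char_spans(offsets, labels))
# [(0, 13, 'PER'), (23, 31, 'LOC')]
```

Because the resulting offsets refer to the raw text, they no longer depend on either tokenizer and can be aligned with the spaCy tokens afterwards.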