KeyBERT allows you to use sentence-transformer models, which should be sufficient in most use cases. Custom embedding techniques are a different story, however; they are currently not implemented due to differences in the APIs across embedding techniques.
The reason I was asking is that I've started working with the BERT multi-lingual model (https://github.com/google-research/bert/blob/master/multilingual.md) and have been wrestling with an appropriate/efficient way to handle fine-tuning & training. Google indicates that there are no special considerations, but that still leaves the issue of developing a process around training & fine-tuning for the model. Appreciate the feedback and your perspectives on these things.
Byron
What I was after was a BERT-based solution that does not have to be trained. In my understanding, KeyBERT doesn't need fine-tuning in most cases; it uses the embedding output of BERT to extract keywords. See the same approach in Sharma, P., & Li, Y. (2019). Self-Supervised Contextual Keyword and Keyphrase Retrieval with Self-Labelling.
At some point, I want to add the possibility of using Flair instead of sentence-transformers, since it allows you to more easily use your own custom embeddings. This might solve some issues, especially concerning custom embeddings.
You can now use both Flair and SentenceTransformers in KeyBERT v0.2. This should allow for many more options involving custom embeddings, whether they are 🤗 transformers, Flair, GloVe, or any other model.
What are your thoughts on training a selected model for use with KeyBERT?