MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License

What about training? #19

Closed ghost closed 3 years ago

ghost commented 3 years ago

What are your thoughts on the training of a selected model to use with KeyBERT?

MaartenGr commented 3 years ago

KeyBERT allows you to use sentence-transformers models, which should be sufficient in most use cases. Custom embedding techniques are a different story, however, as they are currently not implemented due to differences in APIs across embedding techniques.

ghost commented 3 years ago

The reason I was asking is that I've started working with the multilingual BERT model (https://github.com/google-research/bert/blob/master/multilingual.md) and have been wrestling with an appropriate and efficient way to handle fine-tuning. Google indicates that there are no special considerations, but that still leaves the issue of developing a training and fine-tuning process around the model. I appreciate the feedback and your perspective on these things.

Byron

haofengsiji commented 3 years ago

I was looking for a BERT-based solution that does not have to be trained. In my understanding, KeyBERT doesn't need fine-tuning in most cases; it uses the embedding output of BERT to extract keywords. The same idea appears in Sharma, P., & Li, Y. (2019), Self-Supervised Contextual Keyword and Keyphrase Retrieval with Self-Labelling.
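To make the "no training needed" point concrete, the core ranking step is just cosine similarity between a document embedding and candidate keyword embeddings. The toy vectors below stand in for real BERT embeddings, which would come from a pre-trained model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for BERT embeddings (in practice, these come from the model).
doc_embedding = [0.9, 0.1, 0.3]
candidate_embeddings = {
    "keyword extraction": [0.8, 0.2, 0.25],
    "weather report":     [0.1, 0.9, 0.0],
    "embeddings":         [0.7, 0.1, 0.4],
}

# Rank candidates by similarity to the document embedding -- no training,
# only inference with a pre-trained encoder.
ranked = sorted(
    candidate_embeddings.items(),
    key=lambda kv: cosine(doc_embedding, kv[1]),
    reverse=True,
)
```

Candidates most similar to the document come first, so the off-topic "weather report" lands last.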

MaartenGr commented 3 years ago

At some point, I want to add the possibility of using Flair instead of sentence-transformers since it allows you to more easily use your own custom embeddings. This might solve some issues, especially concerning custom embeddings.

MaartenGr commented 3 years ago

You can now use both Flair and SentenceTransformers in KeyBERT v0.2. This allows for many more options involving custom embeddings, whether they are 🤗 Transformers, Flair, GloVe, or any other model.