MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License

Question about the BERT-based approach #16

Closed. NicolasMontes closed this issue 3 years ago.

NicolasMontes commented 3 years ago

I have a question about the BERT-based approach. When you compute the similarity of "apple" and "app", each string must first be converted into a dense vector (an embedding). But to get a contextual embedding for "apple" (and likewise for "app"), the token normally has to appear inside a full sentence. How are these embeddings obtained when only a single word is provided (for example, the token "apple")?

MaartenGr commented 3 years ago

Sorry for the late response! Typically, I would actually not suggest using contextual embeddings for string-similarity tasks. This is, in part, due to exactly what you describe: the context is missing. Without context, it becomes more difficult for contextualized models such as BERT to map strings to each other.

Having said that, pre-trained BERT models do have knowledge of many words and can produce word embeddings without needing context. Context is merely preferred when using these types of embeddings.
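To make this concrete, here is a minimal sketch (not PolyFuzz's exact internals) of embedding a single word with BERT via Hugging Face `transformers`: the word is simply fed in as a one-token "sentence", so the model still produces a vector, just without real surrounding context. The model name and mean-pooling choice are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(word: str) -> torch.Tensor:
    # Tokenize the lone word; BERT may split it into sub-word pieces,
    # which we average into a single vector below.
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the last hidden state over all (sub)tokens.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

apple, app = embed("apple"), embed("app")
similarity = torch.cosine_similarity(apple, app, dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```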

In practice, I am a big fan of pre-trained FastText embeddings, and I would personally suggest using those if you are looking for a semantic mapping.
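For completeness, here is a hedged sketch of plugging FastText vectors into PolyFuzz through flair's `WordEmbeddings`, following the `Embeddings` pattern from the PolyFuzz documentation. The `"en-crawl"` identifier (flair's FastText vectors trained on Common Crawl) is an assumption; swap in another embedding name if it is unavailable.

```python
from flair.embeddings import WordEmbeddings
from polyfuzz import PolyFuzz
from polyfuzz.models import Embeddings

# Wrap flair's FastText embeddings in a PolyFuzz matcher.
fasttext = Embeddings(WordEmbeddings("en-crawl"), min_similarity=0, model_id="FastText")
model = PolyFuzz(fasttext)

from_list = ["apple", "apples", "appl"]
to_list = ["apple", "app"]
model.match(from_list, to_list)
print(model.get_matches())
```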