Use config_sentence_transformers.json in the util functions

NimaBoscarino commented 2 years ago

SentenceTransformer models can have an associated config_sentence_transformers.json file (see this one for example) which contains information about the versions that were used when saving the model. It would be nice to extend the use of this file.

Since the ideal scoring function (dot vs. cos_sim) is known at training time, that could be written into the config_sentence_transformers.json file. Then, some of the util methods (semantic_search, paraphrase_mining, etc.) could read directly from that file to choose the appropriate scoring function.

nreimers commented 2 years ago

I was also thinking about this, but have not yet found a good way to infer this automatically from training. Which similarity functions are usable can depend on quite a few factors.

But it could be added by hand to the config and, if it exists, be used in downstream functions.

NimaBoscarino commented 2 years ago

Adding it by hand and reading it in downstream functions sounds like a great solution to me for now! I can open a PR for that.
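A minimal sketch of what "adding it by hand and reading it downstream" could look like. Everything here is an assumption for illustration: the key name `score_function`, the helper `load_score_function`, and the fallback to `cos_sim` are hypothetical and not part of the library, and the two scoring functions are plain-Python stand-ins for `util.dot_score` and `util.cos_sim` so the example runs without extra dependencies.

```python
import json
import math
import os
import tempfile

CONFIG_NAME = "config_sentence_transformers.json"
SCORE_KEY = "score_function"  # hypothetical key, added to the config by hand


def dot_score(a, b):
    """Dot product of two equal-length vectors (stand-in for util.dot_score)."""
    return sum(x * y for x, y in zip(a, b))


def cos_sim(a, b):
    """Cosine similarity of two vectors (stand-in for util.cos_sim)."""
    norm = math.sqrt(dot_score(a, a)) * math.sqrt(dot_score(b, b))
    return dot_score(a, b) / norm if norm else 0.0


SCORE_FUNCTIONS = {"dot": dot_score, "cos_sim": cos_sim}


def load_score_function(model_dir, default="cos_sim"):
    """Return the scoring function named in the model's config, if any.

    Falls back to `default` when the file or the key is missing.
    """
    path = os.path.join(model_dir, CONFIG_NAME)
    name = default
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            name = json.load(f).get(SCORE_KEY, default)
    if name not in SCORE_FUNCTIONS:
        raise ValueError(f"Unknown score function in config: {name!r}")
    return SCORE_FUNCTIONS[name]


# Usage: a model directory whose config was edited by hand to request "dot".
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, CONFIG_NAME), "w", encoding="utf-8") as f:
        json.dump({SCORE_KEY: "dot"}, f)
    score = load_score_function(model_dir)
    print(score([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

A downstream function such as semantic_search could then call a helper like this once and use the returned function in place of a hard-coded default, while still letting the caller override it explicitly.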

UKPLab / sentence-transformers #1643