ScottishFold007 opened this issue 1 month ago
Hello!
Thanks for the suggestion. I think it really depends on how elaborate the training approach is. The original code wasn't released, so it's a bit hard to tell. The model also does a few other tricks during training (e.g. false negative filtering, clustering of the training data) that make it stronger than it would be otherwise. At the moment I'm thinking that implementing the full recipe might not make sense, although some of the components could be useful on their own, e.g. a batch sampler that clusters the training data using some SentenceTransformer model (perhaps a StaticEmbedding-based one like tomaarsen/static-bert-uncased-gooaq).
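For illustration, here's a minimal sketch of what such a clustering-based batch sampler could look like. The class name, its parameters, and the cluster-sizing heuristic are all hypothetical; only the tomaarsen/static-bert-uncased-gooaq model name comes from the comment above, and this is not the CDE authors' actual procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer
from torch.utils.data import Sampler


class ClusteredBatchSampler(Sampler):
    """Yields batches whose examples come from the same embedding cluster,
    so in-batch negatives are semantically closer (i.e. harder)."""

    def __init__(self, texts, batch_size, embedder_name="tomaarsen/static-bert-uncased-gooaq"):
        self.batch_size = batch_size
        # Embed all training texts once with a cheap static-embedding model
        embedder = SentenceTransformer(embedder_name)
        embeddings = embedder.encode(texts, batch_size=256)
        # Heuristic: size clusters so each one holds roughly a few batches
        n_clusters = max(1, len(texts) // (batch_size * 4))
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
        self.clusters = [np.where(labels == c)[0] for c in range(n_clusters)]

    def __iter__(self):
        batches = []
        for cluster in self.clusters:
            idx = np.random.permutation(cluster)
            # Drop the ragged tail so every batch is full and single-cluster
            for start in range(0, len(idx) - self.batch_size + 1, self.batch_size):
                batches.append(idx[start : start + self.batch_size].tolist())
        np.random.shuffle(batches)
        yield from batches

    def __len__(self):
        return sum(len(c) // self.batch_size for c in self.clusters)
```

Something like this could then be passed as `batch_sampler=` to a `torch.utils.data.DataLoader`, so that every batch is drawn from a single cluster.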
The CDE model is incredibly powerful, as it naturally integrates "context tokens" into the embedding process. As of October 1, 2024, cde-small-v1 stands as the top-performing small model (under 400M parameters) on the MTEB leaderboard for text embedding models, with an average score of 65.00. Have you considered implementing its training in sentence-transformers? I'm really looking forward to it!!!
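For context, inference with cde-small-v1 is already a two-stage process: context tokens are built from a corpus sample, then documents and queries are embedded conditioned on them. Below is a sketch of that flow; the `dataset_embeddings` keyword and the `"document"`/`"query"` prompt names come from the model's custom remote code as described on its model card, not from the core sentence-transformers API, so treat the exact signature as an assumption:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jxm/cde-small-v1", trust_remote_code=True)

corpus = [
    "CDE integrates context tokens from the corpus into each embedding.",
    "Static embedding models trade accuracy for very fast inference.",
]
queries = ["What makes CDE embeddings contextual?"]

# Stage 1: embed a corpus sample to build the "context tokens".
# Per the model card, this sample should match
# model[0].config.transductive_corpus_size; a tiny corpus is used
# here purely for illustration.
dataset_embeddings = model.encode(
    corpus, prompt_name="document", convert_to_tensor=True
)

# Stage 2: embed documents and queries conditioned on that context.
doc_embeddings = model.encode(
    corpus,
    prompt_name="document",
    dataset_embeddings=dataset_embeddings,
    convert_to_tensor=True,
)
query_embeddings = model.encode(
    queries,
    prompt_name="query",
    dataset_embeddings=dataset_embeddings,
    convert_to_tensor=True,
)

print(model.similarity(query_embeddings, doc_embeddings))
```

The training side would presumably need to reproduce this conditioning in the loss, which is part of what makes the implementation question non-trivial.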