Open on1you opened 2 years ago
As long as the word length is the same, the encoding is the same
from sentence_transformers import SentenceTransformer, util
embedder = SentenceTransformer('msmarco-distilbert-base-v4')
corpus_embeddings = embedder.encode(['婚礼', '菜单', '招聘', '邀请'], convert_to_tensor=True)
encode('婚礼') = encode('菜单') = [-1.16728060e-01 1.81547254e-01 -1.05594993e-02 -4.06406701e-01 .....]
That model only works for English
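A likely explanation for the identical vectors (an assumption, not confirmed in this thread): BERT-style tokenizers split CJK text into single characters, and an English-only vocabulary maps each character it does not know to the same [UNK] token. Two Chinese words of equal length then become the same ID sequence, so the model sees identical input. The sketch below shows the effect with a toy vocabulary; `TOY_VOCAB`, `toy_tokenize`, and `toy_encode` are hypothetical names made up for illustration, not the real tokenizer of msmarco-distilbert-base-v4.

```python
# Toy illustration only: a tiny English-only vocabulary, NOT the real model's vocab.
TOY_VOCAB = {"[CLS]": 0, "[SEP]": 1, "[UNK]": 2, "wedding": 3, "menu": 4}


def is_cjk(ch):
    # Covers only the main CJK Unified Ideographs block, enough for this demo.
    return "\u4e00" <= ch <= "\u9fff"


def toy_tokenize(text):
    # Give every CJK character its own token; split everything else on whitespace.
    spaced = "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)
    return spaced.lower().split()


def toy_encode(text):
    # Any token missing from the vocabulary collapses to the single [UNK] id.
    unk = TOY_VOCAB["[UNK]"]
    ids = [TOY_VOCAB.get(tok, unk) for tok in toy_tokenize(text)]
    return [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]


for word in ["婚礼", "菜单", "招聘", "邀请", "wedding", "menu"]:
    print(word, toy_encode(word))
```

In this toy setup all four two-character Chinese words map to the same IDs, while the two English words stay distinct. Printing the tokens that the real model's tokenizer produces for each word should confirm or rule out this explanation.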
Thank you very much. Which models support Chinese? I only found five multilingual models.
MS MARCO is a large scale information retrieval corpus that was created based on real user search queries using Bing search engine
Do the real user queries not include any Chinese?