SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

Difference in output when running via Transformers.js and when hosting on Hugging Face #46

Open jtmuller5 opened 4 months ago

jtmuller5 commented 4 months ago

I created an application that uses the UAE-large-V1 model inside Transformers.js and was able to embed sentences in a browser without issues. The model would return a single vector for a single input:

import { pipeline } from "@xenova/transformers";

// Load the quantized feature-extraction pipeline for UAE-Large-V1
const extractor = await pipeline("feature-extraction", "WhereIsAI/UAE-Large-V1", {
  quantized: true,
});

// Mean-pool the token embeddings and L2-normalize to get one sentence vector
let result = await extractor(text, { pooling: "mean", normalize: true });
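For reference, a quick way to inspect the shape, assuming the Tensor object that Transformers.js returns (with dims and data fields):

console.log(result.dims);               // e.g. [1, 1024]: one pooled, normalized vector
const vector = Array.from(result.data); // plain number[] of length 1024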

When I hosted the model on Hugging Face using their inference endpoint solution, it no longer works as expected: instead of returning a single vector, it returns a variable-length list of 1024-dimensional vectors.

Sample input:

{
   "inputs":  "Where are you"
}

This returns a list of lists of lists of numbers.
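That shape looks like token-level embeddings of the form [batch][tokens][1024]. As a stopgap I can mean-pool on the client; a minimal sketch, assuming the endpoint response is exactly that nested-array shape (the function name and shapes are my assumption, not part of the endpoint API):

// Mean-pool token embeddings into a single 1024-dim sentence vector.
// Assumes `response` is the raw endpoint output: number[][][] = [batch][tokens][dim].
function meanPool(response) {
  const tokens = response[0];             // first (and only) input in the batch
  const dim = tokens[0].length;           // 1024 for UAE-Large-V1
  const pooled = new Array(dim).fill(0);
  for (const token of tokens) {
    for (let i = 0; i < dim; i++) pooled[i] += token[i];
  }
  for (let i = 0; i < dim; i++) pooled[i] /= tokens.length;
  // L2-normalize to match Transformers.js `normalize: true`
  const norm = Math.sqrt(pooled.reduce((sum, x) => sum + x * x, 0));
  return pooled.map((x) => x / norm);
}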

Still, is there a way to make the hosted model return a single vector directly? And why does the model act differently depending on where it's hosted?

SeanLee97 commented 3 months ago

That's strange. It should return a single vector because you have specified mean pooling.

You could ask for help in the Transformers.js project, as I am unfamiliar with it. Sorry about that.