UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Precision Setting Ignored: Returning float64 Instead of float32 #2803

Open huabingli opened 1 month ago

huabingli commented 1 month ago

I am using the SentenceTransformer model with the precision parameter set to float32, but the returned precision is still float64. Here is the relevant part of my code:

import logging

import torch
from sentence_transformers import SentenceTransformer

logger = logging.getLogger(__name__)

class GetM3eModel:
    model: SentenceTransformer = None
    device: str = None

    @classmethod
    def get_model(cls) -> SentenceTransformer:
        if cls.model is None:
            cls.model = SentenceTransformer(
                settings.m3e.name_or_path,  # model name/path from the app's settings object
                device=cls.get_device(),
                model_kwargs={'torch_dtype': torch.float32}
            )
            logger.info(f"Model loaded: {cls.model}")
        return cls.model

    @classmethod
    def get_device(cls) -> str:
        if cls.device is None:
            cls.device = 'cuda' if torch.cuda.is_available() else 'cpu'
            logger.info(f'SentenceTransformer device: {cls.device}')
        return cls.device

encoded_output = GetM3eModel.get_model().encode(
    [article],
    device=GetM3eModel.get_device(),
    precision='float32'
)[0].tolist()

Expected Behavior: The encoded output should have a precision of float32.

Additional Information: It seems that despite setting torch_dtype to torch.float32 in model_kwargs, and specifying precision='float32' during encoding, the output still has float64 precision. Any insights or suggestions would be greatly appreciated.

Thank you!

huabingli commented 1 month ago

I've encountered a situation where I need to convert the encoded data from a model to a float32 array to reduce the size of the output vector. My current approach is as follows:

import numpy as np

data: np.ndarray = GetM3eModel.get_model().encode(
    [article], device=GetM3eModel.get_device(), precision='float32'
)
# convert each element to a plain Python float via its string representation
data1 = []
for i in data[0]:
    data1.append(float(str(i)))

While this method works, it relies on an unnecessary string conversion and is probably not the most efficient approach. Is there a more efficient way to keep the encoded data in float32 format without inflating the size of the serialized vector?

Thank you for your assistance!
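
For reference, a more direct version of the same conversion (a minimal sketch; it assumes the array is already float32, as encode() returns, and uses the same variable names as above):

import numpy as np

data: np.ndarray = GetM3eModel.get_model().encode(
    [article], device=GetM3eModel.get_device(), precision='float32'
)
# the array is already float32; tolist() yields plain Python floats in one vectorized step
data1 = data[0].astype(np.float32).tolist()

Note that this keeps the exact float32 values, whereas the string round-trip in the snippet above instead parses each value's shortened printed form.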

tomaarsen commented 1 month ago

Hello!

I'm struggling to reproduce this: for me, the encode method always returns float32 by default. For reference, precision="float32" also doesn't do anything special behind the scenes; it just keeps the precision from the model (which is normally float32).

I've used this script to try to reproduce it:

import logging
import torch
from sentence_transformers import SentenceTransformer

logger = logging.getLogger(__name__)

class GetM3eModel:
    model: SentenceTransformer = None
    device: str = None

    @classmethod
    def get_model(cls) -> SentenceTransformer:
        if cls.model is None:
            cls.model = SentenceTransformer(
                "moka-ai/m3e-base",
                device=cls.get_device(),
                model_kwargs={'torch_dtype': torch.float32}
            )
            logger.info(f"Model loaded from {cls.model}")
        return cls.model

    @classmethod
    def get_device(cls) -> str:
        if cls.device is None:
            cls.device = 'cuda' if torch.cuda.is_available() else 'cpu'
            logger.info(f'SentenceTransformer device: {cls.device}')
        return cls.device

article = "The quick brown fox jumps over the lazy dog."
encoded_output = GetM3eModel.get_model().encode(
    [article], 
    device=GetM3eModel.get_device(), 
    precision='float32'
)
print("dtype:", encoded_output.dtype)

which outputs

dtype: float32

In other words, the model itself produces float32 embeddings. If you use tolist(), everything gets converted from a numpy array to pure Python, which only has a notion of float, not float32. Python's float is a 64-bit double, i.e. equivalent to float64; perhaps that is what your problem is. In that case, I would keep the data as a numpy array (or a torch Tensor via convert_to_tensor=True in encode()), as those do have a float32 dtype.
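
To illustrate the point (a standalone sketch using a random array as a stand-in for a real embedding):

import numpy as np

emb = np.random.rand(4).astype(np.float32)          # stand-in for an encoded vector
as_list = emb.tolist()                              # plain Python floats (64-bit doubles)
print(type(as_list[0]))                             # <class 'float'>
print(np.asarray(as_list, dtype=np.float32).dtype)  # float32 — restored without loss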

huabingli commented 1 month ago

@tomaarsen "Yes, by using print("dtype:", encoded_output.dtype), it is indeed detected as float32. However, I need to return or store the converted vector via an API using FastAPI. I can only execute tolist, or do you have other ways to achieve this?"

tomaarsen commented 1 month ago

Indeed, it makes sense that you can only send "regular floats" over an API, but you can simply store them back in float32 vectors once you receive them. That should be possible without any loss of quality, so I don't believe this is a big issue.
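
A minimal sketch of that round-trip, assuming a FastAPI endpoint on the serving side (the endpoint path and names here are illustrative, not part of the original code):

import numpy as np
from fastapi import FastAPI

app = FastAPI()

@app.post("/encode")
def encode(article: str):
    vec = GetM3eModel.get_model().encode([article], precision="float32")[0]
    return {"embedding": vec.tolist()}  # JSON can only carry plain floats

# on the receiving side, restore float32 without any loss of quality:
# embedding = np.asarray(response["embedding"], dtype=np.float32)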

huabingli commented 1 month ago

Yes, this conversion is possible. However, if you store the data in Elasticsearch or another vector database, it will still be stored as float64, which can lead to excessive resource consumption and slower query times.

tomaarsen commented 3 weeks ago

I believe Sentence Transformers outputs the data as float32, so if you need to send it as float64 over an API, then perhaps you can re-convert the vectors to float32 once you've received them? I think the problem might be unrelated to Sentence Transformers, as it's really about storing float32 numpy vectors in, e.g., your vector database. Perhaps you can specify in your vector database that you want a specific precision?
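
For Elasticsearch specifically, the stored precision is determined by the index mapping rather than by the JSON payload: dense_vector fields store 32-bit floats by default (element_type: "float"). A hedged sketch with the official Python client (the index name is illustrative; dims=768 matches m3e-base):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="articles",
    mappings={
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,               # m3e-base produces 768-dim vectors
                "element_type": "float",   # 32-bit floats (the default)
                "index": True,
                "similarity": "cosine",
            }
        }
    },
)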