cansik / onnxruntime-silicon

ONNX Runtime prebuilt wheels for Apple Silicon (M1 / M2 / M3 / ARM64)

Question: how to use embeddings with MPS? #19

Open x4080 opened 3 months ago

x4080 commented 3 months ago

Here's the embedding code:

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoModel, AutoTokenizer
import numpy as np

# ONNX Runtime copy of the model (loaded here but not used below)
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', file_name="onnx/model.onnx")
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
# Plain PyTorch copy of the model, used for the actual inference
model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')
...
inputs = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)
# CLS-token embedding of each document
embeddings = model(**inputs)[0][:, 0].detach().numpy()

It works, but only on the CPU; when I tried adding to("mps"), it didn't work.
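For reference, a minimal sketch of the PyTorch-side to("mps") attempt (the device-availability check, the placeholder input, and moving the input tensors are assumptions, not part of the original snippet):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')

# Use MPS when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = model.to(device)

documents = ["example document"]  # hypothetical placeholder input
inputs = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)
# Input tensors must live on the same device as the model
inputs = {k: v.to(device) for k, v in inputs.items()}
# Move the result back to the CPU before calling .numpy()
embeddings = model(**inputs)[0][:, 0].detach().cpu().numpy()

Two common reasons a to("mps") attempt fails are moving only the model (leaving the input tensors on the CPU) and calling .numpy() on a tensor that still lives on the MPS device.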

How can I use MPS in this scenario?

Thanks

henryruhs commented 2 months ago

Use the official onnxruntime; this repo is outdated and could be archived.
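Note that ONNX Runtime has no "mps" execution provider; on Apple Silicon the accelerated path is the CoreML execution provider. A minimal sketch using the official onnxruntime package (the local model path, the placeholder input, and the CPU fallback are assumptions):

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')

# Prefer the CoreML execution provider, falling back to CPU for unsupported ops
session = ort.InferenceSession(
    "onnx/model.onnx",  # assumed local path to the exported model
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

documents = ["example document"]  # hypothetical placeholder input
inputs = tokenizer(documents, padding=True, truncation=True, return_tensors='np', max_length=512)
# Feed only the inputs the graph actually declares (some exports omit token_type_ids)
ort_inputs = {i.name: inputs[i.name] for i in session.get_inputs()}
outputs = session.run(None, ort_inputs)
embeddings = outputs[0][:, 0]  # CLS-token embedding, same as the PyTorch snippet

Whether this is actually faster depends on how much of the graph the CoreML provider can place; any ops it does not support fall back to the CPU provider.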

x4080 commented 2 months ago

@henryruhs I see, thanks