Open huanggefan opened 9 months ago
This is because there is no unambiguous way of converting unicode to char * in the C++ code.
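To make the ambiguity concrete, here is a rough sketch (not taken from the Faiss source). The path is hypothetical, and GBK is only assumed as an example of a Windows ANSI code page. The same Python string becomes different byte sequences depending on the encoding, so a `char *` alone does not say which file name was meant.

```python
# Hypothetical path containing Chinese characters.
path = "C:/数据/faiss.idx"

# The same text encoded two ways gives two different byte strings.
utf8_bytes = path.encode("utf-8")
gbk_bytes = path.encode("gbk")  # assumption: GBK stands in for the system's ANSI code page

print(utf8_bytes)
print(gbk_bytes)

# If UTF-8 bytes are interpreted as GBK, the result is a different file name.
misread = utf8_bytes.decode("gbk", errors="replace")
print(misread == path)
```

If one side produces UTF-8 bytes and the other interprets them in a different code page, the file name that is looked up no longer matches the one that was intended.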
Oh, how can this be solved? Does anyone have an idea?
I also encountered the same issue. While it's not a fundamental solution, I resolved it by saving the index file to a temporary path and then copying the file. Below is the code example.
import os
import shutil
import tempfile
import faiss
import numpy as np
from pathlib import Path
from uuid import uuid4


def get_temp_dir():
    # windows
    if os.name == "nt":
        return "/Temp"
    # linux, macos
    return "/tmp"


features = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
]
d = len(features[0])  # dimension
index = faiss.IndexFlatL2(d)
for ft in features:
    parsed = np.array([ft], dtype=np.float32)
    index.add(parsed)

dest_path = "/path/to/save/faiss.idx"
temp_dir = get_temp_dir()
if not Path(temp_dir).is_dir():
    Path(temp_dir).mkdir()

with tempfile.TemporaryDirectory(dir=temp_dir) as p:
    # write to a temporary path without Unicode characters first
    temp_file_path = Path(p) / str(uuid4())
    faiss.write_index(index, str(temp_file_path))
    # then move the file to the real destination
    shutil.move(str(temp_file_path), dest_path)
Since the OS user name can be part of the default temp directory path, I specified a separate temp_dir. If the user name contains Unicode characters, the same problem can occur there. If the user name is guaranteed not to contain Unicode characters, the dir argument can be omitted from tempfile.TemporaryDirectory.
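Whether a separate temp directory is needed can also be checked at runtime instead of guessed. The following is a minimal sketch; `choose_temp_root` is a hypothetical helper name, and the fallback directory is whatever ASCII-only directory you pick (for example the one returned by get_temp_dir above).

```python
import tempfile


def choose_temp_root(fallback):
    """Return None (use the default temp dir) if its path is ASCII-only,
    otherwise return the caller-supplied fallback directory."""
    default = tempfile.gettempdir()
    return None if default.isascii() else fallback


# The result can be passed straight to tempfile.TemporaryDirectory(dir=...),
# because dir=None means "use the default temp directory".
temp_root = choose_temp_root("/Temp")
print(temp_root)
```

Note that `str.isascii` requires Python 3.7 or later.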
Okay, that's a workaround for write_index, but what do you do for read_index? If I understand this issue correctly, the problem also occurs with read_index, so you should hit it again when you read from the index file (now moved to the correct path).
I ran into the same problem. My use case requires paths that contain Chinese characters. Has anyone solved this?
Okay, that's a workaround for write_index, but what do you do for read_index? If I understand this issue correctly, the problem also occurs with read_index, so you should hit it again when you read from the index file (now moved to the correct path).
Reading or writing an index is the same. Copy the index to be read to a temporary path with a filename that does not contain Unicode characters, then read the file using faiss.read_index.
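For reference, the read side could look roughly like the sketch below. `read_via_ascii_copy` is a hypothetical helper name, not part of Faiss. It takes the reader as a parameter, so `faiss.read_index` would be passed in by the caller. It assumes the temp directory path itself contains no Unicode characters (pass `temp_root` otherwise) and that the reader loads the whole file before returning.

```python
import shutil
import tempfile
from pathlib import Path
from uuid import uuid4


def read_via_ascii_copy(src_path, reader, temp_root=None):
    """Copy src_path to a temporary file with an ASCII-only name,
    call reader on the copy, and return the reader's result.

    reader: any callable that takes a path string, e.g. faiss.read_index.
    temp_root: optional ASCII-only directory to create the temp dir in.
    """
    with tempfile.TemporaryDirectory(dir=temp_root) as tmp:
        temp_file = Path(tmp) / f"{uuid4().hex}.idx"
        # Python's own file APIs handle the Unicode source path.
        shutil.copyfile(src_path, temp_file)
        # The reader only ever sees the temporary path.
        return reader(str(temp_file))
```

Usage would be something like `index = read_via_ascii_copy("/path/to/save/faiss.idx", faiss.read_index)`. The copy costs extra disk I/O for large indexes, so this remains a workaround, not a fix.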
Summary
If the path contains Unicode characters, read_index and write_index do not work.
Platform
OS: Windows 11
Python: Python 3.11.4
Faiss version: 1.7.4
Installed from: pip install faiss-cpu
Running on:
Interface:
Reproduction instructions
Here is the code:
When performing faiss.read_index and faiss.write_index operations, if the path contains Unicode characters, you may encounter the following error: