wangyan828 opened this issue 4 months ago
1. I used the CN-CLIP ViT-B/16 model to vectorize several dog images and added the resulting vectors to a Chroma vector database. Later, when I searched with the special symbols !@#¥, the dog images were returned. Why does input that has nothing to do with the vectorized images still retrieve results?
```python
import chromadb
import numpy as np
import onnxruntime
import torch
from PIL import Image

import cn_clip.clip as clip
from cn_clip.clip.utils import image_transform, _MODEL_INFO
# import uuid  # needed if the commented-out collection.add() block is enabled

if __name__ == "__main__":
    # Image-side ONNX session
    img_sess_options = onnxruntime.SessionOptions()
    img_run_options = onnxruntime.RunOptions()
    img_run_options.log_severity_level = 2
    img_onnx_model_path = "/usr/share/kylin-datamanagement-models/cn-clip-onnx/vit-b-16.img.fp32.onnx"
    img_session = onnxruntime.InferenceSession(
        img_onnx_model_path,
        sess_options=img_sess_options,
        providers=["CPUExecutionProvider"],
    )

    model_arch = "ViT-B-16"
    preprocess = image_transform(_MODEL_INFO[model_arch]["input_resolution"])
    image_path = "/home/wangyan/wangyan/ziliao/test-search/test/dog.jpeg"
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    # print("get image shape of:", image.shape)

    # Compute image-side features with the ONNX model
    image_features = img_session.run(
        ["unnorm_image_features"], {"image": image.cpu().numpy()}
    )[0]  # unnormalized image features
    image_features = torch.tensor(image_features)
    # print(image_features.norm(dim=-1, keepdim=True))
    image_features /= image_features.norm(dim=-1, keepdim=True)  # normalized Chinese-CLIP image features for downstream use

    # Create the database and add the data
    embedded_as_lists = []
    for array in image_features:
        embedded_list = [float(elem) for elem in array.flatten()]
        embedded_as_lists.append(embedded_list)
    chroma_client = chromadb.PersistentClient(path="/home/wangyan/文档/database")
    collection = chroma_client.get_or_create_collection(
        name="usermanual", metadata={"hnsw:space": "cosine"}
    )
    # uuids = [str(uuid.uuid4()) for _ in embedded_as_lists]
    # data = collection.add(
    #     ids=uuids,
    #     embeddings=embedded_as_lists,
    # )

    # Load the text-side ONNX model (replace ${DATAPATH} with the actual path)
    txt_sess_options = onnxruntime.SessionOptions()
    txt_run_options = onnxruntime.RunOptions()
    txt_run_options.log_severity_level = 2
    txt_onnx_model_path = "/usr/share/kylin-datamanagement-models/cn-clip-onnx/vit-b-16.txt.fp32.onnx"
    txt_session = onnxruntime.InferenceSession(
        txt_onnx_model_path,
        sess_options=txt_sess_options,
        providers=["CPUExecutionProvider"],
    )

    # Tokenize the query text. The sequence length of 52 must match the
    # context-length used when converting the ONNX model.
    text = clip.tokenize(["!@#¥"], context_length=52)
    print("tokens:", text)
    text_features = []
    for i in range(len(text)):
        one_text = np.expand_dims(text[i].cpu().numpy(), axis=0)
        text_feature = txt_session.run(
            ["unnorm_text_features"], {"text": one_text}
        )[0]  # unnormalized text features
        # print(text_feature)
        text_feature = torch.tensor(text_feature)
        text_features.append(text_feature)
    text_features = torch.squeeze(torch.stack(text_features), dim=1)  # stack the per-text feature vectors
    text_features = text_features / text_features.norm(dim=1, keepdim=True)  # normalized Chinese-CLIP text features

    # Query the collection with the text embedding
    embedded_as = []
    for array in text_features:
        embedded_list = [float(elem) for elem in array.flatten()]
        embedded_as.append(embedded_list)
    data = collection.query(
        query_embeddings=embedded_as,
        n_results=10,
    )
    distances = data.get("distances")[0]
    # Chroma's cosine space reports distance = 1 - cosine similarity,
    # so convert back to similarity scores.
    results = [1 - d if isinstance(d, (int, float)) else None for d in distances]
    print(results)
```
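For what it's worth, part of this behavior is expected from the query itself: `collection.query` is a pure nearest-neighbor search and always returns the `n_results` closest stored vectors, however weak the match. "!@#¥" still tokenizes to *some* embedding, and the dog vectors are simply its nearest neighbors. A minimal sketch of filtering the hits by a similarity cutoff (the 0.25 value is purely illustrative and would need tuning on real data):

```python
SIM_THRESHOLD = 0.25  # illustrative cutoff, not a recommended value

data = collection.query(query_embeddings=embedded_as, n_results=10)
filtered = []
for doc_id, dist in zip(data["ids"][0], data["distances"][0]):
    sim = 1.0 - dist  # cosine space: distance = 1 - cosine similarity
    if sim >= SIM_THRESHOLD:
        filtered.append((doc_id, sim))
print(filtered)  # empty when no stored image clears the cutoff
```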
This is already a hard problem: set the similarity threshold too high and semantically relevant text fails to recall the images; set it too low and even simple, meaningless text recalls images.
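One common CLIP-side mitigation (a general technique, not something confirmed in this thread) is to make the decision relative rather than absolute: embed a handful of generic "null" prompts alongside the user query and only accept an image when the query beats the best null prompt by a margin. A rough sketch reusing the pipeline above; the prompt list and margin are invented for illustration:

```python
def text_embed(texts):
    """Embed a list of texts with the ONNX text session (same pipeline as above)."""
    toks = clip.tokenize(texts, context_length=52)
    feats = []
    for i in range(len(toks)):
        one = np.expand_dims(toks[i].cpu().numpy(), axis=0)
        feats.append(torch.tensor(
            txt_session.run(["unnorm_text_features"], {"text": one})[0]))
    feats = torch.squeeze(torch.stack(feats), dim=1)
    return feats / feats.norm(dim=1, keepdim=True)

# Hypothetical "null" prompts that should not match any particular image
null_prompts = ["一张图片", "随机字符", "无意义文本"]

query_vec = text_embed(["!@#¥"])      # (1, D)
null_vecs = text_embed(null_prompts)  # (N, D)

# image_features: (1, D) normalized image features from the script above
sim_query = (query_vec @ image_features.T).item()
sim_null = (null_vecs @ image_features.T).max().item()

MARGIN = 0.02  # illustrative margin, needs tuning
accept = sim_query > sim_null + MARGIN
print(sim_query, sim_null, accept)
```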
Also, using the fp16 model produces warnings.
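Without the exact log line it's hard to say what triggers this, but two common causes are (a) ONNX Runtime logging about fp16 ops on the CPU provider, and (b) a dtype mismatch between float32 inputs and a float16 graph. A hedged sketch of both workarounds, as assumptions rather than a verified fix for this model:

```python
# (a) Raise the session log severity so INFO/WARNING messages are suppressed
# (0=VERBOSE, 1=INFO, 2=WARNING, 3=ERROR, 4=FATAL).
sess_options = onnxruntime.SessionOptions()
sess_options.log_severity_level = 3

# (b) If the fp16 graph expects float16 inputs, cast before running.
# Check the model's declared input dtype first (assumption, not verified):
print(img_session.get_inputs()[0].type)  # e.g. 'tensor(float16)'
image_np = image.cpu().numpy()
if img_session.get_inputs()[0].type == "tensor(float16)":
    image_np = image_np.astype(np.float16)
image_features = img_session.run(["unnorm_image_features"], {"image": image_np})[0]
```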