garyfeng / embedding_vector_search

prototyping global search for embedding vectors
Apache License 2.0

FEA: Add Milvus example #1

Open garyfeng opened 1 year ago

garyfeng commented 1 year ago

Follow the PaddleSpeech audio search example.

garyfeng commented 1 year ago

Note that the webclient version doesn't work. The original code (as of 2022/11/25) maps ports 8068:80 in the docker-compose file, whereas the config setting has App_RUL=8002. I changed the mapping to 8002:80, but it still does not work.
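To rule out a plain port mismatch, a check along these lines could compare the host side of the compose port mapping with the port in the configured URL. This is only a sketch: the helper names and the example URL `http://127.0.0.1:8002` are hypothetical, and it does not read the real docker-compose file or config.

```python
from urllib.parse import urlparse


def host_port(mapping: str) -> int:
    """Return the host-side port of a "HOST:CONTAINER" port mapping.

    Assumes the mapping contains at least one colon, e.g. "8068:80".
    """
    host, _container = mapping.rsplit(":", 1)
    # Drop an optional bind address such as "127.0.0.1:8002".
    return int(host.rsplit(":", 1)[-1])


def url_port(url: str) -> int:
    """Return the port of a URL, falling back to the scheme default."""
    parsed = urlparse(url)
    if parsed.port is not None:
        return parsed.port
    return 443 if parsed.scheme == "https" else 80


def ports_match(mapping: str, url: str) -> bool:
    """True if the host port of the mapping equals the port in the URL."""
    return host_port(mapping) == url_port(url)


# Hypothetical values mirroring the numbers mentioned above.
print(ports_match("8068:80", "http://127.0.0.1:8002"))
print(ports_match("8002:80", "http://127.0.0.1:8002"))
```

A match only rules out this one cause. The client can still fail for other reasons, which fits the fact that switching to 8002:80 did not fix it.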

garyfeng commented 1 year ago

Note that we need to add a PaddleSpeech Docker container to the project to support audio indexing. For voice specifically, the PaddleSpeech pre-trained model uses the ECAPA-TDNN method described in https://arxiv.org/pdf/2005.07143.pdf

garyfeng commented 1 year ago

Milvus supports scalar values in addition to vectors, so you can do hybrid searches (see https://milvus.io/docs/v2.2.x/create_collection.md). Makes me wonder why the PaddleSpeech example used a MySQL table to begin with.

Also, note that Milvus' storage layer is fully compatible with S3. It can be deployed using K8S. https://milvus.io/docs/v2.2.x/architecture_overview.md
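For reference, the hybrid search idea mentioned above (a scalar filter combined with vector similarity) can be sketched in plain Python. This is not the Milvus API, just an illustration of the concept; the row layout, the `speaker` field, and the function name are made up for the example. In Milvus the filter would be a boolean expression over scalar fields that live in the same collection as the vectors.

```python
import math


def hybrid_search(rows, query, predicate, limit=3):
    """Filter rows on scalar fields, then rank the survivors by L2 distance."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: math.dist(r["embedding"], query))
    return candidates[:limit]


# Toy data: each row carries scalar metadata next to its vector.
rows = [
    {"id": 1, "speaker": "a", "embedding": [0.0, 0.0]},
    {"id": 2, "speaker": "b", "embedding": [0.1, 0.0]},
    {"id": 3, "speaker": "a", "embedding": [1.0, 1.0]},
]

# Nearest vectors to the query, restricted to rows where speaker == "a".
hits = hybrid_search(rows, [0.1, 0.1], lambda r: r["speaker"] == "a", limit=2)
print([h["id"] for h in hits])
```

With the metadata stored next to the vectors like this, a separate id-to-metadata table is not needed for the lookup.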

garyfeng commented 1 year ago

I have downloaded NYTimes vector data in HDF5 format.

Milvus has a tool (https://milvus.io/docs/h2m.md) that migrates HDF5 data into Milvus wholesale.

garyfeng commented 1 year ago

Also, it looks like the PaddleSpeech client docker is copied from https://milvus.io/docs/v2.2.x/audio_similarity_search.md, or see code here https://github.com/milvus-io/bootcamp/tree/master/solutions/audio/audio_similarity_search/quick_deploy, where their audio tagging engine was https://github.com/qiuqiangkong/panns_inference. Paddle folks replaced the engine.

We should find the client Dockerfile there. Also look at how they set up the server Docker container, which would be our PaddleSpeech container.

garyfeng commented 1 year ago

See also https://github.com/milvus-io/bootcamp/tree/master/solutions/image/face_recognition_system/quick_deploy, where they have embeddings for Celeb to download at https://drive.google.com/file/d/1kWRApLKWveCHsdVH2TCNF2GPKRYw2ZdO/view. The code (https://github.com/milvus-io/bootcamp/blob/master/solutions/image/face_recognition_system/quick_deploy/server/src/search_face.py) is smart enough to skip the feature extraction if this file is present.
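The "skip extraction if the file is present" behavior is a simple cache check. A minimal sketch of the pattern follows, assuming a JSON cache file and a hypothetical `extract` callable; it is not the bootcamp's actual code, and `load_or_extract`, `fake_extract`, and `embeddings.json` are illustrative names.

```python
import json
import os
import tempfile


def load_or_extract(cache_path, extract):
    """Return cached embeddings if cache_path exists, else extract and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    embeddings = extract()
    with open(cache_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings


calls = []


def fake_extract():
    """Stand-in for an expensive feature-extraction step."""
    calls.append(1)
    return {"img_001": [0.1, 0.2, 0.3]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.json")
    first = load_or_extract(path, fake_extract)   # runs extraction, writes the cache
    second = load_or_extract(path, fake_extract)  # cache file exists, extraction skipped

print(len(calls))
```

The same approach would let us ship precomputed embeddings with our own demo so that indexing does not need the model at all.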