Open garyfeng opened 1 year ago
Note that the webclient version doesn't work. The original code (as of 2022/11/25) exposed `8068:80` in the docker-compose file, whereas the config setting has `App_RUL=8002`. I changed the mapping to `8002:80`, but it still does not work on `8002`. Maybe I should add `8002:8002`?
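For reference, a short-form docker-compose `ports` entry reads `HOST:CONTAINER`, so the right-hand number has to match whatever port the app listens on inside the container. A minimal sketch of that check, assuming (not confirmed from the project) that the app listens on 8002 inside the container; `parse_port_mapping` and `mapping_reaches_app` are hypothetical helpers, not part of docker-compose:

```python
from typing import Tuple


def parse_port_mapping(mapping: str) -> Tuple[int, int]:
    """Split a short-form 'HOST:CONTAINER' ports entry into two ints."""
    host, container = mapping.split(":")
    return int(host), int(container)


def mapping_reaches_app(mapping: str, app_listen_port: int) -> bool:
    """True if the container side of the mapping is the port the app listens on."""
    _, container = parse_port_mapping(mapping)
    return container == app_listen_port


# Assumption: the web app inside the container listens on 8002.
for m in ("8068:80", "8002:80", "8002:8002"):
    print(m, mapping_reaches_app(m, app_listen_port=8002))
```

If the app actually listens on 80 inside the container, the conclusion flips and `8002:80` would be the right mapping, so it is worth confirming the listen port first.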
Also remember to use `127.0.0.1:8002` from the host's browser. If you use `localhost`, it is treated as cross-origin.

Note that we need to add the `PaddleSpeech` docker to the project to support audio indexing. For voice specifically, the PaddleSpeech pre-trained model uses the method (ECAPA-TDNN) in https://arxiv.org/pdf/2005.07143.pdf
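On the `localhost` versus `127.0.0.1` point above: browsers compare origins as (scheme, host, port) without resolving host names, so the two spellings count as different origins even though they reach the same machine. A small sketch of that comparison; the URLs and the `/api` path are made up for illustration:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """Compare two URLs the way the same-origin policy does: by triple, not by IP."""
    return origin(a) == origin(b)


print(same_origin("http://127.0.0.1:8002/", "http://127.0.0.1:8002/api"))  # True
print(same_origin("http://localhost:8002/", "http://127.0.0.1:8002/api"))  # False
```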
Milvus supports scalar values in addition to vectors. You can do hybrid searches, see https://milvus.io/docs/v2.2.x/create_collection.md. Makes me wonder why the PaddleSpeech example used a MySQL table to begin with.
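To make "hybrid search" concrete: it means filtering on scalar fields and ranking the survivors by vector distance. The sketch below is plain Python, not the Milvus API; the `speaker` field, the toy rows, and `hybrid_search` are all made up for illustration. In pymilvus the filter would instead be passed as a boolean expression string through the `expr` argument of `Collection.search`.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(rows, query, predicate, limit=3):
    """Keep rows whose scalar fields pass `predicate`, then rank by distance."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: l2(r["embedding"], query))
    return candidates[:limit]


# Toy data: the scalar metadata lives next to the vector, so no separate table is needed.
rows = [
    {"id": 1, "speaker": "a", "embedding": [0.0, 0.0]},
    {"id": 2, "speaker": "b", "embedding": [0.1, 0.0]},
    {"id": 3, "speaker": "a", "embedding": [1.0, 1.0]},
]
hits = hybrid_search(rows, [0.1, 0.1], lambda r: r["speaker"] == "a", limit=2)
print([h["id"] for h in hits])  # [1, 3]
```

Row 2 is the nearest vector overall but is excluded by the scalar filter, which is the behaviour a side table in MySQL would otherwise have to reproduce by joining on ids.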
Also, note that Milvus' storage layer is fully compatible with S3. It can be deployed using K8S. https://milvus.io/docs/v2.2.x/architecture_overview.md
I have downloaded NYTimes vector data in HDF5 format.
Milvus has a tool (https://milvus.io/docs/h2m.md) that migrates HDF5 data into Milvus wholesale.
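If the migration tool turns out not to fit, the file can also be read in batches by hand. A rough sketch using `h5py`, which is not the migration tool itself; the dataset name `"train"` is an assumption based on the ann-benchmarks file layout and may differ in the downloaded file, and both function names are hypothetical:

```python
def iter_batches(dataset, batch_size):
    """Yield consecutive slices of any sized, sliceable object."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


def read_hdf5_in_batches(path, dataset_name="train", batch_size=10_000):
    """Yield batches of vectors from an HDF5 file without loading it whole."""
    import h5py  # third-party; imported lazily so iter_batches has no dependency

    with h5py.File(path, "r") as f:
        # Assumption: vectors are stored as one 2-D dataset under `dataset_name`.
        for batch in iter_batches(f[dataset_name], batch_size):
            yield batch  # slicing an h5py dataset returns a NumPy array
```

Each yielded batch could then be handed to whatever insert call the Milvus client provides.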
Also, it looks like the PaddleSpeech client docker is copied from https://milvus.io/docs/v2.2.x/audio_similarity_search.md, or see code here https://github.com/milvus-io/bootcamp/tree/master/solutions/audio/audio_similarity_search/quick_deploy, where their audio tagging engine was https://github.com/qiuqiangkong/panns_inference. Paddle folks replaced the engine.
We found the `client` docker file. Also look at how they set up the `server` docker, which would be our `PaddleSpeech` docker.
See also https://github.com/milvus-io/bootcamp/tree/master/solutions/image/face_recognition_system/quick_deploy, where they have embeddings for Celeb to download at https://drive.google.com/file/d/1kWRApLKWveCHsdVH2TCNF2GPKRYw2ZdO/view. The code (https://github.com/milvus-io/bootcamp/blob/master/solutions/image/face_recognition_system/quick_deploy/server/src/search_face.py) is smart enough to skip the feature extraction if this file is present.
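The "skip extraction if the file is present" behaviour is a plain load-or-compute cache. A generic sketch of the pattern, not the actual logic in `search_face.py`; `load_or_compute` and the use of `pickle` are assumptions for illustration:

```python
import os
import pickle


def load_or_compute(cache_path, compute):
    """Return cached embeddings if the file exists, else compute and save them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()  # the expensive feature-extraction step
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

The same idea would let us ship precomputed audio embeddings alongside the PaddleSpeech docker so a first run does not need to re-extract them.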
Follow the PaddleSpeech audio search example.