RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License

Accelerating Faiss retrieval using FastScan in Faiss #27

Open nadare881 opened 1 year ago

nadare881 commented 1 year ago

Thank you for the amazing software. I am particularly interested in its application of vector search. I am still in the process of setting up, but I plan to try running it soon.

While reading the source code, I noticed a point of concern in the faiss part, so I opened this issue.

Currently, IVF512 is used in retrieval. While I think this is simple and effective as a baseline on the GPU, I believe there are better index factory options when running on the CPU. https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/6c7c1d933ffe2217edc74afadff7eec0078d6d16/infer/train-index.py#L19

This can be done using the FastScan method, by simply changing the index factory from "IVF512,Flat" to "IVF512,PQ128x4fsr,RFlat" (512 is the original IVF parameter; PQ128 means 128 sub-quantizers, half of the 256 dimensions).

Since I haven't been able to run RVC yet, I'm not sure whether this parameter is effective here, but in most cases it works well on both CPU and GPU. Once I have run it and confirmed the effect, I will report back in this issue.

fumiama commented 1 year ago

I would like to see the acceleration if this modification doesn't change the behavior of index-training.

RVC-Boss commented 1 year ago

Is there a benchmark to compare the two index_factory strings? Speed and recall.

nadare881 commented 1 year ago

@liujing04

Is there a benchmark to compare the two index_factory strings? Speed and recall.

This is the official benchmark: https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors#4-bit-pq-comparison-with-scann

and this is a well-known benchmark: http://ann-benchmarks.com/

While ScaNN outperforms other methods in cost performance for approximate nearest neighbor search in embeddings, FastScan demonstrates equivalent or even better performance compared to ScaNN.

nadare881 commented 1 year ago

Part of the code above was wrong, very sorry. I experimented with three types of indexes via the GUI.

I measured the time to convert 2 minutes of audio. The big_npy shape used in the experiment is (535514, 256).

  1. Default

     added_IVF13731_Flat_nprobe_17.index was created and it takes 2.2 sec to convert.

  2. Brute force (for comparison)

     index = faiss.index_factory(256, "Flat")
     index.train(big_npy)
     faiss.write_index(index, "trained_Flat_src_feat.index")
     index.add(big_npy)
     faiss.write_index(index, "added_Flat_src_feat.index")

     added_Flat_src_feat.index was created and it takes 5.4 sec to convert.

  3. FastScan

     index = faiss.index_factory(256, "IVF512,PQ128x4fs,RFlat")
     index.train(big_npy)
     faiss.write_index(index, "trained_IVF512_fastscan_src_feat.index")
     index.add(big_npy)
     faiss.write_index(index, "added_IVF512_fastscan_src_feat.index")

     added_IVF512_fastscan_src_feat.index was created and it takes 2.0 sec to convert.

Although the improvement is slight, FastScan was faster than the default setting.

fumiama commented 1 year ago

The improvement seems great. Maybe you can open a draft PR to show your modification and I will test it later.

nadare881 commented 1 year ago

I created a draft PR. Acceleration using FastScan is effective only on CPUs that support the required SIMD register instructions (such as AVX2), so it may not be effective in environments such as Colab. I will continue to investigate and experiment with optimal parameters and approximate nearest-neighbor search methods for each environment.

nadare881 commented 1 year ago

@RVC-Boss @fumiama Now that the code has been cleaned up, I think it's time to improve search with faiss.

I would like to create a PR that commits in the following order. Please merge up to the necessary functions.

  1. Extract train_index into a function and share it
  2. Change default index factory to parameters recommended by faiss
  3. Add a parameter that replaces top-1 retrieval with a weighted average of the top-k retrievals.
  4. Enable index compression by kmeans.

RVC-Boss commented 1 year ago

@RVC-Boss @fumiama Now that the code has been cleaned up, I think it's time to improve search with faiss.

I would like to create a PR that commits in the following order. Please merge up to the necessary functions.

  1. Extract train_index into a function and share it
  2. Change default index factory to parameters recommended by faiss
  3. Add a parameter that replaces top-1 retrieval with a weighted average of the top-k retrievals.
  4. Enable index compression by kmeans.

@nadare881

Add a parameter that replaces top-1 retrieval with a weighted average of the top-k retrievals:

----Does this option always achieve better results? Perhaps this option should be frozen (always enabled) and not exposed to users?

Change default index factory to parameters recommended by faiss:

----Waiting for your latest report and recommended index string~

nadare881 commented 1 year ago

@RVC-Boss

Does this option always achieve better results? Perhaps this option should be frozen (always enabled) and not exposed to users?

The top-k option smooths the audio and gives better results in some cases.

Waiting for your latest report and recommended index string~

I have created a doc on parameter tuning for faiss. Please refer here.

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/b05487b6cacc17578541f8b5096bd2424ff3b0dd/docs/faiss_tips_en.md

nadare881 commented 1 year ago

Part of the faiss update has been merged. When I tried real-time conversion, the change of index did not give much speed improvement in my environment, so I will lower the priority of this issue and raise the priority of the documentation update.