RetroCirce / Zero_Shot_Audio_Source_Separation

The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022
https://arxiv.org/abs/2112.07891
MIT License

I'm having trouble inferring a 15 second song. #6

Closed playdasegunda closed 2 years ago

playdasegunda commented 2 years ago

I was able to get it running with your settings, but I now get the following error when I run inference on a 15-second song. How can I fix it? My GPU is an RTX 2060 (6GB). Thanks once again.

[Image: Screenshot_1]

playdasegunda commented 2 years ago

I have another question:

  1. Will more examples in the query folder make the final result more accurate?
  2. How long should each query.wav be? How many seconds would be ideal (an average duration)?

Thank you!

RetroCirce commented 2 years ago

Hi,

If you use an RTX 2060 (6GB), the GPU memory is a little too small to handle the inference. You can try changing the variable "mini_batch" at line 666 and line 763 of model/asp_model.py to a smaller number. This will probably make inference workable on a 6GB GPU.
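To illustrate why a smaller "mini_batch" helps, here is a minimal sketch of chunked inference. It is not the code in model/asp_model.py: `run_in_mini_batches` and `model_fn` are hypothetical names, and the assumption is simply that the segments are processed a few at a time, so peak memory grows with the chunk size rather than with the song length.

```python
from typing import Callable, List, Sequence


def run_in_mini_batches(
    segments: Sequence[float],
    model_fn: Callable[[Sequence[float]], List[float]],
    mini_batch: int,
) -> List[float]:
    """Apply model_fn to `segments` in chunks of at most `mini_batch` items.

    Hypothetical helper: `model_fn` stands in for a forward pass that takes
    a batch and returns one output per input item.
    """
    if mini_batch < 1:
        raise ValueError("mini_batch must be at least 1")
    outputs: List[float] = []
    for start in range(0, len(segments), mini_batch):
        batch = segments[start:start + mini_batch]
        # Only this chunk is handed to the model at once, so a smaller
        # mini_batch means fewer items held in memory per step.
        outputs.extend(model_fn(batch))
    return outputs


if __name__ == "__main__":
    batch_sizes = []

    def fake_model(batch):
        # Stand-in for the real model: records the batch size, doubles inputs.
        batch_sizes.append(len(batch))
        return [x * 2 for x in batch]

    result = run_in_mini_batches(list(range(10)), fake_model, mini_batch=4)
    print(result)
    print(batch_sizes)
```

The result is the same for any chunk size; only the number of items processed per step changes, which is the trade-off between speed and memory.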

Will more examples in the query folder make the final result more accurate?


Not exactly. There are two scenarios: (1) If you can find exactly or almost the same instrument/timbre as the one you want to separate, one or two queries are enough. For example, say you want to separate the guitar from a 1-min song, and keep in mind that different guitars have slightly different timbres. If there is a 2-sec guitar solo somewhere in this 1-min song, I think using that solo as the query is enough to achieve a good result, because it has exactly or almost the same instrument/timbre as your separation target.

(2) If you cannot find the exact timbre for your separation, the more queries you provide, the better the results will be. But there is still a limit: around 100 query examples is perhaps the best choice, because by that point the results have already reached the limit.

How long should each query.wav be? How many seconds would be ideal (an average duration)?


Any duration is fine. The code will split the queries into 2-sec clips and combine them together.
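As a rough sketch of what "split into 2-sec clips and combine" could look like (this is not the repo's actual code): `split_into_clips` and `average_vectors` are hypothetical helpers. Zero-padding the last partial clip and combining by averaging per-clip vectors are both assumptions made for illustration, and so is the sample rate in the usage example.

```python
from typing import List, Sequence


def split_into_clips(
    samples: Sequence[float], sample_rate: int, clip_seconds: float = 2.0
) -> List[List[float]]:
    """Cut a mono waveform into fixed-length clips of `clip_seconds` each."""
    clip_len = int(round(sample_rate * clip_seconds))
    if clip_len < 1:
        raise ValueError("clip length must be at least one sample")
    clips: List[List[float]] = []
    for start in range(0, len(samples), clip_len):
        clip = list(samples[start:start + clip_len])
        # Assumption: a short final clip is zero-padded to full length.
        clip.extend([0.0] * (clip_len - len(clip)))
        clips.append(clip)
    return clips


def average_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors (one assumed way to combine)."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


if __name__ == "__main__":
    sample_rate = 32000  # assumed value, for illustration only
    query = [0.1] * (5 * sample_rate)  # a 5-second query
    clips = split_into_clips(query, sample_rate)
    print(len(clips), len(clips[0]))  # number of clips, samples per clip
```

With this scheme a query of any length just yields more or fewer 2-sec clips, which is why the duration of each query.wav does not need to be fixed.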