locals2-j closed this 1 year ago
Someone else ran into something similar, and after some digging I found that this issue occurs when trying to resample audio to 48k. I've set that back to its default value in https://github.com/JarodMica/rvc-tts-pipeline/commit/93881f6341f9e47daa1af5284215a8ad9a10896c.
This was a bug that was supposed to be fixed in RVC, where resampling shouldn't cause issues. Try pulling the repo and installing it again, or run
pip uninstall rvc_tts_pipe
and then run
pip install -e git+https://github.com/JarodMica/rvc-tts-pipeline.git#egg=rvc_tts_pipe
again.
I wasn't able to replicate the original issue myself, but someone else was, so try this out and see if it works.
Yep, seems to be fixed after that.
I have not tried Tortoise with the same RVC models that give me trouble, so I don't know if the issue is universal, but using XTTS with certain models produces a deep-voiced result. Pitch shifting with XTTS on this pipeline (male->female and vice versa) is also wonky and most of the time does not work as expected. I tried matching sample rates by converting the 24 kHz XTTS audio to 16 kHz (same result with librosa) and that doesn't fix it, so I don't think it's a sample rate issue. I was talking with some people in the XTTS Discord and they said it could be the way rvc_convert synthesizes audio, because XTTS on its own gives the expected result just fine; it's only when you feed that result into the pipeline that the bug happens.
EDIT: For reference, among the RVC models on AI Hub, the Donald Trump 600/300 epochs model works fine but the Joe Rogan 300 epochs model does not. That's just one example, but there are others.
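For anyone who wants to rule out the sample rate the same way, here is a minimal sketch of the check described above, using only the Python standard library. The helper names `wav_info` and `resample_linear` are made up for illustration and are not part of rvc_tts_pipe or RVC. The resampler is a naive linear interpolation, a rough stand-in for a proper resampler such as librosa's, so treat it as a diagnostic aid rather than something to use for final audio.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file.

    Hypothetical helper: useful for confirming what rate the TTS output
    really has before it is handed to the conversion step.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return rate, channels, frames / rate


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of floats by linear interpolation.

    Hypothetical helper: no anti-aliasing filter is applied, so quality is
    lower than a band-limited resampler. It only illustrates the 24 kHz ->
    16 kHz conversion mentioned above.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)                          # sample at or before pos
        hi = min(lo + 1, len(samples) - 1)     # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

If the reported rate already matches what the model expects and the output is still too deep, that would be consistent with the observation above that the sample rate is not the cause.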