Testing Issues, Any Assistance Would Be Appreciated

JarodMica / rvc-tts-pipeline

TTS pipeline that uses RVC to enhance audio quality and cloning

MIT License

Testing Issues, Any Assistance Would Be Appreciated #4

Closed locals2-j closed 1 year ago

locals2-j commented 1 year ago

I haven't tried Tortoise with the RVC models that give me trouble, so I don't know whether the issue is universal, but XTTS with certain models outputs a deep-voiced result. Pitch shifting with XTTS on this pipeline (male -> female and vice versa) is also wonky and most of the time does not work as expected. I tried matching sample rates by converting the 24 kHz XTTS audio to 16 kHz (same result with librosa), and that doesn't fix it, so I don't think it's a sample-rate issue. I was talking with some people in the XTTS Discord, and they said it could be the way rvc_convert synthesizes audio, because XTTS on its own gives the expected result just fine; the bug only appears once that result is fed into the pipeline.

EDIT: For reference, with the RVC models on AI Hub, the Donald Trump 600/300 epochs model works fine but the Joe Rogan 300 epochs model does not. That's just one example, but there are others.
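
For anyone who wants to rule out a sample-rate mismatch as described above, here is a minimal sketch for confirming what rate a WAV file actually has before it goes into the pipeline. It uses only Python's standard-library wave module and is not part of rvc-tts-pipeline. The names wav_info and write_silence and the demo file name are hypothetical, and the demo file is generated silence standing in for a real XTTS output. Note that the wave module reads PCM WAV files, so a float-encoded WAV may need a different reader.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def write_silence(path, rate, seconds=1.0):
    """Write a mono 16-bit silent WAV; a stand-in for a real TTS output file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))


# Demo on a generated file (hypothetical name), written at 24 kHz like XTTS output.
demo_path = os.path.join(tempfile.mkdtemp(), "xtts_output_demo.wav")
write_silence(demo_path, rate=24000)
rate, channels, duration = wav_info(demo_path)
print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
```

Running wav_info on the file before and after any conversion step shows whether the resampling actually took effect.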

JarodMica commented 1 year ago

Someone else ran into something similar. After some digging, it looks like this issue occurs when trying to resample audio to 48k, so I changed that setting back to its default value in https://github.com/JarodMica/rvc-tts-pipeline/commit/93881f6341f9e47daa1af5284215a8ad9a10896c.

This was a bug that was supposed to have been fixed in RVC, so resampling shouldn't cause issues. Either way, try pulling the repo and installing it again, or run pip uninstall rvc_tts_pipe and then run pip install -e git+https://github.com/JarodMica/rvc-tts-pipeline.git#egg=rvc_tts_pipe again.

I wasn't able to replicate the original issue myself, but someone else was able to, so try this out and see if it works.

locals2-j commented 1 year ago

Yep, seems to be fixed after that.