Open Shaqwagon opened 2 years ago
I've tried various lengths, bitrates, and normalisation strategies on the input wav file, but I haven't been able to replicate any decent results yet. Quality degrades rapidly over time: the first second or so sounds OK, but the rest is consistently poor, and after the 10th second it's just noise. It's nothing like the demo video. I get the same results with the CLI demo and with my own rehacking of the code.
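For anyone comparing inputs, here is a minimal sketch for reporting the basic properties of a wav file (sample rate, channels, sample width, duration) so they can be quoted alongside results. It uses only Python's standard `wave` module, so it only handles uncompressed PCM wav files. The function name `wav_summary` and the example path are made up for illustration and are not part of the toolbox.

```python
import wave


def wav_summary(path):
    """Return basic properties of a PCM wav file (hypothetical helper, not part of the toolbox)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Example call (the path is a placeholder):
# print(wav_summary("my_reference_clip.wav"))
```

Posting this summary with a report makes it easier to tell whether two people are actually feeding in comparable audio.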
@Shaqwagon @rlayne Did you guys have any luck since then?
Hi, I was looking through issue #41 and noticed that @KeithYJohnson had described a problem I'm currently having (in https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/41#issuecomment-513310617), and as far as I can tell it hasn't been addressed yet. Basically, I'm taking an excerpt of Christopher Walken from Pulp Fiction, roughly 10-15 seconds in length, and the output sounds very dissimilar to him for the first few seconds, then turns into very heavy breathing/incoherent slurring for most of the rest.
I'm not well versed in any of the stuff here. I largely followed YouTube tutorials and used the very little I remembered from a C++ class I took two semesters ago to get the toolbox to actually run, so I don't doubt at all that I've done something wrong. I've tried researching it, but other than people's comments here I haven't found any clues as to what could be wrong, and because of my inexperience I'm not sure where to start. I'm on a Windows 11 laptop with Python 3.9.5 and CUDA 11.7 drivers (although I'm not actually sure whether the GPU is being used here or whether it's falling back to the CPU). I don't know what else is relevant, so if there's anything you'd like to know about what I've done so far, feel free to ask and I'll answer as best I can. Thanks a bunch, greatly appreciate it.
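On the GPU-versus-CPU question, a minimal sketch of one way to check, assuming the toolbox is running on PyTorch in the same Python environment. `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls; the helper name `describe_device` is made up for illustration and is not part of the toolbox.

```python
import importlib.util


def describe_device() -> str:
    """Say which device PyTorch could use (hypothetical helper, not part of the toolbox)."""
    # Assumption: the toolbox uses PyTorch; if it is missing, report that instead of crashing.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        # Device 0 is the first GPU visible to PyTorch.
        return f"CUDA GPU available: {torch.cuda.get_device_name(0)}"
    return "CPU only (torch.cuda.is_available() returned False)"


print(describe_device())
```

If this reports CPU only despite the CUDA drivers being installed, one possible cause is a CPU-only PyTorch build. Either way, running on the CPU would normally affect speed rather than output quality, so it may not explain the degraded audio.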