CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Anyone willing to pick this up? #332

Closed: nmcbride closed this issue 4 years ago

nmcbride commented 4 years ago

It's always sad when a really cool open source project gets abandoned to go commercial. Is there anyone else who is willing to pick this up and keep it going?

castdrian commented 4 years ago

I believe this can be closed, as @pusalieth seems to be working on their fork.

Dont-Copy-That-Floppy commented 4 years ago

I'm going to be throwing some pretty heavy time into this, so up to you guys. Hopefully in the end it'll all get merged upstream.

castdrian commented 4 years ago

Idc about the main repo; you're the one putting time into it, so your fork is the one I'll be using.

(Sent by email in reply to pusalieth's comment of Thu., Apr. 30, 2020, 02:17.)

jardayn commented 4 years ago

@pusalieth gonna leave this here: https://github.com/pusalieth/Real-Time-Voice-Cloning/commit/b333e733f516a7e13bccf7d1623ab35439fc9aa5

MP3s work just fine on Linux without that change.

Dont-Copy-That-Floppy commented 4 years ago

@jardayn Which commands are you using to run the program? If you can get versions, that would be icing on the cake.

jardayn commented 4 years ago

@pusalieth python demo_cli.py

Versions (tell me if I missed anything):

- Ubuntu 18.04
- Python 3.6
- Latest NVIDIA CUDA
- NVIDIA 440 drivers
- All the versions from requirements.txt (it's missing torch)
- Torch: the latest version
- IntelliJ installed

nmcbride commented 4 years ago

@pusalieth I don't have a full understanding of all this stuff yet, but I'm willing to help where I can. If you take this on, we can just work out of your fork. There are already good, maintained solutions for video; we need something in the audio world that is also maintained.

Dont-Copy-That-Floppy commented 4 years ago

@jardayn You might be using a version where I already fixed the path issue. demo_cli.py is the script I used/tested with.

jardayn commented 4 years ago

@pusalieth I was using the original version with none of your changes present.

Dont-Copy-That-Floppy commented 4 years ago

@jardayn What command did you run? What OS are you using? And what is the path of the file you input?

jardayn commented 4 years ago

@pusalieth python demo_cli.py

Versions (tell me if I missed anything):

- Ubuntu 18.04
- Python 3.6
- Latest NVIDIA CUDA
- NVIDIA 440 drivers
- All the versions from requirements.txt (it's missing torch)
- Torch: the latest version
- IntelliJ installed (latest, I guess)

The MP3s were in the root directory.

CorentinJ commented 4 years ago

I'm sure this makes no difference to you, but I want to make a note that this project was my thesis and nothing more. Making it open-source was one of the goals, but beyond a working prototype there were no real plans of maintaining it long-term.

While I cannot share all the differences and improvements between our implementation at Resemble.AI and this one, I can definitely shed light on what is worth rewriting for this project:

Dont-Copy-That-Floppy commented 4 years ago

@jardayn I don't know for sure, but I'd guess it's because the file was in your root dir. Could be many other things, though. Either way, it works now for sure.

I can tell you I'd only want to install PyTorch from conda. There are so many dependencies, and conda has already done the work.

Dont-Copy-That-Floppy commented 4 years ago

@CorentinJ Thanks for your input. People may not have noticed in your README that you didn't plan on maintaining it past Sep 2019. I've only been into ML for about 2 weeks now. Eventually I'll probably go for a Master's in it, who knows. I have so little experience in it, though, that your advice is way out of my scope. I'm just coding to use the foundation you built and make it work to its maximal capacity.

Just out of curiosity, it looks like you worked on it for a year or so past your thesis. Did you get hired, or co-found?

jardayn commented 4 years ago

@pusalieth also why Conda? Lots of people are using normal python venvs.

@CorentinJ Thanks for the info. Yeah, there are loads of ways of improving this.

CorentinJ commented 4 years ago

I finished my thesis around June last year and got offers as soon as I made the repo public. I started working immediately after that. I did expect to maintain the project a little more than that, but it was without accounting for the fact that I would work in a very similar vein, and thus having to keep the advancements for myself.

castdrian commented 4 years ago

Well, it's kinda sad to see this go behind a paywall. I myself am trying to use this for my open source Discord bot, so it's really sad to see it paywalled behind resemble.ai. A free open source API never hurt anyone.

(Sent by email in reply to Corentin Jemine's comment of Sun., May 3, 2020, 20:35.)

jardayn commented 4 years ago

@adrifcastr Got a link to the bot?

Also, open source APIs hurt people's income. Not surprised that the best stuff is commercial.

castdrian commented 4 years ago

Here you go. And well, I'm also not making money off my API; I just provide it.

jardayn commented 4 years ago

But what Jemine is working on has commercial applications, so... yeah.

I mean, if you want Voice Gen, there's Mozilla TTS

castdrian commented 4 years ago

Well, TTS and voice cloning aren't exactly the same. I don't need a TTS service; I need to replicate a voice, that's all.

(Sent by email in reply to jardayn's comment of Mon., May 4, 2020, 00:35.)

Dont-Copy-That-Floppy commented 4 years ago

@jardayn

> @adrifcastr Got a link to the bot?
>
> Also, open source APIs hurt people's income. Not surprised that the best stuff is commercial.

I don't want to get into the tit-for-tat, which is where I think this thread is heading. If people want to release open source, it's their choice. Or they can monetize it, and if it's valuable enough to people who can't do it themselves, they'll buy it, and that's literally the definition of commerce. Doesn't matter either way to me, but I do prefer open source, and here's why.

Linux, Apache, SQL, and PHP were/are the backbone of the internet, and all four are open source. All major corporations' servers run open source. SSL is open source. Google runs 99% open source, including their products like Android, YouTube, etc. Facebook runs almost completely on open source. Literally the richest software companies in the world run on open source. So wealth generation and source type are not inextricably linked. There are a few exceptions, but the majority of wealthy corporations run the majority of their software on open source.

Open source benefits the maximal number of people for the least amount of money. That's why I would choose open source over closed 80% of the time. The only trouble is that certain pieces of software are extremely hard to monetize, so they use walled gardens instead. When it comes to AI, I'm in full support of OpenAI's objectives. This is the prime time to make everything open source and sell the models, or usage. That's just my opinion.

CorentinJ commented 4 years ago

Lads, if you want to ask Resemble.AI to make the project open source, you go ahead and do it. If you expect a student who just finished university to work full-time on his own and for free on an open source project, you've probably never put a foot in the real world. The closest to what you're asking is Mozilla's repo.

> well TTS and voice cloning aren't exactly the same, I don't need a TTS service, I need to replicate a voice, that's all

Yeah, they are the same, and you'll get all the features in this repo from Mozilla's repo. Last I checked, erogol had a lot of features from different papers implemented, including SV2TTS.

jardayn commented 4 years ago

@pusalieth I should have clarified that my open source comments were about AI.

Essentially what @CorentinJ said:

> If you expect a student who just finished university to work full-time on his own and for free on an open source project, you've probably never put a foot in the real world.

erogol commented 4 years ago

> Lads, if you want to ask Resemble.AI to make the project open source, you go ahead and do it. If you expect a student who just finished university to work full-time on his own and for free on an open source project, you've probably never put a foot in the real world. The closest to what you're asking is Mozilla's repo.
>
> > well TTS and voice cloning aren't exactly the same, I don't need a TTS service, I need to replicate a voice, that's all
>
> Yeah, they are the same, and you'll get all the features in this repo from Mozilla's repo. Last I checked, erogol had a lot of features from different papers implemented, including SV2TTS. In fact he's even copied some code from my repo.

Thanks for referring to Mozilla TTS. However, I should emphasize that I did not "copy" anything from here. That said, I'd be happy to cite your code, since you implemented it first.

CorentinJ commented 4 years ago

Yeah I should have worded it better. Sorry about that, I will correct it.

ghost commented 4 years ago

I believe this can be closed as @CorentinJ has outlined a vision for continued development, and is allowing the community to provide contributions to the repo. See #364