CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Tensorflow v2 compatibility #370

Closed ghost closed 4 years ago

ghost commented 4 years ago

In #364, @CorentinJ wrote:

> First things first, the biggest issue for me with this project is the hecking tensorflow code. Tensorflow sucks, and it sucks just as much to install it let alone install an older version.
>
> I believe it would lower the entry barrier for new users if the version of that package were to be upgraded. I've seen a PR for that but that's only for the Colab version it seems. A PR for the entire repo would be appreciated.
>
> Ideally, we'd replace all of the synthesizer code with pytorch code (there are several open source pytorch synthesizers out there), but that's a lot of work.
>
> If anybody is willing to pick up on either of these things, let me know.

The Colab PR (#338) is an update to tensorflow 1.15.2, I believe. With #366 the rest of the repo also advances to 1.15.

One of the best things about this repo is that it works well out of the box. After set up, one can clone a voice with pretrained models, or replicate the original training procedure by following some simple instructions on the wiki. This needs to be the case no matter what solution is pursued here.

So far I have found these options which seem promising:

  1. Mozilla TTS, which has a tacotron-based synthesizer that supports tensorflow v2. Multi-speaker is not yet officially supported, but the community has been able to get it to work.
  2. NVIDIA has a pytorch-based tacotron2 implementation.

I think the easier approach is to try to support tensorflow v2 by using Mozilla TTS. It also has a larger userbase and better community support. If we switch, will the existing pretrained model continue to work?

ghost commented 4 years ago

I am attempting to convert the existing synthesizer code to tensorflow v2, making some progress but could use some help with this error message. demo_cli.py gets as far as starting the synthesizer test.

Affected code is in custom_decoder.py and helpers.py in synthesizer/models. I have a branch here: https://github.com/blue-fish/Real-Time-Voice-Cloning/commits/370_tf2_compat

[Edit: This issue is resolved. See the history for the original error message]

ghost commented 4 years ago

I figured out a solution to the above issue, to use tf.TensorShape().with_rank() to increase the rank as needed. Now working through a different set of errors.

Edit: Although it makes that one error go away, I do not know if it is the correct fix so I have not committed it.

dathudeptrai commented 4 years ago

@blue-fish Hi, can I contribute my TensorFlow 2 Tacotron2 implementation to this repo? Our framework also plans to support TFLite for TTS models, both the real-time vocoder and the Text2Mel model. Can you take a look at our framework?

GitHub: https://github.com/TensorSpeech/TensorflowTTS
Sample audio: https://tensorspeech.github.io/TensorflowTTS/
Colab demo: https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing

We also support FastSpeech2, whose quality is comparable to Tacotron2 and which is much faster. Furthermore, we are working on support for other languages such as Chinese and Japanese.

ghost commented 4 years ago

@dathudeptrai Yes! That's much better than me trying to convert the existing code to Tensorflow2.

dathudeptrai commented 4 years ago

@blue-fish The most important thing is that we need to reuse the pretrained model here, so converting the weights so that they load into my Tacotron2 implementation is the right way :)). My implementation is 90% the same as the Tacotron2 implementation here; we just need to modify some layers and parameters to replicate the model, and then we can easily load the pretrained weights here for inference.
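
Converting weights between two near-identical implementations usually comes down to renaming variables and checking that the shapes agree. Here is a minimal plain-Python sketch of that idea. The variable names, shapes, and rename rule are all hypothetical and are not taken from either repository; a real conversion would read them from the actual checkpoints.

```python
# Hypothetical sketch: map checkpoint variable names onto a target model's names.
# Every name, shape, and rename rule below is made up for illustration.

def remap_weights(source, target_shapes, rename_rules):
    """Return weights keyed by target names, plus source names left unmatched.

    source: dict of source variable name -> (shape tuple, values)
    target_shapes: dict of target variable name -> expected shape tuple
    rename_rules: list of (old_substring, new_substring), applied in order
    """
    remapped, unmatched = {}, []
    for name, (shape, values) in source.items():
        new_name = name
        for old, new in rename_rules:
            new_name = new_name.replace(old, new)
        # Accept a weight only if the target has that name with the same shape.
        if target_shapes.get(new_name) == shape:
            remapped[new_name] = values
        else:
            unmatched.append(name)
    return remapped, unmatched


source = {
    "old_model/encoder/kernel": ((512, 256), "encoder weights"),
    "old_model/decoder/kernel": ((256, 80), "decoder weights"),
    "old_model/extra/bias": ((80,), "extra bias"),
}
target_shapes = {
    "new_model/encoder/kernel": (512, 256),
    "new_model/decoder/kernel": (256, 80),
}
rules = [("old_model/", "new_model/")]

remapped, unmatched = remap_weights(source, target_shapes, rules)
print(sorted(remapped))
print(unmatched)
```

Anything that ends up in `unmatched` points at a layer or parameter that differs between the two models and would need to be handled by hand.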

ghost commented 4 years ago

@dathudeptrai I am new to TTS and don't have the expertise to make the changes you are describing. Would you be willing to point out exactly what needs to be changed? Or submit a pull request (working or not) to get us started?

DRob81 commented 4 years ago

> I figured out a solution to the above issue, to use tf.TensorShape().with_rank() to increase the rank as needed. Now working through a different set of errors.
>
> Edit: Although it makes that one error go away, I do not know if it is the correct fix so I have not committed it.

Can you tell me how you fixed the 'Shape must be rank 1 but is rank 0' error? I guess it is 'batch_size' at line 98 of tacotron.py.

ghost commented 4 years ago

@DRob81 Although I solved it once before, it is eluding me this time. I thought I added with_rank() somewhere in synthesizer/models/helpers.py or custom_decoder.py, but it is not working. Thank you for your guess, I inspected batch_size in the debugger for tensorflow v1 and v2 and could not find any difference. I still think it is somewhere in the custom decoder, based on the error message.

DRob81 commented 4 years ago

@blue-fish I also debugged and ended up at batch_size. I'm glad to have found that batch_size = tf.TensorShape(0).with_rank_at_least(1)[0] is the solution here. There are still more problems to fix.

ghost commented 4 years ago

Thanks for sharing that, @DRob81. If I make that change then the next error message is TypeError: can only concatenate list (not "int") to list at line 218 of synthesizer/models/tacotron.py. It's hard to tell if it's getting further than before, since that is the same line where it errored out before. It is very hard to debug errors with the custom decoder.
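
For reference, that TypeError is what plain Python raises when a list is added to a bare integer. The sketch below only reproduces the message and shows the usual fix of wrapping the integer in a list. The function and values are hypothetical; this is not the code at line 218 of tacotron.py, and the real cause there may be different.

```python
# Plain-Python reproduction of the error message. The names and numbers are
# hypothetical and this is not the code from tacotron.py.

def build_shape(leading_dims, last_dim):
    """Append one integer dimension to a list of dimensions."""
    # Writing `leading_dims + last_dim` would raise the TypeError, because
    # one operand is a list and the other is an int.
    # Wrapping the integer in a list makes both operands lists.
    return leading_dims + [last_dim]


try:
    [32, 100] + 80  # list + int, the failing pattern
except TypeError as err:
    message = str(err)

print(message)
print(build_shape([32, 100], 80))
```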

DRob81 commented 4 years ago

@blue-fish I fixed that already. Can you tell me which line exactly? I think my line numbers differ from yours now.

ghost commented 4 years ago

I'm unable to get past this part, before or after the last fix: https://github.com/blue-fish/Real-Time-Voice-Cloning/blob/621f62f150f5d0995ce61930479bec9e9043aebe/synthesizer/models/tacotron.py#L212-L218

It might be easier if you fork my repo using these instructions (https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/401#issuecomment-653929209). That would make it easier to share code updates and discuss issues like these.

DRob81 commented 4 years ago

@blue-fish I forked your repo and I will commit my changes.

HumanG33k commented 4 years ago

Hi, keep in mind:

I think it would be good, for the record, to keep one branch on TensorFlow 1.x and move to a new master based on TensorFlow 2. I just checked and there is an "automatic code updater" provided by TensorFlow. I just ran it and can provide the report. In short, here is what I did:

print(tf.__version__)

This prints 2.2.0 in my case (Debian testing).
Use the following script in the parent directory of the project directory:
```bash
#!/bin/bash
tf_upgrade_v2 \
  --intree Real-Time-Voice-Cloning/ \
  --outtree Real-Time-Voice-Cloning_v2/ \
  --reportfile report.txt
```

You will get a bunch of output and a report.txt.

I have not looked at the output or the contents of report.txt.

But to check, go into the v2 directory, edit requirements.txt and set tensorflow==2.2.0.
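
The requirements edit can also be scripted instead of done by hand. This is a small sketch under stated assumptions: `pin_requirement` is a hypothetical helper, and the example text is invented rather than copied from the repository's requirements.txt.

```python
import re


def pin_requirement(requirements_text, package, version):
    """Return the requirements text with `package` pinned to exactly `version`."""
    pinned = f"{package}=={version}"
    # Match a line that starts with the package name; the lookahead keeps
    # "tensorflow" from matching names such as "tensorflow-estimator".
    pattern = re.compile(
        r"^" + re.escape(package) + r"(?![\w.-]).*$",
        flags=re.MULTILINE | re.IGNORECASE,
    )
    new_text, count = pattern.subn(lambda _match: pinned, requirements_text)
    if count == 0:
        # The package was not listed, so append the pin on its own line.
        new_text = requirements_text.rstrip("\n") + "\n" + pinned + "\n"
    return new_text


# Hypothetical file contents, not the repository's actual requirements.txt.
example = "numpy>=1.14\ntensorflow==1.15\ntensorflow-estimator==1.15\n"
updated = pin_requirement(example, "tensorflow", "2.2.0")
print(updated)
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back.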

Then retry the pip install to be sure:

pip install -r requirements.txt

All requirements look satisfied. I don't have time to check more. Feel free to ping me, and I will try to continue this if I have time this weekend.

Oh, and I just followed the TensorFlow documentation, at least in part: https://www.tensorflow.org/guide/migrate and https://www.tensorflow.org/guide/upgrade

ghost commented 4 years ago

@HumanG33k I have already performed the automatic conversion using that process on the 370_tf2_compat branch of my fork. There are still a bunch of errors that need to be worked through. I have published fixes for some of these, and am stuck on some others, where @DRob81 is also helping. The current errors are runtime errors, so we may be getting close.

If you can run demo_cli.py without errors, please commit those changes to your fork and we can continue developing from there. If not, let's concentrate the effort on my tensorflow2 fork. I am accepting pull requests.

ghost commented 4 years ago

@DRob81 I would like to continue the tensorflow2 effort, can you please commit your changes or submit a pull request to my fork?

javaintheuk commented 4 years ago

Hello @blue-fish, how can I add another voice to your Colab (I'd like to upload or link a 5-10 second wav sample)? Thanks!

https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD

ghost commented 4 years ago

We are not going to pursue tensorflow v2 now that the torch-based synthesizer is working (#472). Thanks to all who contributed their time here.