[Open] LordBeardsteak opened this issue 3 years ago
The posted solution is to use the vocoder from AutoVC. The two projects share the same vocoder, so it is not included in this repo.
It is not clear which files must be extracted from AutoVC to speechsplit-master/assets. The README, your comment above, and your solution to #1 all mention using 'the vocoder from AutoVC', but many files in AutoVC fit that description.
For instance, AutoVC contains the files:
vocoder.ipynb, model_vc, and model_bl
And the AutoVC README links three 'models':
checkpoint_step001000000_ema.pth, autovc.ckpt, and 3000000-BL.ckpt
I have tried running demo.ipynb with different combinations of these files moved to speechsplit-master/assets. I have even tried moving the entirety of the AutoVC contents into speechsplit-master/assets. I ran vocoder.ipynb in the AutoVC-master folder and then tried running the demo with vocoder-checkpoint.ipynb placed in different folders (speechsplit-master, speechsplit-master/assets, speechsplit-master/.ipynb_checkpoints). All have presented me with the same AttributeError.
Could you please be clearer about what files must be placed where, or at least post a screenshot of the speechsplit-master and speechsplit-master/assets directories with the required files placed in the appropriate folders?
First of all, you need to install the appropriate version of r9y9's WaveNet vocoder, which is a large and delicate repo in its own right. We did not include it in our repo in order to keep ours simple and clear.
In our project, the vocoder is the component that converts spectrograms to audio. Not many files in AutoVC fit that description.
demo.ipynb has two cells. The first cell does not require the WaveNet vocoder; only the second cell does. The second cell of demo.ipynb is the same as vocoder.ipynb in AutoVC, so it needs the following from AutoVC:
- It imports from synthesis, so you need synthesis.py from AutoVC.
- synthesis requires its own hparams, so you need the corresponding hparams from AutoVC. Do not use the hparams for SpeechSplit.
- It loads checkpoint_step001000000_ema.pth, so you need checkpoint_step001000000_ema.pth.
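Putting the requirements above together, a short script can sanity-check the setup before running the demo. This is only a sketch: the file names come from this thread, the autovc_hparams.py rename is the workaround discussed below, and the exact layout is an assumption about speechsplit-master.

```python
import os

# Files the demo reportedly needs, per this thread (assumed layout,
# relative to speechsplit-master):
REQUIRED = [
    "assets/640000-P.ckpt",                     # SpeechSplit pre-trained model
    "assets/660000-G.ckpt",                     # SpeechSplit pre-trained model
    "assets/checkpoint_step001000000_ema.pth",  # WaveNet vocoder checkpoint
    "synthesis.py",                             # copied from AutoVC
    "autovc_hparams.py",                        # AutoVC's hparams.py, renamed
]

def missing_files(root="."):
    """Return the required files that are not present under root."""
    return [p for p in REQUIRED if not os.path.isfile(os.path.join(root, p))]

if __name__ == "__main__":
    gone = missing_files()
    if gone:
        print("Missing:", *gone, sep="\n  ")
    else:
        print("All required files found.")
```

Run it from speechsplit-master; an empty "Missing" list means every file named in this thread is where the demo expects it.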
Thank you for the more detailed explanation, however I followed your instructions with a clean install and have received new errors including:
RuntimeError: ‘Could not infer dtype of NoneType’
and
ImportError: cannot import name ‘builder’ from ‘wavenet_vocoder’
I am sorry to be pedantic about the phrasing of these steps, but instructions necessary to run the demo seem to be missing from the README and scattered among solved issues in this repo. I would like to help make the README clearer by adding the missing steps and clarifying the ones that are difficult to interpret. Again, a screenshot of the directories would be extremely helpful and would provide context for your steps. I am happy to move this to a separate issue if necessary.
“Download pre-trained models to assets”:
- Download and move the pre-trained models '640000-P.ckpt' and '660000-G.ckpt' to ‘assets’
“Download the same WaveNet vocoder model as in AutoVC to assets”: I have already run ‘pip install wavenet_vocoder==0.1.1’, so I assume this means:
- Download and move the ‘checkpoint_step001000000_ema.pth’ to ‘assets’ (because this file is described as ‘WaveNet Vocoder pre-trained model’ in AutoVC).
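The ImportError above ("cannot import name ‘builder’ from ‘wavenet_vocoder’") usually means the installed wavenet_vocoder does not ship a builder submodule. A quick, generic way to check which version you actually have importable (a sketch; has_submodule is a hypothetical helper, not part of either repo):

```python
import importlib.util

def has_submodule(name):
    """Return True if the dotted module path is importable,
    e.g. has_submodule("wavenet_vocoder.builder")."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return False
```

If has_submodule("wavenet_vocoder.builder") returns False, the installed package likely does not match what synthesis.py expects, and reinstalling the pinned version above (or r9y9's repo directly) may be needed.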
In issue #1, you said
Please run the vocoder part in the vocoder folder, and place the vocoder checkpoint in an accessible folder.
Does this mean downloading AutoVC, running 'conversion.ipynb' and 'vocoder.ipynb', and then moving the file ‘vocoder-checkpoint’ into speechsplit-master/assets? Or does it mean moving ‘checkpoint_step001000000_ema.pth’? Both files include the word ‘checkpoint’ and both relate directly to something described as a ‘vocoder’, which is why this is confusing.
Those are the only two steps in the README before the step that says “Run demo.ipynb”. I can tell I have misunderstood step 2, but the README has no steps mentioning ‘synthesis.py’ or ‘hparams’ either. So, to clarify, the other steps are:
- Copy ‘synthesis.py’ from AutoVC into 'speechsplit-master'
- Replace ‘hparams.py’ in speechsplit-master with ‘hparams.py’ from the AutoVC repo.
Thank you for taking the time to answer. I understand that it is difficult to troubleshoot this software for inexperienced users.
Change
from hparams import hparams
to
from autovc_hparams import hparams
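An alternative to editing that import inside synthesis.py is to register the renamed file under the module name "hparams" before importing synthesis, so the original `from hparams import hparams` resolves to AutoVC's settings. A sketch, assuming the renamed autovc_hparams.py sits next to the notebook; note that anything imported afterwards that asks for "hparams" will also get AutoVC's version, so this should only be done in the vocoder cell or a separate process:

```python
import importlib.util
import sys

def alias_autovc_hparams(path="autovc_hparams.py"):
    """Load AutoVC's (renamed) hparams file under the module name
    'hparams', so a later `from hparams import hparams` inside
    synthesis.py picks up AutoVC's settings unmodified."""
    spec = importlib.util.spec_from_file_location("hparams", path)
    module = importlib.util.module_from_spec(spec)
    sys.modules["hparams"] = module  # register before executing the module
    spec.loader.exec_module(module)
    return module
```

Call alias_autovc_hparams() before `import synthesis`; editing the import line directly, as described above, is the simpler fix if you don't mind modifying the copied file.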
I followed both of the steps mentioned by @leijue222, but I am still receiving the same error. Please help. Thanks in advance.
When running demo.ipynb, I am presented with this error:
This has been mentioned before in #1, but no solutions have been posted. This repo has very few instructions, and the ones that exist are vague and lack detail. A more comprehensive installation tutorial would be helpful.