auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

What is the format of the metadata? #17

Open 1015720437 opened 4 years ago

1015720437 commented 4 years ago

What is the format of the metadata? I want to try other audio, so I checked the data inside. The first element is the name and the third is the mel-spectrogram, but I don't know what the second one is.

And does this work for Chinese audio, or do I need to retrain the model with Chinese data? Thanks!

auspicious3000 commented 4 years ago

For Chinese audio, you need to retrain the model and retune the hyperparameters.

mhosein4 commented 4 years ago

What is the difference between the train and test metadata? I created metadata from Persian wav files, but its format is not like yours. I can train the network, but I can't test it. The third section of my metadata is the path of the .npy files created by make_spect.py. Please help me, sorry, I'm confused. Thanks a lot.

auspicious3000 commented 4 years ago

The metadata differs depending on the use case. It is nothing but a nested list. You can easily make your own by looking into one of these metadata files.
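
For anyone trying the same inspection, here is a minimal sketch of how to look inside one of these pickled metadata files; the file name `metadata.pkl` and the pickle format are assumptions based on this repo's conventions, not a script from the author:

```python
import pickle
import numpy as np

# Load the nested list (e.g. the metadata.pkl shipped with the repo).
with open('metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)

# One entry per speaker; print the type and shape of every field.
for entry in metadata:
    for i, field in enumerate(entry):
        desc = field if isinstance(field, str) else np.shape(field)
        print(i, type(field).__name__, desc)
```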

mhosein4 commented 4 years ago

Thank you for your explanation. I can't understand what the third section is or how to generate it. What is the array highlighted in the picture?

[image: screenshot of the metadata contents, with one array highlighted]

Thanks for the support.

auspicious3000 commented 4 years ago

Can you print the shape of it?

mhosein4 commented 4 years ago

Yes, I can. But do you mean I should send it to you?

shape.txt

auspicious3000 commented 4 years ago

Just let me know the shape.

mhosein4 commented 4 years ago

The shape of your metadata is (4, 3), but mine is (2,).

auspicious3000 commented 4 years ago

I mean the shape of the 3rd section

mhosein4 commented 4 years ago

I'm sorry, my fault. The third section is a string containing the paths of the spectrograms, like this: 's1\p1_1.npy', 's1\p1_2.npy', 's1\p1_3.npy'

auspicious3000 commented 4 years ago

"I can't understand what is the third section and how to generate it? What is array that's highlight in picture?"

This was your original question. What is the shape of the 3rd section you were referring to?

mhosein4 commented 4 years ago

The picture I sent was related to your metadata. The shape of the 3rd section of your metadata is (3,). I want to generate my metadata like the one in the picture. Sorry if I didn't explain it well.

auspicious3000 commented 4 years ago

There are definitely more than 3 elements in your highlighted area

mhosein4 commented 4 years ago

I'm so sorry, my fault again: (90, 80), (89, 80), (75, 80), (109, 80). The metadata includes 4 speakers.

auspicious3000 commented 4 years ago

These are the spectrograms

mhosein4 commented 4 years ago

So what is the previous array, in the second section? Can you send me the Python file? I'm so confused. Thanks again for your good support.

auspicious3000 commented 4 years ago

Again, the shape please.

Also, where did you get that metadata?

mhosein4 commented 4 years ago

The shapes are (256,), (256,), (256,), (256,). I sent you an email, and you sent me your project.

auspicious3000 commented 4 years ago

Those are the speaker embeddings.

In that case, you already had the code to generate this. If not, you can write your own very easily. I don't keep the code because it is too simple.
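
Based on the shapes established in this thread, an entry looks like [speaker_id, (256,) embedding, (T, 80) mel-spectrogram, ...]. Below is a minimal sketch of assembling such conversion/test metadata; the directory layout (a `spmel/` folder written by make_spect.py, plus a hypothetical `emb/` folder of precomputed speaker embeddings) is an assumption, not the author's original script:

```python
import os
import pickle
import numpy as np

# Assumed layout: make_spect.py wrote (T, 80) spectrograms to
# spmel/<speaker>/<utt>.npy, and each speaker's (256,) embedding
# was saved separately as emb/<speaker>.npy.
spmel_dir = 'spmel'
emb_dir = 'emb'

metadata = []
for speaker in sorted(os.listdir(spmel_dir)):
    spk_dir = os.path.join(spmel_dir, speaker)
    if not os.path.isdir(spk_dir):
        continue
    entry = [speaker, np.load(os.path.join(emb_dir, speaker + '.npy'))]
    for fname in sorted(os.listdir(spk_dir)):
        entry.append(np.load(os.path.join(spk_dir, fname)))  # (T, 80) mel
    metadata.append(entry)

with open('metadata.pkl', 'wb') as f:
    pickle.dump(metadata, f)
```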

amiteliav commented 2 years ago

@mhosein4 did you figure out the metadata format? I'm trying to run this code now, and I can see that the "metadata.pkl" file in the repo is NOT the same as the metadata file that "make_metadata.py" would generate.

In "metadata.pkl", every singer's entry contains:

  1. a string ID for the singer
  2. the speaker embedding
  3. the mel-spectrograms of the songs in the dataset

But a metadata file generated with "make_metadata.py" contains:

  1. a string ID for the singer
  2. the speaker embedding
  3. the names (strings!) of the songs in the dataset

So I can't use it... I saw in another issue that someone said the metadata for training and testing is different, but I can't understand how, or where in the code.

thanks!
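
Piecing the thread together: make_metadata.py appears to emit training metadata whose third-and-later fields are spectrogram file names, while the shipped metadata.pkl (used for conversion/testing) stores the (T, 80) mel-spectrogram arrays themselves. A minimal sketch of converting one into the other, assuming the training metadata is named `train.pkl` and the spectrograms live under `spmel/` (both names are assumptions):

```python
import os
import pickle
import numpy as np

# Training-style metadata: [speaker_id, embedding, 'file1.npy', 'file2.npy', ...]
with open('train.pkl', 'rb') as f:
    train_meta = pickle.load(f)

spmel_dir = 'spmel'  # directory written by make_spect.py
test_meta = []
for speaker_id, embedding, *filenames in train_meta:
    # Swap each file name for the loaded (T, 80) mel-spectrogram array.
    mels = [np.load(os.path.join(spmel_dir, f)) for f in filenames]
    test_meta.append([speaker_id, embedding] + mels)

with open('metadata.pkl', 'wb') as f:
    pickle.dump(test_meta, f)
```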

lisabecker commented 2 years ago

@amiteliav in case it's still relevant, you can find an end-to-end implementation in this repo/notebook: https://github.com/KnurpsBram/AutoVC_WavenetVocoder_GriffinLim_experiments/blob/master/AutoVC_WavenetVocoder_GriffinLim_experiments_17jun2020.ipynb