Hi, num_audio_samples is the length of the audio array. If the type is 'fixed', all of your wavs must have the same length. In your case, it seems you have at least one wav with 19200 samples instead of 48000. Maybe something went wrong during the audio preprocessing stage. Are you able to produce one or more TFRecords?
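A quick way to locate the offending files is to scan the dataset for wavs whose length differs from num_audio_samples. The sketch below is only an illustration: it assumes the soundfile package is installed and that the GRID audio was resampled so that a full clip contains 48000 samples; the directory path is a placeholder.

```python
import glob

import soundfile as sf  # assumption: soundfile is available for reading wavs

NUM_AUDIO_SAMPLES = 48000  # expected length for TFRecord type='fixed'

def find_mismatched_wavs(wav_dir):
    """Return (path, n_samples, sample_rate) for every wav whose length
    differs from NUM_AUDIO_SAMPLES."""
    bad = []
    for path in sorted(glob.glob(f"{wav_dir}/**/*.wav", recursive=True)):
        data, sr = sf.read(path)
        if len(data) != NUM_AUDIO_SAMPLES:
            bad.append((path, len(data), sr))
    return bad

if __name__ == "__main__":
    for path, n, sr in find_mismatched_wavs("grid_wavs"):  # placeholder dir
        print(f"{path}: {n} samples at {sr} Hz")
```

Any file reported here either needs to be re-preprocessed to the expected length, or the TFRecord type should be switched to 'var'.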
Thanks for your advice. I changed the type to 'var' because the audio arrays have different lengths (I use the GRID dataset). I also reset num_audio_samples. But I have another question now. Since AV concat-ref is retrained while freezing the parameters of the VL2M component, does that mean I should export the TBMs estimated by VL2M and replace the original TBMs computed with LTASS?
Yes, you are right. You must save the TBMs estimated by VL2M and generate the TFRecords, replacing the original TBM with the VL2M estimate. Then you can train the AV concat-ref model.
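A minimal sketch of that data step, assuming the VL2M estimates have already been exported as .npy arrays; the feature keys, file names, and sample id below are placeholders rather than the names actually used by the repository's TFRecord code.

```python
import numpy as np
import tensorflow as tf

def _float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def write_example(writer, audio_wav, video_feats, estimated_tbm):
    """Serialize one sample, storing the VL2M-estimated TBM in place of
    the original LTASS-based TBM."""
    example = tf.train.Example(features=tf.train.Features(feature={
        "base_audio_wav": _float_feature(audio_wav.ravel()),
        "video_features": _float_feature(video_feats.ravel()),
        "tbm": _float_feature(estimated_tbm.ravel()),  # VL2M output, not LTASS
    }))
    writer.write(example.SerializeToString())

# Usage sketch: rewrite the training records with the estimated TBMs,
# then train the AV concat-ref model on the new TFRecords.
with tf.io.TFRecordWriter("train_av_concat_ref.tfrecord") as writer:
    for sid in ["s1_bbaf2n"]:  # hypothetical sample id
        audio = np.load(f"{sid}_wav.npy")
        video = np.load(f"{sid}_video.npy")
        tbm_hat = np.load(f"{sid}_tbm_vl2m.npy")  # TBM estimated by VL2M
        write_example(writer, audio, video, tbm_hat)
```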
Hello, thanks for your work. I'm training the VL2M model on the GRID dataset. I set TFRecord type='fixed', num_audio_samples=48000, batch_size=10, but there is an error when I start training:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Key: base_audio_wav, Index: 0. Number of float values != expected. values size: 19200 but output shape: [48000]
I tried changing num_audio_samples, but it didn't work. Does 19200 mean the length of the 0th wav, while 48000 means the number of audio wavs? Looking forward to your reply.
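For reference, the error happens because the 'fixed' record type declares a fixed output shape of [num_audio_samples] for base_audio_wav, so a record whose wav holds 19200 floats cannot be parsed into a [48000] tensor. The snippet below is only an illustration of that difference in TF2 style; the feature spec is an assumption, not the repository's actual reader.

```python
import tensorflow as tf

FIXED_SPEC = {
    # 'fixed': the parser insists on exactly 48000 floats per record and
    # raises "Number of float values != expected" otherwise.
    "base_audio_wav": tf.io.FixedLenFeature([48000], tf.float32),
}

VAR_SPEC = {
    # 'var': any length is accepted; the result is a sparse tensor.
    "base_audio_wav": tf.io.VarLenFeature(tf.float32),
}

def parse_fixed(serialized):
    return tf.io.parse_single_example(serialized, FIXED_SPEC)["base_audio_wav"]

def parse_var(serialized):
    parsed = tf.io.parse_single_example(serialized, VAR_SPEC)["base_audio_wav"]
    return tf.sparse.to_dense(parsed)  # dense 1-D tensor of the true length
```

So 19200 is the number of samples in one of your wavs, and 48000 is the length the 'fixed' reader expects for every wav.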