HPI-DeepLearning / crnn-lid

Code for the paper Language Identification Using Deep Convolutional Recurrent Neural Networks
GNU General Public License v3.0
Predict new audio - audio is not found #30

Closed Arafat4341 closed 3 years ago

Arafat4341 commented 3 years ago

I am trying to run a prediction. I have specified the audio path correctly, but I still get an error: ValueError: need at least one array to stack

Full error:

('SpectrogramGenerator Exception: ', IOError(2, 'No such file or directory'), 'audios/speech.mp3')
Traceback (most recent call last):
  File "predict.py", line 42, in <module>
    predict(cli_args)
  File "predict.py", line 17, in predict
    data = np.stack(data)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.py", line 335, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

Does the input audio need to be exactly 10 seconds long? Another question: in the command python predict.py --model <path/to/model> --input <path/to/speech.mp3>, what should the path to the model be?

Bartzi commented 3 years ago

I think it makes sense to supply the absolute path to your mp3 file rather than a relative path. An absolute path helps you make sure that the program is searching in the correct directory.

The path to the model should be the path to your trained language identification model.

Arafat4341 commented 3 years ago

@Bartzi But I am getting the error even after supplying the absolute path of the mp3 file.

I trained the topcoder_crnn_finetune.py model, and I added model.load_weights("absolute/path/to/my/trained/weight", by_name=True) to the model module. I gave the path models/topcoder_crnn_finetune.py when running predict.py.

Did I do everything correctly?

Arafat4341 commented 3 years ago

I am using Google Colab for training. I mounted my Drive in Colab, and all read and write operations happen on the Drive.

So the absolute path of my audio is: /content/drive/My\ Drive/crnn-lid/keras/audios/speech.mp3 I provided this path. I am still getting: ('SpectrogramGenerator Exception: ', IOError(2, 'No such file or directory'), 'audios/speech.mp3')

Arafat4341 commented 3 years ago

@Bartzi Hello! I am just getting this error: ('SpectrogramGenerator Exception: ', IOError(2, 'No such file or directory'), 'audios/speech.mp3')

But I have a directory inside keras/ named audio/, and I have placed the audio file speech.mp3 there. I am still getting this error!

Here is my command line: python predict.py --model models/logs/2020-07-29-05-05-31/weights.12.model --input audios/speech.mp3

Do you have any idea why I am getting this?! Thanks!
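The exception message names the relative path audios/speech.mp3, and a relative path is resolved against the current working directory, not against the location of predict.py. When no spectrogram is produced, the list passed to np.stack is empty, which is what raises the ValueError in the traceback. Below is a minimal sketch for checking the input path before running a prediction. The helper name resolve_audio_path is hypothetical and not part of crnn-lid.

```python
import os


def resolve_audio_path(path):
    """Return the absolute path of an audio file, or raise if it is missing.

    Hypothetical helper for illustration; crnn-lid does not ship this function.
    """
    # Expand "~" and turn a relative path into one anchored at the current
    # working directory, which is how the file system will interpret it.
    absolute = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(absolute):
        raise IOError(
            "Audio file not found: {} (working directory: {})".format(
                absolute, os.getcwd()
            )
        )
    return absolute
```

Calling resolve_audio_path("audios/speech.mp3") from the directory where predict.py is started shows the exact location being searched, which makes a misspelled directory name or a wrong working directory easier to spot.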

Arafat4341 commented 3 years ago

I found that only 10-second audio clips work for testing. After supplying a 10-second clip, the error went away. Is that expected?

Bartzi commented 3 years ago

Hi,

Sorry for not getting back to you earlier. Have you solved your problems by now?

> I saw only 10 sec audio clip for testing works.

Yes, that is correct for a model trained with the default settings. You can, however, train another model by changing the segment length (https://github.com/HPI-DeepLearning/crnn-lid/blob/master/keras/config.yaml#L15).

Arafat4341 commented 3 years ago

Thanks for your response! It's a pleasure! @Bartzi Actually I trained with 10 sec audio files. But I am talking about testing the trained model with new audio.

Bartzi commented 3 years ago

If you only train on 10 sec audio files, the resulting model will also only work with 10 second snippets :shrug:. If you want to use different audio lengths, you'll have to train new models.

Arafat4341 commented 3 years ago

Ah... I see! Thanks a lot!

Arafat4341 commented 3 years ago

@Bartzi Don't I need to change anything else in the data preparation stage? I mean, to cut the audio into 3-second chunks?

Bartzi commented 3 years ago

Of course, you also need to create spectrograms according to the audio length you want to use.

Arafat4341 commented 3 years ago

@Bartzi Thanks a lot. I have made changes in download_youtube.py on line 67: command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "3", output_filename]

I have set segment_time to 3 there, and also changed the segment length in config.yaml. Is that all?
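Instead of hard-coding the value, the segment length can be passed in as a parameter, so that the ffmpeg call and the training configuration are easier to keep in sync. This is a minimal sketch built around the command quoted above. The function name build_segment_command, its parameters, and the example file names are assumptions for illustration, not code from download_youtube.py.

```python
def build_segment_command(input_file, output_pattern, segment_seconds=10):
    """Build the ffmpeg argument list that splits audio into fixed-length chunks.

    Hypothetical helper; the flags mirror the command quoted in this thread.
    """
    return [
        "ffmpeg", "-y",
        "-i", input_file,
        "-map", "0",
        "-ac", "1",          # mix down to mono
        "-ar", "16000",      # resample to 16 kHz
        "-f", "segment",
        "-segment_time", str(segment_seconds),  # chunk length in seconds
        output_pattern,
    ]


# Example with assumed file names: 3-second chunks.
command = build_segment_command("input.wav", "chunk_%03d.wav", segment_seconds=3)
```

The resulting list can be handed to subprocess.call or subprocess.check_call in the same way as the original command.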

Arafat4341 commented 3 years ago

@Bartzi I am failing to create mel-spectrograms for 3-second audio clips. The images are not generated. Is there a specific input shape or pixels-per-second value for 3-second clips?

Bartzi commented 3 years ago

I think so :sweat_smile: If you look at the default values, you can see that with pixels_per_second set to 50 and audio snippets of 10 seconds, we supply a width of 500, since 50 * 10 = 500.
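The same arithmetic gives the width for any other segment length. This is a minimal sketch; spectrogram_width is a hypothetical helper name, and only the relationship width = pixels_per_second * segment length is taken from the explanation above.

```python
def spectrogram_width(pixels_per_second, segment_seconds):
    """Return the spectrogram image width in pixels for one audio segment."""
    return int(pixels_per_second * segment_seconds)


# Default settings from the thread: 50 pixels per second, 10-second snippets.
default_width = spectrogram_width(50, 10)  # 500

# 3-second snippets at the same resolution.
short_width = spectrogram_width(50, 3)  # 150
```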

Arafat4341 commented 3 years ago

Ah! Thanks a lot! I changed the width to 150 now!