Hello, I am a beginner in the audio field, and I would like to ask how to apply the pre-trained model to the ESC-50 dataset.
The audio clips in ESC-50 are 5 seconds long, but in the visualization demo provided by the author, the input to the pre-trained model appears to be 10 seconds long. When I use this model to reconstruct ESC-50 audio, the reconstructed spectrogram is full of noisy points.
Could you tell me how to use the pre-trained model to reconstruct and visualize the audio data in ESC-50?