Closed. MaFuyan closed 2 years ago
The embedding_batch generated by your script audio_feature_extractor.py has shape [10, 6, 4, 512], which differs from the predefined shape [len_data, 10, 128] of audio_features.
Problem solved. The define_vgg_slim function in vggish_slim.py was returning the unflattened feature net1 instead of the expected net.
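For anyone hitting the same mismatch, here is a rough sketch of a shape check that would catch it early. It uses NumPy arrays as stand-ins for the real tensors, and check_embedding_batch is a hypothetical helper, not something from the repo. It only shows why [10, 6, 4, 512] cannot be the final embedding: flattening gives 6 * 4 * 512 = 12288 values per frame, not 128, so the remaining layers that produce net are still needed.

```python
import numpy as np

# Taken from the expected audio_features shape [len_data, 10, 128].
EXPECTED_FRAMES = 10
EXPECTED_DIM = 128


def check_embedding_batch(embedding_batch):
    """Hypothetical helper: raise ValueError unless the batch is [10, 128]."""
    shape = tuple(embedding_batch.shape)
    if shape != (EXPECTED_FRAMES, EXPECTED_DIM):
        raise ValueError(
            "embedding_batch has shape %s, expected %s; the model may be "
            "returning an intermediate feature map instead of the embedding"
            % (shape, (EXPECTED_FRAMES, EXPECTED_DIM))
        )
    return embedding_batch


# Stand-in for the unflattened feature (net1) reported in this issue.
unflattened = np.zeros((10, 6, 4, 512), dtype=np.float32)

# Flattening alone is not enough: each frame still has 12288 values.
flattened = unflattened.reshape(unflattened.shape[0], -1)
print(flattened.shape)  # (10, 12288)

# Stand-in for a correct embedding (net) for one clip.
embedding = np.zeros((EXPECTED_FRAMES, EXPECTED_DIM), dtype=np.float32)
check_embedding_batch(embedding)

# Stacking one embedding per clip gives [len_data, 10, 128].
audio_features = np.stack([embedding, embedding])
print(audio_features.shape)  # (2, 10, 128)
```

Calling check_embedding_batch right after the model runs in audio_feature_extractor.py would fail loudly on the [10, 6, 4, 512] output instead of letting the wrong shape reach audio_features.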