WeidiXie / VGG-Speaker-Recognition

Utterance-level Aggregation For Speaker Recognition In The Wild

Is there any qualitative analysis showing that the network learns a good embedding? Are there any visualization tools for this? #26

Closed: mmxuan18 closed this issue 5 years ago.

mmxuan18 commented 5 years ago

In computer vision there are tools, such as Grad-CAM and CAM, that visualize what a network has learned for its final classification. In speaker recognition, how can we analyze which parts of the input drive the output, so that I can show directly that the network has learned a good feature? What do the input spectrograms of the same speaker have in common across different contexts?

An example of Grad-CAM applied to audio: [image]
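For reference, the standard Grad-CAM map for a target score y^c is built from the feature maps A^k of the last convolutional layer, where Z is the number of spatial positions (i, j). On a CNN that takes a spectrogram as input, i and j index frequency and time positions of that feature map, so the map highlights time-frequency regions; the result is usually upsampled to the spectrogram's size. Taking y^c to be the logit of speaker c, or a verification score between two embeddings, is an assumption here rather than something this repository provides.

```latex
\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\text{Grad-CAM}}^c = \operatorname{ReLU}\Big( \sum_k \alpha_k^c \, A^k \Big)
```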

WeidiXie commented 5 years ago

Well, most methods in computer vision probe which regions of the image affect the network's predictions the most. I guess you could do something similar on spectrograms as well, but I have never tried it myself.

Best, Weidi
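One way to probe which spectrogram regions matter, in the spirit of the reply above, is occlusion sensitivity, a gradient-free perturbation method from computer vision. You mask one time-frequency patch at a time, re-embed the spectrogram, and record how far the embedding moves, measured as 1 minus its cosine similarity to the unmasked embedding. This is a minimal sketch and is not code from this repository. `toy_embedding` is a hypothetical stand-in for the real network (a random projection followed by mean pooling over time), and the patch size, stride, and zero fill value are illustrative assumptions. With the actual model, you would pass a function that runs it on a spectrogram and returns the embedding.

```python
import numpy as np


def toy_embedding(spec, proj):
    """Hypothetical stand-in for a speaker-embedding network (not the repo's model).

    spec: (n_freq, n_time) magnitude spectrogram.
    proj: (dim, n_freq) fixed projection applied to every frame.
    """
    frames = proj @ spec          # (dim, n_time): per-frame features
    return frames.mean(axis=1)    # average over time -> utterance-level vector


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def occlusion_map(spec, embed_fn, patch=(8, 8), stride=(4, 4), fill=0.0):
    """Occlusion sensitivity for an embedding model.

    Slides a (freq, time) patch over the spectrogram, replaces it with `fill`,
    and scores each patch by 1 - cos(original embedding, occluded embedding).
    Every cell receives the mean score of all patches that cover it.
    """
    ref = embed_fn(spec)
    heat = np.zeros(spec.shape)
    counts = np.zeros(spec.shape)
    n_f, n_t = spec.shape
    pf, pt = patch
    sf, st = stride
    for f0 in range(0, n_f - pf + 1, sf):
        for t0 in range(0, n_t - pt + 1, st):
            occluded = spec.copy()
            occluded[f0:f0 + pf, t0:t0 + pt] = fill
            drop = 1.0 - cosine(ref, embed_fn(occluded))
            heat[f0:f0 + pf, t0:t0 + pt] += drop
            counts[f0:f0 + pf, t0:t0 + pt] += 1.0
    # Cells never covered by a patch (if sizes don't divide evenly) stay 0.
    return heat / np.maximum(counts, 1.0)


# Usage on synthetic data: the second half of the "utterance" is silent.
rng = np.random.default_rng(0)
spec = rng.random((32, 64))
spec[:, 32:] = 0.0
proj = rng.standard_normal((16, 32))
heat = occlusion_map(spec, lambda s: toy_embedding(s, proj))
print(heat.shape)  # (32, 64)
```

Large values in `heat` mark time-frequency regions whose removal changes the embedding most. In the synthetic example, the silent second half gets (near) zero importance because masking zeros with zeros changes nothing. Occlusion costs one forward pass per patch, so it is slower than Grad-CAM, but it works on any embedding model without access to gradients or internal layers.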