cvondrick / soundnet

SoundNet: Learning Sound Representations from Unlabeled Video. NIPS 2016
http://projects.csail.mit.edu/soundnet/
MIT License

Question: Steps to get category labels #3

Closed: craftzdog closed this issue 7 years ago

craftzdog commented 7 years ago

Hi, thanks for making this great implementation!

I tried to extract features from a sound using the pretrained models, like:

sky: 43.56%
stage, indoor: 5.46%
amusement park: 5.24%

spotlight: 16.74%
fountain: 12.33%
traffic light: 5.76%

I want to get the category labels for each prediction, but I don't understand how to get them from the HDF5 output. Could you please explain, step by step, how to get the category labels?

Any help will be appreciated.
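
For context, here is roughly as far as I got reading the saved features back in Torch. This is only a sketch: torch-hdf5, the file name, and the dataset key are my guesses and may not match what extract_feat.lua actually writes.

require 'torch'
require 'hdf5'  -- deepmind/torch-hdf5

-- open the HDF5 file written during feature extraction (path and key are guesses)
local fd = hdf5.open('sound_feats.h5', 'r')
local feat = fd:read('feat'):all()
fd:close()

print(feat:size())  -- just a tensor of activations, with no category names anywhere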

bobek commented 7 years ago

Hi @cvondrick, I am struggling with the same thing. You extract layer 24 by default in extract_feat.lua. Maybe layer 25 would be a bit more useful, as it seems to map to the category outputs:

  -- two output branches: 1000 object categories and 401 scene categories
  net:add(nn.ConcatTable():add(nn.SpatialConvolution(1024, 1000, 1,8, 1,2, 0,0))   -- object branch
                          :add(nn.SpatialConvolution(1024,  401, 1,8, 1,2, 0,0)))  -- scene branch

The output of that layer is:

{
  1 : CudaTensor - size: 1x1000x6x1
  2 : CudaTensor - size: 1x401x6x1
}

But I struggle to interpret the "scores". I have used some of the sound clips from the website (so I would expect the model to match them well), but I am not able to pick out the proper labels.
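
For what it is worth, here is roughly how I have been trying to turn those two tensors into labels. This is only a sketch: the model file name, the CUDA requires, and the category list file are my assumptions, and I am not sure the row ordering of that file actually matches the 401 outputs.

require 'torch'
require 'nn'
require 'cunn'  -- the outputs above are CudaTensors, so I assume a GPU model; cudnn may be needed too

-- load the pretrained network and push one clip through it (file name is a guess)
local net = torch.load('soundnet8_final.t7')
net:evaluate()
local sound = torch.randn(1, 1, 441000, 1):cuda()  -- placeholder; should be the raw waveform at 22050 Hz
net:forward(sound)

-- layer 25 is the ConcatTable above: {1 x 1000 x T x 1 objects, 1 x 401 x T x 1 scenes}
local scenes = net.modules[25].output[2]:double():squeeze()  -- 401 x T
-- (branch 1, the 1000-way object output, could be handled the same way)

-- average over the T temporal positions, then softmax to get per-category scores
local sceneScores = nn.SoftMax():forward(scenes:mean(2):squeeze())

-- map the top scores to names, assuming one category name per line in this file
local names = {}
for line in io.lines('categories/categories_places2.txt') do  -- assumed path
  table.insert(names, line)
end
local vals, idx = sceneScores:sort(1, true)  -- descending
for i = 1, 5 do
  print(string.format('%s: %.2f%%', names[idx[i]], vals[i] * 100))
end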

Would you mind posting the actual tooling you used for generating the labeled videos?

Thank you

cvondrick commented 7 years ago

I added a script to extract the predictions in extract_predictions.lua. To use it, you create a text file of paths to MP3s, and pass it in:

list=data.txt th extract_predictions.lua
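
For example, data.txt is just a list of MP3 paths (these are placeholder paths):

/path/to/first_clip.mp3
/path/to/second_clip.mp3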

Does this help? Sorry for the delay in responding!

bobek commented 7 years ago

Awesome, thank you @cvondrick. Appreciated.

craftzdog commented 7 years ago

great!! thank you @cvondrick