cvondrick / soundnet

SoundNet: Learning Sound Representations from Unlabeled Video. NIPS 2016
http://projects.csail.mit.edu/soundnet/
MIT License

How much memory is needed to run the training scripts? #4

Closed: MaigoAkisame closed this issue 7 years ago

MaigoAkisame commented 7 years ago

I downloaded the training data from the demo website (https://projects.csail.mit.edu/soundnet/) and tried to run the main_train.lua script, but I always get an out-of-memory error at the following line:

    optim.adam(fx, parameters, optimState)

The same thing happens even if I run main_train_small.lua.

I have 120 GB of CPU memory and 4.7 GB of GPU memory. Do I need more?

cvondrick commented 7 years ago

We trained it on GPUs with 12 GB of memory. You can try reducing batchSize until the model fits in memory.

In our experiments, we found that a larger batch size improves performance, so if you lower the batch size, you should expect a small drop in final performance.
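
For example, here is a minimal sketch of how the batch size might be lowered, assuming main_train.lua follows the common Torch pattern of defining defaults in an opt table and overriding them from environment variables (the actual option handling in the script may differ):

    -- hypothetical sketch, not the actual contents of main_train.lua
    opt = {
      batchSize = 16,  -- lower this until the model fits in GPU memory
    }
    -- override any default with a matching environment variable,
    -- e.g. run: batchSize=16 th main_train.lua
    for k, v in pairs(opt) do
      opt[k] = tonumber(os.getenv(k)) or os.getenv(k) or opt[k]
    end
    print('training with batchSize = ' .. opt.batchSize)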

MaigoAkisame commented 7 years ago

Thanks! I was able to get the code running.

I have another question regarding the objective function. When I run the code I see logs like:

    soundnet: Iteration: [1]         Time: 1.409  DataTime: 0.001    Err: 3.3490

What exactly does the error measure? I know it's a KL divergence. I see that the network produces outputs at 4 time steps per audio clip, and each output is compared against 2 target distributions. So is the error value a sum or a mean of the 8 KL divergences? And is it measured on the training data or the validation data?

cvondrick commented 7 years ago

It should be the mean KL (across 8 distributions) multiplied by lambda.

It's reporting the error on the training data here.
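
As a rough illustration of how that number is assembled (the shapes, the 1000-way toy distributions, and the random inputs below are placeholders, not the actual training code):

    require 'nn'

    local lambda = 250
    local kl = nn.DistKLDivCriterion()  -- input: log-probabilities, target: probabilities

    local err, nTerms = 0, 0
    for t = 1, 4 do      -- 4 time steps per audio clip
      for d = 1, 2 do    -- 2 teacher distributions per time step
        local pred   = nn.LogSoftMax():forward(torch.randn(1000))  -- placeholder student log-probs
        local target = nn.SoftMax():forward(torch.randn(1000))     -- placeholder teacher probs
        err = err + kl:forward(pred, target)
        nTerms = nTerms + 1
      end
    end
    err = lambda * err / nTerms  -- mean of the 8 KL terms, scaled by lambda
    print(('Err: %.4f'):format(err))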

MaigoAkisame commented 7 years ago

Thank you!

MaigoAkisame commented 7 years ago

Sorry, I just realized that lambda is a pretty large value (250), which means the KL divergence itself is only about 0.01. Is the model's performance already that good at the first iteration?

cvondrick commented 7 years ago

I believe Torch's DistKLDivCriterion is internally dividing by the number of dimensions, which is why it's so small.
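
A quick way to see this, assuming sizeAverage defaults to true in nn.DistKLDivCriterion (the toy inputs below are illustrative, not the training data):

    require 'nn'

    local pred   = nn.LogSoftMax():forward(torch.randn(1000))  -- toy student log-probs
    local target = nn.SoftMax():forward(torch.randn(1000))     -- toy teacher probs

    local averaged = nn.DistKLDivCriterion()  -- default: averaged over all elements
    local summed   = nn.DistKLDivCriterion()
    summed.sizeAverage = false                -- report the raw sum instead

    print(averaged:forward(pred, target))  -- small value (sum divided by 1000)
    print(summed:forward(pred, target))    -- roughly 1000x larger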