chrisdonahue / ddc

Dance Dance Convolution dataset tools and models
MIT License
212 stars 39 forks

Questions regarding training models #8

Open Clpsplug opened 5 years ago

Clpsplug commented 5 years ago

I am currently retraining the models to evaluate my modification to adopt the latest essentia (related: #7). I have a few questions about the training process:

  1. How long is the training supposed to take? I know it mostly depends on the hardware, but for the Step Selection model, it took half a day on a Core i7 3770 @ 3.40GHz; the Step Placement model, however, is estimated to take over two weeks on the same CPU! I'm not sure whether I'm messing something up.
  2. After training, which files in the /tmp folder should go into server_aux inside the infer directory? I found a lot of files there, but none of their names matches the ones in server_aux.
chrisdonahue commented 5 years ago

First of all, thank you for your continued interest in this project. I fear that I have sent you on a wild goose chase, however, as Dance Dance Convolution was my first foray into the world of ML and the codebase is kind of a nightmare of development/research code. Can you explain your overall goals to me so that I know the best way to help you? Are you trying to train a model on different step charts? Are you trying to iterate on the model to improve performance?

To specifically answer your questions while awaiting your response:

The step placement model will likely take a very long time to train on a CPU. The reason for this is that it operates on the actual audio, which requires much more computation. GPUs are optimized for this kind of computation, whereas the step selection model, which operates on symbolic data, might even run faster on a CPU.

You might be able to use an off-the-shelf onset detection algorithm such as the one in https://github.com/CPJKU/madmom instead of training a step placement network, although this will take some hacking of the code.
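
For example, something along these lines should get you onset times from madmom's pretrained detector (a rough sketch on my part; the file name is a placeholder, and wiring the output into the rest of the pipeline is up to you):

```python
# Rough sketch: madmom's pretrained onset detector as a stand-in for the
# step placement network. "song.ogg" is a placeholder input file.
from madmom.features.onsets import RNNOnsetProcessor, OnsetPeakPickingProcessor

activations = RNNOnsetProcessor()("song.ogg")  # frame-wise onset strengths
pick = OnsetPeakPickingProcessor(fps=100)      # activations come at 100 fps
onset_times = pick(activations)                # onset times in seconds
print(onset_times[:10])
```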

The model checkpoints (usually they end in .ckpt) go in the server_aux directory. I use the checkpoint that has the best performance on a set of stepcharts that are not included in the training data.

Clpsplug commented 5 years ago

Thanks for your reply! My intent is to make the code work with the latest essentia. It's true we want to switch away from essentia (#7), but I believe the particular crash I'm experiencing will continue to exist even if we move to another library. Also, this change requires a retrain, because the features extracted from the dataset's audio will change.

Below is what made me open this issue:

  1. Getting the energy in 80 mel bands of a spectrum from a 1024-point FFT is now impossible (essentia crashes as of 2.1b4).
  2. The docs and the error message say that I need to use zero-padding to increase the size of the FFT spectrum, so I had to use a 2048-point FFT (see the sketch after this list). If I remember correctly, this cuts the energy in half. It affects both the original data and the training data, so I need to retrain.
  3. I set up an Ubuntu machine (not a VM) and let the script do its thing; the step selection model finished training in about half a day, but the step placement model is still running (around epoch 40).
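
Here is a minimal sketch of the change in item 2, using essentia's standard-mode Python API; apart from the 1024-sample window, the zero-padding to 2048, and the 80 bands, the parameters (e.g. the window type) are my assumptions rather than DDC's exact settings:

```python
import numpy as np
import essentia.standard as es

# Zero-pad each 1024-sample frame to 2048 points so the spectrum has
# enough bins for 80 mel bands (the workaround from item 2 above).
window = es.Windowing(type='hann', size=1024, zeroPadding=1024)
spectrum = es.Spectrum(size=2048)                  # yields 1025 magnitude bins
mel = es.MelBands(numberBands=80, inputSize=1025)

frame = np.zeros(1024, dtype=np.float32)           # stand-in for one audio frame
band_energies = mel(spectrum(window(frame)))
print(band_energies.shape)                         # (80,)
```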

At this point, I worried I was doing something horribly wrong, so I contacted you. I also didn't know which of the files ./sml_sym_2_train.sh created should go into the server_aux folder when I wanted to test the model. You said:

The model checkpoints (usually they end in .ckpt) go in the server_aux directory

but I can't find any file ending in .ckpt. I do see a file named checkpoint. Is this it? Or did I mess something up because I used the latest TensorFlow?
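
For what it's worth, that checkpoint file seems to be just a text index into the actual model.ckpt-NNNN.index/.data files that newer TensorFlow writes; a snippet like this (TF 1.x API, with a placeholder path for wherever the training script saves) prints the prefix it points at:

```python
import tensorflow as tf

ckpt_dir = "/tmp/train_dir"  # placeholder: wherever sml_sym_2_train.sh writes
# The "checkpoint" file is a text index; this resolves it to the newest
# "model.ckpt-NNNN" prefix (the weights live in the .index/.data files).
print(tf.train.latest_checkpoint(ckpt_dir))
```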

adgelbfish commented 5 years ago

@Clpsplug you can still use the older version of essentia until Chris releases v2.

Also, I am working on a Dockerfile that creates a full working installation. Message me on twitter if you want it. (@adgelbfish)