Linardos / SalEMA

Simple vs complex temporal recurrences for video saliency prediction (BMVC 2019)
https://imatge-upc.github.io/SalEMA/

Training on custom dataset #2

Closed saivineethkumar closed 4 years ago

saivineethkumar commented 5 years ago

@Linardos Hi, I want to train the model on a custom dataset. Could you describe the steps involved and details such as the structure and format of the training input and annotation data, the training parameters you used, etc.?

Thank you.

Linardos commented 5 years ago

If the folder structure is up to you, I'd suggest following the same one as DHF1K. That way you can reuse the same data loader and it will be much more straightforward. In summary: there should be two top-level folders, "frames" and "maps", so that one holds the folders of frames for each video and the other holds the folders of saliency maps. Each video is a numbered folder (1-700), and each of those folders contains numbered frames (0001.png, 0002.png, etc.). So the paths look like `frames/1/0001.png`, `frames/1/0002.png`, ... and `maps/1/0001.png`, `maps/1/0002.png`, ... You might want to check DHF1K in case I'm missing something: https://github.com/wenguanwang/DHF1K
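The layout above can be sketched with a short script. This is only an illustration: the root name `my_dataset` and the video/frame counts are made up; the `frames`/`maps` folder names and the zero-padded numbering follow the DHF1K convention described here.

```python
import os

# Hypothetical helper: lay out a custom dataset in the DHF1K-style
# structure (frames/<video_id>/NNNN.png and maps/<video_id>/NNNN.png).
# Root name and counts below are illustrative, not part of the project.
root = "my_dataset"
num_videos = 3          # videos are numbered folders: 1, 2, 3
frames_per_video = 2    # frames are zero-padded: 0001.png, 0002.png

for video_id in range(1, num_videos + 1):
    for sub in ("frames", "maps"):
        folder = os.path.join(root, sub, str(video_id))
        os.makedirs(folder, exist_ok=True)
        # each frame and its saliency map share the same zero-padded name
        for i in range(1, frames_per_video + 1):
            open(os.path.join(folder, f"{i:04d}.png"), "a").close()
```

The key point is that a frame and its ground-truth saliency map are matched purely by having the same video folder number and the same file name under the two top-level folders.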

After that you just have to set dataset=DHF1K or dataset=other (I think the only difference between the two is that "other" does not require the video folders to be numbered).

As for the training hyperparameters:

- frame_size = (192, 256)
- learning_rate = 0.000001
- decay_rate = 0.1
- momentum = 0.9
- weight_decay = 1e-4
- clip_length = 10
- epochs = 7
- batch_size = 1
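For reference, the values above collected into a plain config dict (the dict itself and its key names are just one way to organize them; the thread does not prescribe a config format):

```python
# Training hyperparameters quoted in the reply above, gathered into a
# dict. Key names are illustrative; only the values come from the thread.
config = {
    "frame_size": (192, 256),   # (height, width) of input frames
    "learning_rate": 1e-6,
    "decay_rate": 0.1,          # learning-rate decay factor
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "clip_length": 10,          # frames per training clip
    "epochs": 7,
    "batch_size": 1,
}
```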

With regards to the alpha, we found that it is feasible to make it a learnable parameter, which performs just as well as fine-tuning it as a hyperparameter and saves you the trouble. Note that it has to be trained with a much higher learning rate (lr=0.1 in this case). Just pass alpha=None and alpha will be learned that way. However, the pretrained model that was uploaded on GitHub had alpha set to 0.1.
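A minimal sketch of how a learnable alpha with its own higher learning rate can be wired up in PyTorch. `SalModel` below is a hypothetical stand-in (a single conv layer), not the actual SalEMA network; the exponential-moving-average recurrence and the per-parameter-group learning rates are the point.

```python
import torch
import torch.nn as nn

class SalModel(nn.Module):
    """Toy stand-in for a saliency net with an EMA recurrence over time."""

    def __init__(self, alpha=None):
        super().__init__()
        # hypothetical backbone; the real model is much larger
        self.backbone = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        if alpha is None:
            # learnable alpha, squashed through a sigmoid to stay in (0, 1)
            self.alpha = nn.Parameter(torch.tensor(0.0))
        else:
            # fixed alpha (e.g. 0.1 as in the released pretrained model)
            self.register_buffer("alpha", torch.tensor(float(alpha)))

    def forward(self, x, state=None):
        out = self.backbone(x)
        a = torch.sigmoid(self.alpha) if isinstance(self.alpha, nn.Parameter) else self.alpha
        # exponential moving average over the temporal state
        state = out if state is None else a * out + (1 - a) * state
        return state

model = SalModel(alpha=None)

# Two parameter groups: the network at the quoted lr=1e-6, and alpha
# alone at the much higher lr=0.1 mentioned above.
optimizer = torch.optim.SGD(
    [
        {"params": [p for n, p in model.named_parameters() if n != "alpha"],
         "lr": 1e-6},
        {"params": [model.alpha], "lr": 0.1},
    ],
    momentum=0.9,
    weight_decay=1e-4,
)
```

Passing `alpha=0.1` instead registers alpha as a fixed buffer, which matches how the released checkpoint was trained.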