andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0

Questions about the entrance of the training function #14

Open yxixi opened 5 years ago

yxixi commented 5 years ago

Great job! I tried to train this model myself, but I ran into a few problems and would love to know the solutions:

a. Could you point out a detailed method of calling the training function?
b. How do I feed the "Kinetics-Sounds" dataset into the model for training?
c. I noticed you mentioned rewriting the read_data(pr, gpus) function. What does the variable "pr" stand for?

Looking forward to your reply! Thanks! @andrewowens

andrewowens commented 5 years ago

Sorry for the slow reply!

a) I usually run it like this:

python -c "import sep_params, sourcesep; sourcesep.train(sep_params.full(num_gpus=3), [0, 1, 2], restore = False)"

This uses the "full" parameter set defined in sep_params.py.

b) I kept only these categories from the Kinetics dataset: blowing nose, bowling, chopping wood, ripping paper, shuffling cards, singing, tapping pen, using computer, blowing out candles, dribbling basketball, laughing, mowing lawn, shoveling snow, stomping grapes, tap dancing, tapping guitar, tickling, strumming guitar, playing accordion, playing bagpipes, playing bass guitar, playing clarinet, playing drums, playing guitar, playing harmonica, playing keyboard, playing organ, playing piano, playing saxophone, playing trombone, playing trumpet, playing violin, playing xylophone,

following this paper by Arandjelovic and Zisserman (note that the list of categories in their paper is slightly out of date, since it used a pre-release version of the Kinetics dataset).
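If it helps, here is a minimal sketch of how you might filter a Kinetics-style annotation list down to those categories. The file/column layout here is hypothetical (not the actual Kinetics release format); only the category names come from the list above.

```python
# Sketch: keep only the sound-centric Kinetics categories listed above.
# The (video_id, label) row format is an assumption for illustration.
KEPT = {
    "blowing nose", "bowling", "chopping wood", "ripping paper",
    "shuffling cards", "singing", "tapping pen", "using computer",
    "blowing out candles", "dribbling basketball", "laughing",
    "mowing lawn", "shoveling snow", "stomping grapes", "tap dancing",
    "tapping guitar", "tickling", "strumming guitar", "playing accordion",
    "playing bagpipes", "playing bass guitar", "playing clarinet",
    "playing drums", "playing guitar", "playing harmonica",
    "playing keyboard", "playing organ", "playing piano",
    "playing saxophone", "playing trombone", "playing trumpet",
    "playing violin", "playing xylophone",
}

def filter_annotations(rows):
    """Keep only (video_id, label) pairs whose label is a kept category."""
    return [(vid, label) for vid, label in rows if label in KEPT]

rows = [("abc123", "playing drums"),
        ("def456", "swimming"),
        ("ghi789", "laughing")]
print(filter_annotations(rows))  # [('abc123', 'playing drums'), ('ghi789', 'laughing')]
```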

c) The "pr" variable is the parameter set. You can find examples of these in sep_params.py, such as "full" (the full audio-visual model) and "unet_pit" (sound only).
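To illustrate the pattern (not the repo's actual code): "pr" is just an object bundling all hyperparameters for one training configuration, and constructors like sep_params.full() return a populated instance. The field names below are invented examples, not the real fields in sep_params.py.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "pr" (parameter set) pattern.
# Field names and defaults are illustrative only.
@dataclass
class Params:
    model: str = "full"    # e.g. "full" (audio-visual) or "unet_pit" (sound only)
    num_gpus: int = 1
    batch_size: int = 24   # example value, not taken from the repo

def full(num_gpus=1):
    """Analogue of sep_params.full(): return the full-model parameter set."""
    return Params(model="full", num_gpus=num_gpus)

pr = full(num_gpus=3)
print(pr.model, pr.num_gpus)  # full 3
```

A read_data(pr, gpus) function would then pull whatever it needs (batch size, etc.) from pr, which is why the whole set is passed around as a single argument.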

Hope that helps!