How can I modify the train file for a multi-GPU version?

LossNAN / I3D-Tensorflow

Train an I3D model on UCF101 or HMDB51 with TensorFlow

Apache License 2.0


How can I modify the train file for a multi-GPU version? #4

Open Epiphqny opened 5 years ago

Epiphqny commented 5 years ago

Hello, I took the train_ucf file and modified it for multi-GPU training, following your resnet-3d version, but it did not work. Can you provide a multi-GPU version of this model? Thanks.

LossNAN commented 5 years ago

You are right, this code is for one GPU only. I will publish the multi_gpu version in two days if you need it.

Epiphqny commented 5 years ago

> You are right, this code is for one GPU only. I will publish the multi_gpu version in two days if you need it.

Yes, I implemented a multi-GPU version myself, but it still has some problems. I would be very grateful if you could release your version!

LossNAN commented 5 years ago

@Epiphqny The multi_gpu version has been pushed, but I have not validated it because all my GPUs are busy, so if you hit bugs you cannot solve, please contact me. The updated code is adapted from my 3D-resnet-tensorflow code and handles RGB only. Data loading now uses a TensorFlow input pipeline for speed (without it, one step takes 4 seconds, which is very slow). Best wishes!
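For readers following the thread, here is a minimal sketch of the idea behind a common multi-GPU training pattern: split each batch across devices ("towers"), compute gradients per tower, then average them before one shared update. It is written in plain Python with a toy one-parameter model so it runs anywhere. It is an assumption about the general approach, not the code that was pushed to the repository, and every function name in it is hypothetical.

```python
from typing import Dict, List

Gradients = Dict[str, float]  # variable name -> gradient (scalars for brevity)


def split_batch(batch: List[float], num_towers: int) -> List[List[float]]:
    """Split one global batch into equal shards, one per tower (GPU)."""
    if len(batch) % num_towers != 0:
        raise ValueError("batch size must be divisible by the number of towers")
    shard = len(batch) // num_towers
    return [batch[i * shard:(i + 1) * shard] for i in range(num_towers)]


def tower_gradients(w: float, xs: List[float], ys: List[float]) -> Gradients:
    """Gradient of the mean squared error of the toy model y = w * x."""
    grad_w = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return {"w": grad_w}


def average_gradients(tower_grads: List[Gradients]) -> Gradients:
    """Average each variable's gradient over all towers."""
    count = len(tower_grads)
    return {name: sum(g[name] for g in tower_grads) / count
            for name in tower_grads[0]}


def multi_tower_step(w: float, xs: List[float], ys: List[float],
                     num_towers: int, lr: float) -> float:
    """One training step: per-tower gradients, averaged, one shared update."""
    shards = zip(split_batch(xs, num_towers), split_batch(ys, num_towers))
    grads = average_gradients([tower_gradients(w, sx, sy) for sx, sy in shards])
    return w - lr * grads["w"]


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    w = 0.0
    for _ in range(200):
        w = multi_tower_step(w, xs, ys, num_towers=2, lr=0.01)
    print(w)  # approaches 2.0, the slope that fits the toy data
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so the multi-tower step matches a single-device step on the whole batch. In a real TensorFlow graph the towers would also need to share one set of variables.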

Epiphqny commented 5 years ago

@LossNAN OK, thank you very much for your help. I will try it.

Epiphqny commented 5 years ago

@LossNAN Can I ask how you handle the batch normalization variables when saving the model?

LossNAN commented 5 years ago

@Epiphqny If you use my I3D (Inception 3D) code, the network is built with Sonnet (snt.BatchNorm()), which already wraps batch normalization, so you do not need to create 'beta' and 'gamma' yourself. The 'mean' and 'variance' update ops are added to tf.GraphKeys.UPDATE_OPS, which is why the code has 'with tf.control_dependencies(update_ops):' to update the 'mean' and 'variance' so that they are saved. When you test with is_training set to False, Sonnet uses the saved 'mean' and 'variance' for the computation. For another version, see bn_function in my 3d-resnet-tensorflow.
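To illustrate why those update ops matter, here is a minimal plain-Python sketch of the moving-statistics bookkeeping that batch normalization relies on. It is not Sonnet's implementation: the class name, the decay value, and the initial statistics are assumptions chosen for illustration. The point it shows is that inference reads the moving 'mean' and 'variance', so if the update step never runs during training, inference falls back to the initial values.

```python
from typing import List


class MovingBatchNormStats:
    """Toy stand-in for the moving mean/variance kept by a batch norm layer."""

    def __init__(self, decay: float = 0.9, eps: float = 1e-5) -> None:
        self.decay = decay          # assumed decay rate, for illustration only
        self.eps = eps
        self.moving_mean = 0.0      # assumed initial values
        self.moving_variance = 1.0

    @staticmethod
    def batch_stats(values: List[float]) -> tuple:
        """Mean and (biased) variance of the current batch."""
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        return mean, variance

    def update_moving_stats(self, values: List[float]) -> None:
        """Plays the role of the ops collected in UPDATE_OPS: must run every training step."""
        mean, variance = self.batch_stats(values)
        self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
        self.moving_variance = (self.decay * self.moving_variance
                                + (1 - self.decay) * variance)

    def normalize(self, values: List[float], is_training: bool) -> List[float]:
        """Use batch statistics in training, moving statistics at test time."""
        if is_training:
            mean, variance = self.batch_stats(values)
        else:
            mean, variance = self.moving_mean, self.moving_variance
        scale = (variance + self.eps) ** 0.5
        return [(v - mean) / scale for v in values]


if __name__ == "__main__":
    updated = MovingBatchNormStats()
    skipped = MovingBatchNormStats()
    batch = [1.0, 5.0]  # batch mean 3.0, batch variance 4.0
    for _ in range(200):
        updated.normalize(batch, is_training=True)
        updated.update_moving_stats(batch)          # update step is run
        skipped.normalize(batch, is_training=True)  # update step is forgotten
    print(updated.normalize([5.0], is_training=False))
    print(skipped.normalize([5.0], is_training=False))
```

The two printed results differ because only the first object's moving statistics tracked the training data. A checkpoint saved from the second would contain the untouched initial values.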

LossNAN commented 5 years ago

@Epiphqny If you want to know more, I will be very glad to help with any questions. My QQ number: 346925546

Epiphqny commented 5 years ago

@LossNAN Added.

bhkim1020 commented 5 years ago

Hello, I also tried to train on multiple GPUs and referred to your code from last December.

I have looked at the multi_gpu_train_kinetics_rgb.py file, and it seems to reference non_local-related code that is not in the repository. I also wonder whether anything changed inside the I3D network itself, for example variable sharing in the I3D network.

sanolans commented 4 years ago

Hi @LossNAN, thank you for sharing your work. Do you have the code for training on a single GPU, as I am using 1 GPU? Also, when I try the multi_gpu code, I get a NameError because learning_rate is not defined. Am I doing something wrong? Could you also please explain the non_local reference in the code?