chaoyuaw / pytorch-coviar

Compressed Video Action Recognition
https://www.cs.utexas.edu/~cywu/projects/coviar/
GNU Lesser General Public License v2.1
502 stars, 126 forks

./install.sh #6

Open manza-ari opened 6 years ago

manza-ari commented 6 years ago

Hi Sir,

Could you please take a look at this error? How can I resolve it?

error

dongzhuoyao commented 6 years ago

try python3

manza-ari commented 6 years ago

Thank you so much for your reply.

How and where should I use python3?

I am using Ubuntu 16 and I have python3 installed, but I didn't understand your answer.

On Tue, Jul 24, 2018, 14:01 dongzhuoyao notifications@github.com wrote:

try python3

dongzhuoyao commented 6 years ago

Please make sure you are using python3. On my machine, python2 produces the same error you are seeing.

After switching to python3, everything works.

manza-ari commented 6 years ago

OK, thank you so much for your reply. I will try.

On Tue, Jul 24, 2018, 14:26 dongzhuoyao notifications@github.com wrote:

please make sure you are using python3, in my machine, if I use python2, it shows the same error as you.

after switching to python3, everything is ok now.

manza-ari commented 6 years ago

Where exactly do you want me to use python3? I am new to this area, and I have not used any python command in the entire procedure. Could you please elaborate?

chaoyuaw commented 6 years ago

Thanks, @dongzhuoyao!

Hi @kanza-ali, in install.sh, lines 2 and 3 call python. Could you please try replacing "python" with "python3" in install.sh?
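As a rough illustration of that substitution, here is a minimal Python sketch. The sample lines are hypothetical stand-ins, not the actual contents of install.sh, and `use_python3` is a helper name made up for this example.

```python
import re

# Matches the bare word "python" but not "python3" or "python2":
# the trailing \b fails when a digit follows, because digits are word characters.
BARE_PYTHON = re.compile(r"\bpython\b")


def use_python3(script_text: str) -> str:
    """Return script_text with every bare `python` command replaced by `python3`."""
    return BARE_PYTHON.sub("python3", script_text)


if __name__ == "__main__":
    # Hypothetical script lines, for illustration only.
    sample = "python setup.py build_ext\npython3 setup.py install\n"
    print(use_python3(sample))
```

Editing the two lines by hand in a text editor achieves the same thing; the sketch only shows which word changes.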

manza-ari commented 6 years ago

Thank you for your reply.

I made the changes and ran install.sh again, but I got this output: 2

chaoyuaw commented 6 years ago

Could you please elaborate on what your question is on this output? Those warnings are expected, and it looks like it succeeded :)

manza-ari commented 6 years ago

OK Thank you so so much!

manza-ari commented 6 years ago

I ran train.py

and the output is the following: 3

There is an error at the end. What is it about? I cannot find anything about it on the internet.

dongzhuoyao commented 6 years ago

How many GPUs are you using?

manza-ari commented 6 years ago

For any number of GPUs, it gives this error:

raise AssertionError("Invalid device id")
AssertionError: Invalid device id

It happens whether I run the first training command for HMDB or for UCF:

python3 train.py --lr 0.0003 --batch-size 40 --arch resnet152 \
  --data-name hmdb51 --representation iframe \
  --data-root data/hmdb51/mpeg4_videos \
  --train-list data/datalists/hmdb51_split1_train.txt \
  --test-list data/datalists/hmdb51_split1_test.txt \
  --model-prefix hmdb51_iframe_model \
  --lr-steps 55 110 165 --epochs 220 \
  --gpus 0 1

OR

python3 train.py --lr 0.0003 --batch-size 80 --arch resnet152 \
  --data-name ucf101 --representation iframe \
  --data-root data/ucf101/mpeg4_videos \
  --train-list data/datalists/ucf101_split1_train.txt \
  --test-list data/datalists/ucf101_split1_test.txt \
  --model-prefix ucf101_iframe_model \
  --lr-steps 150 270 390 --epochs 510 \
  --gpus 0 1 2 3
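One likely cause of `Invalid device id` is passing `--gpus` indices that do not exist on the machine: valid indices run from 0 up to the number of visible GPUs minus one. A minimal sketch of that check follows. `split_gpu_ids` and `visible_gpu_count` are hypothetical helpers, not part of train.py, and the sketch assumes PyTorch is the library reporting the device count.

```python
from typing import Iterable, List, Tuple


def split_gpu_ids(requested: Iterable[int], available: int) -> Tuple[List[int], List[int]]:
    """Split requested GPU ids into (valid, invalid) given the number of visible GPUs."""
    valid, invalid = [], []
    for gpu_id in requested:
        (valid if 0 <= gpu_id < available else invalid).append(gpu_id)
    return valid, invalid


def visible_gpu_count() -> int:
    """Number of GPUs PyTorch can see, or 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    count = visible_gpu_count()
    valid, invalid = split_gpu_ids([0, 1, 2, 3], count)
    print(f"visible GPUs: {count}, usable ids: {valid}, invalid ids: {invalid}")
```

If any ids come back as invalid, passing only the usable ones to `--gpus` would be the first thing to try.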

manza-ari commented 6 years ago

Did you make the following changes before running the training commands given under the USAGE heading?

from coviar import load
load([input], [gop_index], [frame_index], [representation_type], [accumulate])

manza-ari commented 6 years ago

Hi @dongzhuoyao ,

How long does training take for you on this project, for just one dataset such as UCF101?

I have only run this, with 8 GPUs:

python3 train.py --lr 0.0003 --batch-size 80 --arch resnet152 \
  --data-name ucf101 --representation iframe \
  --data-root data/ucf101/mpeg4_videos \
  --train-list data/datalists/ucf101_split1_train.txt \
  --test-list data/datalists/ucf101_split1_test.txt \
  --model-prefix ucf101_iframe_model \
  --lr-steps 150 270 390 --epochs 510 \
  --gpus 0 1 2 3 4 5 6 7

JGyoung33 commented 6 years ago

Hi, I ran into the same problem (AssertionError).

I'm new to this area and my computer has 1 GPU. Could you please tell me which parameters I should change? Thank you!

manza-ari commented 6 years ago

Hi @JGyoung33

First try training the motion vector and residual models, which use the resnet18 arch, with a smaller batch size. If they work and give you results with 1 GPU, everything is set up correctly and you need server support to train iframes with the resnet152 arch. If not, please share your error.

JGyoung33 commented 6 years ago

Hi, thank you for replying. Now I have a new error. I started by training iframe, and since I only have 1 GPU, I set --gpus to 0. The new errors seem to be related to a PyTorch function, and I can't figure out how to resolve them. 2018-10-07 21-52-14 2018-10-07 21-52-22

When I trained iframe on another computer with two GPUs, it ran but stopped partway through training with another error, "cuda: out of memory". Even after I reduced the batch size, it still did not work.

manza-ari commented 6 years ago

How much memory does each GPU you are using for iframes have? What batch size are you using? I suggest you try MV and residuals first.

JGyoung33 commented 6 years ago

Hi, @kanza-ali

I use a 1080 Ti to train iframes with the batch size set to 5, and it still does not work. After the datasets are augmented, it prints "Could not open input stream" as shown below, but it is able to continue. 2018-10-08 16-24-42

When I trained MV, some videos failed to decode, like this: 2018-10-08 16-27-46

2018-10-08 16-28-31

I think these errors are related to my PyTorch dataloader settings, such as num_workers, but I am not familiar with it.
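A message like "Could not open input stream" often points at a file that cannot be opened, so one quick sanity check is to look for videos that are missing or empty under the data root. The sketch below assumes the relative video paths are already available as a Python list; parsing them out of the datalist files is left out, and `find_unreadable` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Iterable, List


def find_unreadable(data_root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that are missing or zero bytes under data_root."""
    bad = []
    for rel in relative_paths:
        full = os.path.join(data_root, rel)
        # A path that is not a regular file, or has no content, cannot be decoded.
        if not os.path.isfile(full) or os.path.getsize(full) == 0:
            bad.append(rel)
    return bad
```

This only catches missing or empty files. A file that exists but was encoded incorrectly would still need to be checked with the decoder itself.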

manza-ari commented 6 years ago

Is your FFmpeg working?

I have never experienced such errors while doing this project. BTW, I am also a beginner in this area.
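One quick way to answer that question from Python is to look for the `ffmpeg` binary on the PATH and ask it for its version. This is a minimal sketch; `tool_version` is a hypothetical helper name, and it only checks the command-line tool, which may be a separate matter from the FFmpeg libraries the data loader was built against.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(name: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<name> -version`, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tool_version()
    print("ffmpeg not found on PATH" if version is None else version)
```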

JGyoung33 commented 6 years ago

I think it's working, since I use it to produce mpeg4-format videos.

JGyoung33 commented 6 years ago

I have now uninstalled ffmpeg and the errors are the same. It seems that uninstalling ffmpeg doesn't affect the other programs, and that it is only used in the step that converts the raw format to mpeg4. Is that right?

BTW, I compiled ffmpeg using gcc 4.8. Is that version too old? Looking forward to your reply, thank you~

manza-ari commented 6 years ago

I am using Ubuntu 16 and I have installed ffmpeg version N-90418-g74c6a6d built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.10)

JGyoung33 commented 6 years ago

Hi, @kanza-ali

I upgraded gcc to version 6.0 and now I can train iframe, thank you. But during training (by the 2nd epoch) there is another error. If I train from the command line, it reports "Cuda: out of memory". If I instead train iframes in VS Code debug mode, the following appears: 2018-10-08 19-49-12

manza-ari commented 6 years ago

Thanks for the update. To train iframes with a batch size of 80, I used 4 GPUs with the following change in train.py:

model = torch.nn.DataParallel(model, device_ids=range(torch.cuda.device_count()))
model.cuda()

The above change should also solve your "Cuda: out of memory" error.

Sorry, I cannot comment on the error you shared since, as I mentioned, I am a beginner in this area. You could post your error on the PyTorch forum.
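To make the DataParallel suggestion more concrete: `DataParallel` splits each batch along its first dimension across the listed devices, so the share each GPU holds shrinks as GPUs are added. The sketch below works under that assumption. `per_gpu_batch_sizes` and `wrap_for_all_gpus` are hypothetical helper names, and the chunking arithmetic only approximates how the batch is divided. With a single GPU nothing is split, so the memory saving would not apply there.

```python
import math
from typing import List


def per_gpu_batch_sizes(batch_size: int, num_gpus: int) -> List[int]:
    """Approximate how a batch is divided into near-equal chunks, one per GPU."""
    if batch_size <= 0 or num_gpus <= 0:
        return []
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes


def wrap_for_all_gpus(model):
    """Wrap a torch.nn.Module in DataParallel over every visible GPU."""
    import torch  # imported lazily so the arithmetic above runs without PyTorch

    device_ids = list(range(torch.cuda.device_count()))
    return torch.nn.DataParallel(model, device_ids=device_ids).cuda()


if __name__ == "__main__":
    print(per_gpu_batch_sizes(80, 4))  # four GPUs share the batch
    print(per_gpu_batch_sizes(80, 1))  # one GPU holds the whole batch
```

On a single-GPU machine, lowering `--batch-size` or using a smaller architecture is the lever that actually reduces memory use.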

JGyoung33 commented 6 years ago

@kanza-ali

Anyway, thank you. If I have any updates, I will share them with you. Thank you again.

JGyoung33 commented 6 years ago

Hi. Can you tell me your OpenCV version?

manza-ari commented 6 years ago

My version is 3.1

JGyoung33 commented 6 years ago

Thank you~ BTW, what are your CUDA and cuDNN versions? I guess they may be related to my failure.

manza-ari commented 6 years ago

CUDA 8

Don't worry, keep trying.
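When comparing setups like this, it can help to collect all the versions in one go. The sketch below is one way to do that, assuming PyTorch and OpenCV's Python bindings are the libraries in question; `collect_versions` is a hypothetical helper. Note that the CUDA value reported here is the version PyTorch was built with, which may differ from the toolkit installed on the system.

```python
import platform
from typing import Dict


def collect_versions() -> Dict[str, str]:
    """Gather version strings for the libraries discussed in this thread."""
    versions = {"python": platform.python_version()}
    try:
        import torch

        versions["torch"] = str(torch.__version__)
        versions["cuda"] = str(torch.version.cuda)  # "None" for CPU-only builds
        versions["cudnn"] = str(torch.backends.cudnn.version())
    except ImportError:
        versions["torch"] = "not installed"
    try:
        import cv2

        versions["opencv"] = cv2.__version__
    except ImportError:
        versions["opencv"] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```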

brendaoo commented 3 years ago

I have the same problem. I have tried python3, but it doesn't work.