dmlc / gluon-cv

Gluon CV Toolkit
http://gluon-cv.mxnet.io
Apache License 2.0

Train action recognition on my own dataset #1100

Closed nicholasguimaraes closed 4 years ago

nicholasguimaraes commented 4 years ago

Hello, do I have to adapt the code to train an action recognition network on my own dataset?

bryanyzhu commented 4 years ago

At this moment, you just need to do two things,

  1. Write a dataloader for your own dataset.
  2. Change the last layer in the model to the number of classes in your dataset.

I'm working on a tutorial about fine-tuning on a custom dataset and will finish it in 2 or 3 days. Once it is finished, I will let you know.

nicholasguimaraes commented 4 years ago

Thank you Bryan!

I have a quick question that perhaps you could answer, though.

I was trying to start training on the UCF-101 dataset. I downloaded the videos, extracted the frames and got the train/test split .txt files.

My trainlist01.txt is exactly like this:

ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1
ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c02.avi 1
ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c03.avi 1
. . .

I got a list index out of range error when I tried to start training.

My question is, should the trainlist01.txt have the path to the frames?

Example:

v_ApplyEyeMakeup_g01_c01/frame_0.jpg 1
v_ApplyEyeMakeup_g01_c01/frame_1.jpg 1
v_ApplyEyeMakeup_g01_c01/frame_2.jpg 1
. . .

bryanyzhu commented 4 years ago

Each line of the txt file should have three items: videopath, numframes, label. For example:

ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01 300 0

Here:

ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01: the folder containing the frames for the video v_ApplyEyeMakeup_g08_c01.avi.
300: the number of frames in this video.
0: the label of this video.

You only have 2 items per line, which is why you get the index out of range error. You can either modify your txt file to have 3 items, or follow our tutorial to prepare the ucf101 dataset.

https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html

It's easy to follow our tutorial, just type

python ./scripts/datasets/ucf101.py

It will download the data, extract the frames and prepare the txt file for you automatically. Everything will be saved to ~/.mxnet/datasets/ucf101
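For anyone who prefers to write the list by hand, here is a minimal sketch of building the three-column file from already extracted frames. It assumes a layout of frames_root/ClassName/video_folder/*.jpg and zero-based labels assigned in alphabetical class order. The function name build_frame_list and its parameters are hypothetical and are not part of gluoncv.

```python
import os


def build_frame_list(frames_root, out_path, exts=(".jpg", ".jpeg", ".png")):
    """Write one 'ClassName/video_folder num_frames label' line per video folder.

    Assumed layout (not confirmed by the thread): frames_root/ClassName/video_folder/*.jpg
    """
    classes = sorted(
        d for d in os.listdir(frames_root)
        if os.path.isdir(os.path.join(frames_root, d))
    )
    lines = []
    for label, cls in enumerate(classes):  # zero-based labels, alphabetical order
        cls_dir = os.path.join(frames_root, cls)
        for video in sorted(os.listdir(cls_dir)):
            video_dir = os.path.join(cls_dir, video)
            if not os.path.isdir(video_dir):
                continue
            # Count only image files so stray files do not inflate the total.
            num_frames = sum(
                1 for f in os.listdir(video_dir) if f.lower().endswith(exts)
            )
            if num_frames:
                lines.append(f"{cls}/{video} {num_frames} {label}")
    with open(out_path, "w") as fh:
        fh.write("".join(line + "\n" for line in lines))
    return lines
```

A call such as build_frame_list("frames", "trainlist01.txt") would then produce lines in the same shape as the example above. Paths containing spaces would break the whitespace-separated format, so they are best avoided.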

bryanyzhu commented 4 years ago

@nicholasguimaraes Hi, I now have a tutorial on fine-tuning models on your own data.

https://gluon-cv.mxnet.io/build/examples_action_recognition/finetune_custom.html

Hope it will help. Please let me know any comments and feedback. Thank you.

nicholasguimaraes commented 4 years ago

Thank you for the explanation.

I tried running ucf101.py but couldn't make it work, so I decided to create trainlist01.txt and testlist01.txt myself.

The columns are frame_path, num_frames, label_id.

The video frames are divided into directories (named after their respective videos), and all of those sit inside one directory called frames.

This is how my trainlist and testlist turned out:

train

frames/v_ApplyEyeMakeup_g01_c01 164 1
frames/v_ApplyEyeMakeup_g01_c02 123 1
frames/v_ApplyEyeMakeup_g01_c03 259 1
. . .

test (I'm not passing the label because these are test examples)

frames/v_ApplyEyeMakeup_g01_c05 296
frames/v_ApplyEyeMakeup_g01_c06 122
frames/v_ApplyEyeMakeup_g02_c01 170
. . .

But I still got the same error IndexError: list index out of range

The traceback comes from classification.py, line 159, inside the function __getitem__

159 clip_input = clip_input.reshape((-1,) + (self.new_length, 3, self.target_height, self.target_width))

I wonder what's wrong with my text file

bryanyzhu commented 4 years ago

Even if you don't have labels for your test examples, you still need to provide a value there. The code requires three items per line.
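A quick way to find offending lines before starting a training run is to scan the list file first. This is only a sketch: find_bad_lines is a hypothetical helper, and the check (three whitespace-separated items, with the second and third being non-negative integers) is inferred from the format described in this thread rather than taken from the gluoncv source.

```python
def find_bad_lines(list_path, expected_fields=3):
    """Return (line_number, text) pairs for lines that would not parse.

    A line is accepted only if it has exactly `expected_fields` items and the
    last two (num_frames, label) are non-negative integers. Blank lines are
    reported as well, since they split into zero items.
    """
    bad = []
    with open(list_path) as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            parts = line.split()
            ok = (
                len(parts) == expected_fields
                and parts[1].isdigit()
                and parts[2].isdigit()
            )
            if not ok:
                bad.append((lineno, line))
    return bad
```

Running it on both the train and the validation list and fixing every reported line should rule out the txt files as the source of an IndexError.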

nicholasguimaraes commented 4 years ago

I just added their own label_id to fill the third column in the test.txt file and now I got another list index out of range error but from line 158 (classification.py)

Traceback (most recent call last):
  File "train_recognizer.py", line 631, in <module>
    main()
  File "train_recognizer.py", line 432, in main
    train_data, val_data, batch_fn = get_data_loader(opt, batch_size, num_workers, logger)
  File "train_recognizer.py", line 322, in get_data_loader
    num_segments=opt.num_segments, transform=transform_train)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\gluoncv\data\ucf101\classification.py", line 88, in __init__
    self.clips = self._make_dataset(root, setting)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\gluoncv\data\ucf101\classification.py", line 158, in _make_dataset
    duration = int(line_info[1])
IndexError: list index out of range

Is this regarding the length (frames) of each clip?

bryanyzhu commented 4 years ago

I don't think so. Which version of gluoncv do you use?

If you look at these two lines, the code raises an error if line_info doesn't have three items. So by that point line_info will definitely be a list with three items in it.

I think this error may come from other places. Please attach more information for me to debug.

  1. your training command line
  2. your gluoncv version

nicholasguimaraes commented 4 years ago

The version of gluoncv I use is 0.5.0

My training command line is as follows:

python train_recognizer.py --model inceptionv3_ucf101

I'm passing the paths to the dataset and the train/test files by changing the default argument values in the code.

But if I was passing in the command line it would look like this:

python train_recognizer.py --dataset ucf101 --data-dir C:/Users/Windows/Documents/gluon-cv/scripts/action-recognition/frames/ --val-data-dir C:/Users/Windows/Documents/gluon-cv/scripts/action-recognition/frames/ --train-list ucfTrainTestlist/trainlist01.txt --val-list ucfTrainTestlist/testlist01.txt --model inceptionv3_ucf101

bryanyzhu commented 4 years ago

It seems your environment is Windows, and I can't reproduce your error on my end, so it's hard for me to debug what goes wrong.

If possible, please install the master branch or nightly version of gluoncv. We have made quite a few changes since 0.5.0. If there is still an error, please follow our online tutorials to download the ucf101 dataset, decode the frames and prepare the txt files, so we can determine whether the code has a bug or your data has a problem.

nicholasguimaraes commented 4 years ago

I just updated to the Nightly Release. Now I'm running version 0.6.0

I ran the code again and then it pointed me to a line in the train.txt file that was missing the label_id.

I fixed it and ran the code one more time, it loaded the model as usual and then printed this to the screen:

Load 11415 training samples and 1905 validation samples.

It seemed that it would start training,

then it gave me a traceback

Traceback (most recent call last):
  File "train_recognizer.py", line 676, in <module>
    main()
  File "train_recognizer.py", line 477, in main
    train_data, val_data, batch_fn = get_data_loader(opt, batch_size, num_workers, logger)
  File "train_recognizer.py", line 404, in get_data_loader
    prefetch=int(opt.prefetch_ratio * num_workers), last_batch='rollover')
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\data\dataloader.py", line 641, in __init__
    initargs=[self._dataset, is_np_shape(), is_np_array()])
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle module objects

C:\Users\Windows\Documents\gluon-cv\scripts\action-recognition>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

bryanyzhu commented 4 years ago

Hi, this is a Windows error. I checked online, and many people are complaining about it. You can see these posts as well.

https://discuss.pytorch.org/t/eoferror-ran-out-of-input-when-enumerating-the-train-loader/22692

https://discuss.pytorch.org/t/pytorch-windows-eoferror-ran-out-of-input-when-num-workers-0/25918/6

Basically, this error is because the input size is too large, which is beyond the limit of Pickle (4GB). A workaround is to set num_workers=0 in dataloader, you should be able to start training. But since num_workers=0, your training will be slow.
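A side note on the traceback itself: the first error is TypeError: can't pickle module objects, raised from popen_spawn_win32.py. On Windows, multiprocessing starts worker processes with the spawn method, which pickles the objects each worker needs. One possible reading (an assumption, not confirmed in this thread) is that the dataset object holds a reference to a module, which cannot be pickled. The sketch below reproduces that kind of failure with the standard library only and shows a guard that falls back to zero workers. HoldsModule, is_picklable and safe_num_workers are hypothetical names.

```python
import pickle
import sys
import types


class HoldsModule:
    """Stand-in for a dataset that stores a module object as an attribute."""

    def __init__(self):
        self.backend = types  # any module object makes pickling fail


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def safe_num_workers(requested, dataset):
    """Use 0 workers on Windows when the dataset cannot be sent to a spawned process."""
    if sys.platform == "win32" and not is_picklable(dataset):
        return 0
    return requested
```

With a guard like this the workaround of num_workers=0 is applied only when it is needed, so the same script keeps its parallel loading on platforms where the dataset does not have to be pickled.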

nicholasguimaraes commented 4 years ago

Hello Bryan, I have changed the num of workers to 0 and got closer to training the network.

After changing the number of workers I got some errors in _image_TSN_cv2_loader() regarding the path to my data. name_pattern did not match the names of my image frames.

After fixing _image_TSN_cv2_loader for my use I ran the code again and came across this error:

Traceback (most recent call last):
  File "train_recognizer.py", line 676, in <module>
    main()
  File "train_recognizer.py", line 672, in main
    train(context)
  File "train_recognizer.py", line 597, in train
    pred = net(X.astype(opt.dtype, copy=False))
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\block.py", line 693, in __call__
    out = self.forward(*args)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\block.py", line 1158, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\gluoncv\model_zoo\action_recognition\actionrec_inceptionv3.py", line 53, in hybrid_forward
    x = self.features(x)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\block.py", line 693, in __call__
    out = self.forward(*args)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\block.py", line 1158, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\nn\basic_layers.py", line 119, in hybrid_forward
    x = block(x)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\block.py", line 693, in __call__
    out = self.forward(*args)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\block.py", line 1158, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\nn\conv_layers.py", line 730, in hybrid_forward
    return pooling(x, name='fwd', **self._kwargs)
  File "<string>", line 129, in Pooling
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\_ctypes\ndarray.py", line 107, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\base.py", line 281, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [09:35:41] C:\Jenkins\workspace\mxnet\mxnet\src\operator\nn\pooling.cc:190: Check failed: param.kernel[0] <= dshape_nchw[2] + 2 * param.pad[0]: kernel size (8) exceeds input (5 padded to 5)

I don't understand why the code is looking for parameters in "C:\Jenkins\workspace\mxnet\mxnet\src\operator\nn\pooling.cc" This path is not in my computer.

Is it failing to open something from mxnet because it is looking in the wrong path?

bryanyzhu commented 4 years ago

Ok, it seems that the code is complaining about shape mismatch.

I remember you used the InceptionV3 model. That may be the problem. InceptionV3 expects an input size of 299, not 224, and our default setting is 224. So in your command line, you need to set these additional flags:

--new-height 340 --new-width 450 --input-size 299 

This may solve the problem.

Just a quick note: most of the networks work on 224 input size images; InceptionV3 is the only exception. So I suggest going with standard ones, like VGG16 or ResNet50, to train/test on your data. It will be safer.
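The numbers in the error message (kernel size 8, input 5) can be reproduced by tracking the spatial size through the layers that shrink it. The sketch below assumes the standard Inception v3 layout (three stem convolutions, two max pools, two more convolutions, and two grid reductions). The layer list is an assumption written from that general design, not read from the gluoncv source.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


# (kernel, stride, pad) for the layers that change the spatial size.
# Assumed from the standard Inception v3 layout, not taken from gluoncv.
SIZE_CHANGING_LAYERS = [
    (3, 2, 0),  # stem conv, stride 2
    (3, 1, 0),  # stem conv, no padding
    (3, 1, 1),  # stem conv, padded (size unchanged)
    (3, 2, 0),  # max pool
    (1, 1, 0),  # 1x1 conv (size unchanged)
    (3, 1, 0),  # 3x3 conv, no padding
    (3, 2, 0),  # max pool
    (3, 2, 0),  # first grid reduction
    (3, 2, 0),  # second grid reduction
]


def final_feature_size(input_size):
    """Side length of the feature map that reaches the final pooling layer."""
    size = input_size
    for kernel, stride, pad in SIZE_CHANGING_LAYERS:
        size = conv_out(size, kernel, stride, pad)
    return size


for s in (224, 299):
    print(s, "->", final_feature_size(s))
```

Under these assumptions a 299 input ends at an 8x8 feature map, which the 8x8 pooling kernel covers exactly, while a 224 input ends at 5x5, which matches the "kernel size (8) exceeds input (5 padded to 5)" message.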

bryanyzhu commented 4 years ago

Any updates?

nicholasguimaraes commented 4 years ago

Hi Bryan, I was away the last couple of days.

I'm now using "resnet101_v1b_kinetics400" as my feature extractor. I had to edit the dataloader a bit more but now I'm officially training!

--num-data-workers is 0 so training will be much slower. It's taking about 10 minutes per epoch. Is that too much time?

I also realized that num of epochs are just 3. '--num-epochs', type=int, default=3 Is that right?

I have passed epoch 000 and now I'm on 001. Loss is decreasing much slower now.

It saves training info and the model in the folder params/

How do I convert the .params and .states to a model?

bryanyzhu commented 4 years ago

Glad it is working.

10 minutes is not a long time. For example, when I train on Kinetics400 (with 240K videos) using 32 workers, it takes me 1 hour per epoch.

You can increase num_epochs if you think 3 is too small. I don't know your dataset size. If your dataset is really small, maybe 10 or 20 epochs is enough.

The .params file is the model. You can use it to make predictions.

nicholasguimaraes commented 4 years ago

In fact it is not taking 10 minutes per epoch, it takes about one hour like you. When I started training I mistakenly thought that each printed line in the command prompt was one epoch.

I'm using the ucf101 dataset and I increased the num of epochs to 30.

How low does the loss get when training with the ucf101 dataset?

I'm on the 4th epoch and so far my accuracy is 14.240000 and loss is 3.569620

bryanyzhu commented 4 years ago

For best results on ucf101, I train for 80 epochs. The learning rate is decayed at epochs 40 and 60. The initial learning rate is 0.01. If training for 80 epochs takes too long, I think training for 50 epochs is also fine. You can do the lr decay at the 30th and 40th epochs. The accuracy won't change much.

The final accuracy should be around 80, and the loss will be below 1. You can see my training log here, although this is the log for VGG16.

https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/ucf101/vgg16_ucf101_tsn.log
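The step schedule described above can be written down in a few lines. This is an illustrative sketch only: the decay factor of 0.1 is an assumption (the thread gives the decay epochs and the initial rate but not the factor), and step_lr is a hypothetical helper, not a gluoncv function.

```python
def step_lr(epoch, base_lr=0.01, decay_epochs=(40, 60), factor=0.1):
    """Learning rate for a zero-based epoch under a step decay schedule.

    factor=0.1 is an assumed value; the thread does not state it.
    """
    steps_passed = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * factor ** steps_passed


# 80-epoch schedule from the thread, and the shorter 50-epoch variant.
long_schedule = [step_lr(e) for e in range(80)]
short_schedule = [step_lr(e, decay_epochs=(30, 40)) for e in range(50)]
```

The shorter variant simply moves the two decay points to epochs 30 and 40, as suggested for a 50-epoch run.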

nicholasguimaraes commented 4 years ago

Hello Bryan, I'm finishing training in a couple of hours. It's been running for a few days.

It's on epoch 59 and the train accuracy is at 93.5 and the loss is 0.19.

validation: acc-top1=59.736842 acc-top5=78.684211 (I believe this is because I divided the train/val split myself). I had about 90% of the frames for training and 10% for validation. Is that a bad ratio?

Also, in the test_recognizer.py script, in what line do I import my .params model and pass a video for testing?

bryanyzhu commented 4 years ago

The ratio is good. The low validation accuracy is because you are using resnet101_v1b_kinetics400, which is a very deep model. You are experiencing overfitting. I suggest you use a shallower network, like vgg16_ucf101 or resnet50_v1b_ucf101. You will have better validation accuracy.

The test_recognizer.py is for evaluating your model on the entire dataset. You can use --resume-params YOURMODELNAME to pass in your trained weights.

If you only want to test one video, please consider using inference.py. The tutorial is here. https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_custom.html

nicholasguimaraes commented 4 years ago

Hello Bryan, I tried running inference.py with the model I trained,

python inference.py --data-list ucfTrainTestlist/inference.txt --model resnet101_v1b_kinetics400 --resume-params params/ucf101-resnet101_v1b_kinetics400-059.params

But I got this:

Traceback (most recent call last):
  File "inference.py", line 270, in <module>
    main(logger)
  File "inference.py", line 227, in main
    net.collect_params().reset_ctx(context)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\parameter.py", line 923, in reset_ctx
    i.reset_ctx(ctx)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\parameter.py", line 481, in reset_ctx
    self._init_impl(data, ctx)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\parameter.py", line 359, in _init_impl
    self._data = [data.copyto(ctx) for ctx in self._ctx_list]
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\gluon\parameter.py", line 359, in <listcomp>
    self._data = [data.copyto(ctx) for ctx in self._ctx_list]
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\ndarray\ndarray.py", line 2646, in copyto
    return _internal._copyto(self, out=hret)
  File "<string>", line 27, in _copyto
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\_ctypes\ndarray.py", line 107, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\mxnet\base.py", line 281, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [08:16:50] C:\Jenkins\workspace\mxnet\mxnet\src\ndarray\ndarray.cc:1283: GPU is not enabled

The parameter --gpu-id is set to 0 as default.


Also, I tried to start training with the model i3d_nl10_resnet50_v1_kinetics400

But it said the input data should be 5D and not 4D. To train the i3d networks, what else do I have to pass in the .txt file?

bryanyzhu commented 4 years ago

Yes, the inference.py is defaulted to use GPU because using CPU will take too long. But if you prefer to use CPU, you can just change this line of code. https://github.com/dmlc/gluon-cv/blob/master/scripts/action-recognition/inference.py#L196

change context = mx.gpu(gpu_id) to context = mx.cpu()

To use 3D models, you don't need to change anything in the .txt file. The only thing you need to change is the argument new_length, which sets how many frames are used as input. Usually we use 32 frames as input, which means you'll set --new_length 32.

But I still suggest, if you train on UCF101 dataset, a shallower network would be better, such as i3d_resnet50_v1_ucf101. If you use non-local models, it won't bring much performance improvement but will significantly slow down your training.

nicholasguimaraes commented 4 years ago

In fact I used a GeForce RTX 2080 Ti to train the network. I'm using the same exact machine to do inference but it gives me the GPU not enabled error. I tried setting the gpu-id parameter to 1 but it seemed that it could not find it.

I need to be able to do inference on the GPU.

But out of curiosity I tried doing inference on my CPU. My video had 3,000 frames. I ran the code and this was the error: Error occured in reading frames [1000, 1001, 1002, 1003.... 1021] from video boxing_test.mp4 of duration 3000.

I trimmed the video to only 280 frames assuming my video could be corrupted. It still gave me the same error, Error occured in reading frames [124, 125, 126, 127,....155] from video boxing_test.mp4 of duration 280.


And yes thank you for the advice! If I can do inference on the GPU the next model I'll train will be i3d_resnet50_v1_ucf101.

bryanyzhu commented 4 years ago

Interesting, if you have a GPU, then it should be naturally enabled. Can you add CUDA_VISIBLE_DEVICES=0 before python inference.py?

If it doesn't work, try mx.test_utils.list_gpus() to see if mxnet can see your GPU.

For the video reading issue, can you tell me which version of decord you are using? (just type pip show decord and let me know the result.) Please update the version to 0.3.3 and try again.

If you already have the latest version of decord and you are ok with sharing the video, you can send the video to my email (yizhu59@gmail.com) and I will debug for you.

nicholasguimaraes commented 4 years ago

Ok, I changed mx.gpu(gpu_id) to mx.test_utils.list_gpus() and it didn't give me the gpu error which is good!

It gave me the same error with the frames so I'll send you an email with the .mp4.

bryanyzhu commented 4 years ago

I tried your video on my side, it is working fine. Please try the code snippet below to see if it runs through.

import decord
video_name = './boxing_test.mp4'
data = decord.VideoReader(video_name)
frames = data.get_batch(range(124,156))
print(frames.shape)

On my side, this gives me an ndarray of shape (32, 360, 640, 3), which is correct.

nicholasguimaraes commented 4 years ago

I forgot to say that my decord version is 0.3.3

Tried running this code but got this error,

module 'decord' has no attribute 'VideoReader'


I'm pasting the full traceback from inference.py below.

INFO:root:Pre-trained model is successfully loaded from the model zoo.
Pre-trained model is successfully loaded from the model zoo.
INFO:root:Successfully built model resnet101_v1b_kinetics400
Successfully built model resnet101_v1b_kinetics400
INFO:root:Load 1 video samples.
Load 1 video samples.
Traceback (most recent call last):
  File "inference.py", line 111, in video_TSN_decord_batch_loader
    video_data = video_reader.get_batch(frame_id_list).asnumpy()
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\decord\video_reader.py", line 122, in get_batch
    arr = _CAPI_VideoReaderGetBatch(self._handle, indices)
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\decord\_ffi\_ctypes\function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "C:\Users\Windows\AppData\Roaming\Python\Python36\site-packages\decord\_ffi\base.py", line 63, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [22:48:31] C:\projects\decord-distro-win\decord\src\runtime\ndarray.cc:171: Check failed: from_size == to_size (128 vs. 256) DECORDArrayCopyFromTo: The size must exactly match

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference.py", line 270, in <module>
    main(logger)
  File "inference.py", line 249, in main
    video_data = read_data(opt, video_path, transform_test)
  File "inference.py", line 160, in read_data
    clip_input = video_TSN_decord_batch_loader(opt, video_name, decord_vr, duration, segment_indices, skip_offsets)
  File "inference.py", line 114, in video_TSN_decord_batch_loader
    raise RuntimeError('Error occured in reading frames {} from video {} of duration {}.'.format(frame_id_list, directory, duration))
RuntimeError: Error occured in reading frames [124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155] from boxing_test.mp4 of duration 280.

zhreshold commented 4 years ago

@nicholasguimaraes Verified that it works on windows

>>> len(vr)
226
>>> vr[0]
<decord.NDArray shape=(360, 640, 3), cpu(0)>

I think you might have a broken decord installation. Try uninstalling it with pip, or removing it from the Python dist-packages directory, and then install it again.

nicholasguimaraes commented 4 years ago

I've tried uninstalling and reinstalling, but it still gives me the error. I also made a new virtual env with all of the required libraries, but that did not work either.

The error I get when I run the code snippet suggested by @bryanyzhu is, module 'decord' has no attribute 'VideoReader'

Now the error I get when I try running inference.py is, Error occured in reading frames [124, 125, 126, 127,....155] from video boxing_test.mp4 of duration 280.

Maybe I should try importing the video with cv2?

bryanyzhu commented 4 years ago

In your case, if the error message is module 'decord' has no attribute 'VideoReader', it means your decord installation was not successful. That's why you get the error when you run inference.py.

Right now, inference.py does not support the cv2 video reader; I will add it later. In the meantime, you can modify it on your own. You need to modify this function to use cv2 instead of decord to read in the video frames.

I think your training was successful because you used the image loader to load the data, not the video loader. You had already extracted the frames from the videos, so you didn't use decord when you trained your model. But if this is not the case, please let me know.
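One way to narrow down a "module has no attribute" error is to check which file or directory Python actually resolves for the import. The sketch below uses only the standard library. describe_module is a hypothetical helper, and the idea that a stray local file or a leftover empty directory might be shadowing the real package is a guess, not something established in this thread.

```python
import importlib
import importlib.util


def describe_module(name, attr):
    """Report where `name` is imported from and whether it exposes `attr`.

    An origin of None usually means a namespace package, for example a
    leftover directory without an __init__.py.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "has_attr": False}
    try:
        module = importlib.import_module(name)
    except Exception:
        # The module was located but failed while importing.
        return {"found": True, "origin": spec.origin, "has_attr": False}
    return {"found": True, "origin": spec.origin, "has_attr": hasattr(module, attr)}


# Example: print(describe_module("decord", "VideoReader"))
```

If the reported origin is not inside site-packages, or has_attr is False while the package is found, the import is probably not picking up a complete decord install.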

lgov commented 4 years ago

@bryanyzhu : can you add action recognition inference.py script somewhere in the git repository? The link on the model zoo page now points to _Downloads. I only ask so people can help you improve it. Thx.

bryanyzhu commented 4 years ago

Hi @lgov, inference.py is in the git repository. The link is here:

https://github.com/dmlc/gluon-cv/blob/master/scripts/action-recognition/inference.py

I think the link on the model zoo page also points to the right place; you can download the script by clicking on that link. Please let me know if I have misunderstood your request.

ghost commented 3 years ago

hello, what is the link to the video? thank you.

sebyo commented 1 year ago

Hello, how can I generate those txt files for my own dataset? I have a dataset that only contains videos without any labels, and I want to generate the files and split the data into train and test.

bryanyzhu commented 1 year ago

@sebyo For the labels (like 0,1,2,3,...N), you need to manually label the videos and provide the label in the txt file. Usually people label videos into different folders, each folder represents a class. Then you can write very simple python scripts to automatically generate the txt files, and also do any train/test split as you like.
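As a rough sketch of such a script, the function below assumes the videos have already been sorted by hand into one folder per class (video_root/ClassName/video_file), assigns zero-based labels in alphabetical class order, and makes a seeded random split per class. split_labeled_videos and its parameters are hypothetical. It returns (path, label) pairs only, so the frame-count column required by the gluoncv list format still has to be filled in after the videos are decoded.

```python
import os
import random


def split_labeled_videos(video_root, val_fraction=0.1, seed=0):
    """Return (train, val) lists of (relative_path, label) pairs.

    Assumed layout: video_root/ClassName/video_file, one folder per class.
    """
    classes = sorted(
        d for d in os.listdir(video_root)
        if os.path.isdir(os.path.join(video_root, d))
    )
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label, cls in enumerate(classes):
        videos = sorted(os.listdir(os.path.join(video_root, cls)))
        rng.shuffle(videos)
        n_val = int(len(videos) * val_fraction)
        for i, video in enumerate(videos):
            target = val if i < n_val else train
            target.append((f"{cls}/{video}", label))
    return train, val
```

Splitting per class keeps every class represented in both lists, which a single global shuffle does not guarantee for small classes.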

sebyo commented 1 year ago

@bryanyzhu thank you for your answer! I did as you suggested. Now I want to convert those txt files into json files using this script: https://github.com/kenshohara/3D-ResNets-PyTorch/blob/master/util_scripts/ucf101_json.py I got this error: pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields. I didn't quite understand the problem, and they don't seem to answer there anymore. I have the same file structure as they do.

bryanyzhu commented 1 year ago

I didn't look into their code, but I assume this is because their txt input has two items on each line, just like the original UCF101 annotation files: ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi 1. Our code has three items on each line, ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi 300 1, where the middle one is the number of frames. That is probably why the parser complains that it expected 2 fields. If you are using their codebase, you can try removing the second column and see if it works.
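Removing the middle column can be done with a short script. This is a minimal sketch under the assumption that every valid line has exactly three whitespace-separated items. drop_frame_count is a hypothetical helper, and lines with any other number of items are skipped rather than guessed at.

```python
def drop_frame_count(src_path, dst_path):
    """Rewrite 'path num_frames label' lines as 'path label' lines.

    Returns the number of lines written. Lines that do not have exactly
    three items are skipped.
    """
    converted = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            parts = line.split()
            if len(parts) != 3:
                continue
            path, _num_frames, label = parts
            dst.write(f"{path} {label}\n")
            converted += 1
    return converted
```

Comparing the returned count with the number of lines in the source file shows whether anything was skipped.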

sebyo commented 1 year ago

@bryanyzhu I am sorry, I have a question not related to this issue. I am trying to display data after performing data augmentation and loading it with the PyTorch DataLoader, but it didn't work:

train_data = get_training_data(opt.video_path, opt.annotation_path, opt.dataset, opt.input_type, opt.file_type, spatial_transform, temporal_transform)

spatial_transform and temporal_transform are the augmentation methods. Then:

train_loader = torch.utils.data.DataLoader(train_data, batch_size=opt.batch_size, shuffle=(train_sampler is None), num_workers=opt.n_threads, pin_memory=True, sampler=train_sampler, worker_init_fn=worker_init_fn)

train_loader takes train_data. I want to display it using this code:

for batch in train_loader:
    inputs, targets = batch
    for img in inputs:
        image = img.cpu().numpy()
        # transpose image to fit plt input
        image = image.T
        # normalise image
        data_min = np.min(image, axis=(1,2), keepdims=True)
        data_max = np.max(image, axis=(1,2), keepdims=True)
        scaled_data = (image - data_min) / (data_max - data_min)
        # show image
        plt.imshow(scaled_data)
        plt.show()

But it didn't work. Any idea that could help me?