dmlc / gluon-cv

Gluon CV Toolkit
http://gluon-cv.mxnet.io
Apache License 2.0

Action Recognition train on own Dataset #1347

Closed. Tom-89p13 closed this issue 4 years ago.

Tom-89p13 commented 4 years ago

Hello,

I'm trying to train action recognition models on my own dataset, following the fine-tuning tutorial on your site, but I've run into some issues. I have only 2 classes, with around 100 videos per class in .mp4 format (each about 10 seconds long), and I created Train.txt. My code is below.

  1. My training accuracy doesn't increase and stays at 0.477... (for testing I set the number of epochs to 2, but it still doesn't increase with more than 10 epochs):

```
[Epoch 0] train=0.477273 loss=0.138171 time: 388.277470
[Epoch 1] train=0.477273 loss=0.002957 time: 104.294971
```

  2. If I try to test my saved parameters, I get the following result. Command line:

```
!python test_recognizer.py --model i3d_resnet50_v1_custom --resume-params net2.params --num-classes 2 --num-gpus 1
```

Output:

```
Pre-trained model net2.params is successfully loaded.
Successfully loaded model i3d_resnet50_v1_custom
Load 10 test samples in 0 iterations.
Test accuracy: acc-top1=nan acc-top5=nan
Total evaluation time is 0.00 minutes
```

Thank you very much for your help!

Greetings,

Video loading

```python
import os, time
import mxnet as mx
from mxnet import gluon, autograd as ag
from gluoncv.data import VideoClsCustom
from gluoncv.data.transforms import video
from gluoncv.model_zoo import get_model
from gluoncv.utils import split_and_load, TrainingHistory

num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]
transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8],
                                                 mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus

train_dataset = VideoClsCustom(root=os.path.expanduser('/content/drive/My Drive/Datenbank/Train'),  # directory with the videos
                               setting=os.path.expanduser('/content/drive/My Drive/Datenbank/Train.txt'),  # path to Train.txt
                               train=True,
                               new_length=32,
                               slowfast=True,
                               video_loader=True,
                               use_decord=True,
                               #slow_temporal_stride=16,
                               #fast_temporal_stride=2,
                               transform=transform_train)

print('Load %d training samples.' % len(train_dataset))
train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
```

Loading Model

```python
net = get_model(name='i3d_resnet50_v1_custom', nclass=2)
net.collect_params().reset_ctx(ctx)
print(net)
```

Some configurations

```python
lr_decay = 0.1
lr_decay_epoch = [40, 80, 100]
optimizer = 'sgd'
optimizer_params = {'learning_rate': 0.001, 'wd': 0.0001, 'momentum': 0.9}

trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

train_metric = mx.metric.Accuracy()
train_history = TrainingHistory(['training-acc'])
```

Train Model

```python
epochs = 2
# Index into lr_decay_epoch: start at 0 so the first decay happens at epoch 40 (1 would skip it)
lr_decay_count = 0

for epoch in range(epochs):
    tic = time.time()
    train_metric.reset()
    train_loss = 0

    # Learning-rate decay
    if epoch == lr_decay_epoch[lr_decay_count]:
        trainer.set_learning_rate(trainer.learning_rate * lr_decay)
        lr_decay_count += 1

    for i, batch in enumerate(train_data):
        # Extract data and label and split them across the devices
        data = split_and_load(batch[0], ctx_list=ctx, batch_axis=0)
        label = split_and_load(batch[1], ctx_list=ctx, batch_axis=0)

        # AutoGrad
        with ag.record():
            output = []
            for _, X in enumerate(data):
                # Merge the first two axes into a single batch axis before the forward pass
                X = X.reshape((-1,) + X.shape[2:])
                pred = net(X)
                output.append(pred)
            loss = [loss_fn(yhat, y) for yhat, y in zip(output, label)]

        # Backpropagation
        for l in loss:
            l.backward()

        # Optimize
        trainer.step(batch_size)

        # Update metrics
        train_loss += sum([l.mean().asscalar() for l in loss])
        train_metric.update(label, output)

        if i == 100:
            break

    name, acc = train_metric.get()

    # Update history and print metrics
    train_history.update([acc])
    print('[Epoch %d] train=%f loss=%f time: %f' %
          (epoch, acc, train_loss / (i + 1), time.time() - tic))

train_history.plot()
```

Saving params

```python
file_name = "net2.params"
net.save_parameters(file_name)
```

Tom-89p13 commented 4 years ago

PS: The videos in my dataset are 60 FPS. Is that a problem, or do they need to be 25 FPS?

bryanyzhu commented 4 years ago

I don't think fps is an issue, but let me look into it.

bryanyzhu commented 4 years ago

I tried using 2 classes from the UCF101 dataset as an example, and everything looks normal on my end. Maybe this is data-dependent.

Could you try the TSN model with ResNet50 (https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/action_recognition/actionrec_resnetv1b.py#L651)? You can change dropout to a larger value (like 0.9) and init_std to 0.001.

For the dataloader, just set new_length=1 and num_segments=3 (or a larger value if your videos are long). I usually use TSN as a quick way to debug.
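For reference, TSN-style sampling with new_length=1 and num_segments=3 splits each video into 3 equal-length segments and takes one short clip (here a single frame) from each. The plain-Python sketch below illustrates that idea only; `sample_tsn_indices` is a hypothetical helper, not GluonCV's loader, whose exact offsets and randomization may differ.

```python
import random

def sample_tsn_indices(num_frames, num_segments=3, new_length=1, train=True, seed=None):
    """Return one 0-based start frame per segment (TSN-style sampling sketch)."""
    if num_frames < num_segments * new_length:
        raise ValueError("video too short for the requested segments")
    rng = random.Random(seed)
    # Segment length, leaving room for new_length consecutive frames at the end
    seg_len = (num_frames - new_length + 1) // num_segments
    starts = []
    for s in range(num_segments):
        base = s * seg_len
        if train:
            offset = rng.randrange(seg_len)  # random position inside the segment
        else:
            offset = seg_len // 2            # segment center at test time
        starts.append(base + offset)
    return starts

print(sample_tsn_indices(200, 3, train=False))  # [33, 99, 165]
```

Random offsets during training give a different view of each segment every epoch, while fixed centers keep testing deterministic.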

Tom-89p13 commented 4 years ago

First, thanks for your help @bryanyzhu, I really appreciate it.

I changed the dataloader parameter to new_length=1, but num_segments=3 always crashes; it has to be 1 for my dataset. Then I changed the model to TSN ResNet50 with the code below:

```python
net = get_model(name='resnet50_v1b_custom', nclass=2, dropout=0.9, init_std=0.001)
net.collect_params().reset_ctx(ctx)
```

Afterwards I trained the model and got the following results:

```
[Epoch 0] train=0.477778 loss=0.108409 time: 35.025032
[Epoch 1] train=0.500000 loss=0.000349 time: 35.650740
[Epoch 2] train=0.500000 loss=0.000173 time: 37.308505
[Epoch 3] train=0.500000 loss=0.000137 time: 36.339093
[Epoch 4] train=0.500000 loss=0.000151 time: 37.180737
```

The training accuracy should be close to 1, right, not 0.5?

Again, if I run the command below, I get NaN as the result:

```
!python test_recognizer.py --model resnet50_v1b_custom --resume-params net2.params --num-classes 2 --num-gpus 1
```

Result:

```
Namespace(batch_norm=False, batch_size=32, benchmark=False, calib_mode='naive', calibration=False, crop_ratio=0.875, data_aug='v1', data_dir='/content/drive/My Drive/Datenbank/Trainpaket1', dataset='ucf101', deploy=False, dtype='float32', eval=False, fast_temporal_stride=2, hard_weight=0.5, hashtag='', input_5d=False, input_size=224, label_smoothing=False, last_gamma=False, log_interval=50, logging_file='train.log', lr=0.1, lr_decay=0.1, lr_decay_epoch='40,60', lr_decay_period=0, lr_mode='step', mixup=False, mixup_alpha=0.2, mixup_off_epoch=0, mode=None, model='resnet50_v1b_custom', model_prefix=None, momentum=0.9, new_height=256, new_length=1, new_step=1, new_width=340, no_wd=False, num_calib_batches=5, num_classes=2, num_crop=1, num_epochs=3, num_gpus=1, num_iterations=100, num_segments=1, num_workers=4, prefetch_ratio=2.0, quantized=False, quantized_dtype='auto', resume_epoch=0, resume_params='net2.params', resume_states='', save_dir='params', save_frequency=10, slow_temporal_stride=16, slowfast=False, teacher=None, temperature=20, ten_crop=False, three_crop=False, train_list='/content/drive/My Drive/Datenbank/Train.txt', use_amp=False, use_decord=False, use_gn=False, use_pretrained=False, use_se=False, use_softmax=False, use_tsn=False, val_data_dir='/content/drive/My Drive/Datenbank/Trainpaket1', val_list='/content/drive/My Drive/Datenbank/Val.txt', video_loader=False, warmup_epochs=0, warmup_lr=0.0, wd=0.0001)
Total batch size is set to 32 on 1 GPUs
Pre-trained model net2.params is successfully loaded.
Successfully loaded model resnet50_v1b_custom
Load 10 test samples in 0 iterations.
Test accuracy: acc-top1=nan acc-top5=nan
Total evaluation time is 0.00 minutes
```

Val.txt looks like:

```
Montage1.MP4 200 1
Montage2.MP4 200 1
Montage3.MP4 200 1
Montage4.MP4 200 1
Montage5.MP4 200 1
Demontage1.MP4 200 2
Demontage2.MP4 200 2
Demontage3.MP4 200 2
Demontage4.MP4 200 2
Demontage5.MP4 200 2
```

Train.txt looks like:

```
Montage6.MP4 200 1
Montage7.MP4 200 1
Montage8.MP4 200 1
Montage9.MP4 200 1
Montage10.MP4 200 1
Montage11.MP4 200 1
Montage12.MP4 200 1
Montage13.MP4 200 1
Montage14.MP4 200 1
...
```

Greetings, Tom

bryanyzhu commented 4 years ago

Hi, I haven't checked the details yet, but it seems your labels are 1 and 2? If so, please change them to 0 and 1, because labels should start from 0.
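A minimal sketch of this fix, assuming every line of Train.txt and Val.txt has the form `<video> <num_frames> <label>` as in the listings above. `remap_labels` is a hypothetical helper; it subtracts the same fixed offset from every file so the training and validation labels stay consistent.

```python
def remap_labels(lines, offset=1):
    """Subtract `offset` from the label column of '<video> <num_frames> <label>' lines."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # rsplit keeps file names that contain spaces intact
        name, num_frames, label = line.rsplit(None, 2)
        new_label = int(label) - offset
        if new_label < 0:
            raise ValueError('label %s would become negative in %r' % (label, line))
        out.append('%s %s %d' % (name, num_frames, new_label))
    return out

print(remap_labels(['Montage1.MP4 200 1', 'Demontage1.MP4 200 2']))
# ['Montage1.MP4 200 0', 'Demontage1.MP4 200 1']
```

To rewrite a file, read it with `open(path).readlines()`, pass the lines through `remap_labels`, and write the result back joined with newlines.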

Tom-89p13 commented 4 years ago

Hello Yi Zhu, oh my god, such an easy solution... The first problem is solved; it works with labels 0 and 1! :-)

```
[Epoch 0] train=0.488889 loss=0.687800 time: 950.778338
[Epoch 1] train=0.733333 loss=0.488430 time: 88.430673
[Epoch 2] train=0.922222 loss=0.196962 time: 48.234062
[Epoch 3] train=0.966667 loss=0.093875 time: 44.105297
[Epoch 4] train=0.977778 loss=0.049598 time: 44.734219
```
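A plausible explanation for the earlier runs (accuracy stuck near 0.5 while the loss dropped to almost zero), assuming the loss looks up each label's log-probability with index clipping: Gluon's `SoftmaxCrossEntropyLoss` uses MXNet's `pick`, whose default is `mode='clip'`. With `nclass=2`, label 2 is clipped to index 1, so every video is trained as class 1; the loss goes to zero while the accuracy metric, comparing the argmax (always 1) with the raw labels 1 and 2, stays at the fraction of label-1 videos. The NumPy sketch below simulates this; it illustrates the mechanism and is not MXNet's code.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def clipped_ce_loss(logits, labels):
    """Cross-entropy where out-of-range labels are clipped, like pick(..., mode='clip')."""
    idx = np.clip(labels, 0, logits.shape[1] - 1)
    return -log_softmax(logits)[np.arange(len(labels)), idx]

labels = np.array([1, 1, 2, 2])        # 1-based labels with nclass = 2
logits = np.array([[-5.0, 5.0]] * 4)   # a model that always predicts index 1

loss = clipped_ce_loss(logits, labels)
acc = np.mean(logits.argmax(axis=1) == labels)
print(loss.mean() < 1e-3, acc)  # True 0.5
```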

But test_recognizer.py still gives NaN as the result.

And inference.py predicted every video as class 1 :-( -> I fixed that by setting use-pretrained to false 👍

PS: I'm using MXNet 1.6.0 (cu101) and GluonCV 0.8.0b20200629. With GluonCV 0.7.0, I get this error when running test_recognizer.py:

```
TypeError: __init__() got an unexpected keyword argument 'data_aug'
```

Greetings, Tom

bryanyzhu commented 4 years ago

Great to hear the first problem is solved. Let me work on the NaN problem.

Tom-89p13 commented 4 years ago

Any updates, sir?

I have 3 additional questions:

-> I have 2 classes, but if the input video contains NEITHER of these 2 classes, the model should output something like "none". Is that possible? Currently every video is predicted as class 0 or 1, even if neither class appears in it.

-> Can I output the threshold or the probability of the predictions (and change the threshold/probability)?

-> Can I output a validation loss, or is there no validation?

Thank you in advance.

Greetings.

bryanyzhu commented 4 years ago

Hi, sorry about the delay; I haven't had time to look into this yet.

For your new questions: yes, you can achieve your goal by manipulating the probabilities. The network outputs one score per class, and applying a softmax to these scores gives the class probabilities.

https://github.com/dmlc/gluon-cv/blob/master/scripts/action-recognition/test_recognizer.py#L203

In your case, a binary classification problem, the output pred is a 2-dim vector with one entry per class; after a softmax these are the probabilities of the two classes. You can set a threshold to achieve your goal. For example, with a threshold of 0.7: if the first class's probability is larger than 0.7, predict class 0; if the second class's probability is larger than 0.7, predict class 1; if the two probabilities are close (like 0.5 and 0.5), the model is uncertain and you can predict None. You can adjust the threshold depending on your use case.
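A minimal sketch of the threshold rule above, assuming `scores` holds the raw per-class outputs for one video (as returned by `net(X)`), so a softmax is applied first. `predict_with_none` and the `'none'` label are hypothetical names for illustration; 0.7 is the example threshold from the comment.

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

def predict_with_none(scores, threshold=0.7, none_label='none'):
    """Return (class index, probs) if the top probability exceeds `threshold`, else (none_label, probs)."""
    probs = softmax(np.asarray(scores, dtype=np.float64))
    best = int(np.argmax(probs))
    if probs[best] > threshold:
        return best, probs
    return none_label, probs

print(predict_with_none([2.0, -1.0])[0])  # 0
print(predict_with_none([0.1, 0.2])[0])   # none
```

With two classes the larger probability is always at least 0.5, so only thresholds above 0.5 create a "none" band; raising the threshold widens it.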

Tom-89p13 commented 4 years ago

Hey, no problem.

Thank you very much for your super-fast answer. I will work on it and let you know about my results.

bryanyzhu commented 4 years ago

Just to let you know, I checked test.py on my end. I tested on both the UCF101 and Kinetics400 datasets and didn't see the NaN problem. Without reproducing your error, I can't debug it.

So please either (1) try the UCF101 dataset to see whether the code actually has bugs, or (2) try to find out where the NaN comes from. (3) You can also use inference.py to make predictions for each video and see what you get. Thank you.

Tom-89p13 commented 4 years ago

OK, thanks. I'll try the UCF101 dataset and try to find out where the NaN comes from.

Tom-89p13 commented 4 years ago

Well, I figured out where the problem was: I had only 10 test samples in val.txt, but batch_size was 32.
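Why 10 samples with a batch size of 32 give NaN: the log above reports "Load 10 test samples in 0 iterations", which is what a loader that discards incomplete batches produces, and MXNet's metrics report NaN when they have seen no samples (correct / total would be 0/0). The sketch below illustrates both steps in plain Python; `num_batches` and `safe_accuracy` are hypothetical helpers, not GluonCV or MXNet functions.

```python
import math

def num_batches(num_samples, batch_size, last_batch='discard'):
    """Number of iterations a loader yields under a given last-batch policy."""
    if last_batch == 'discard':
        return num_samples // batch_size          # the incomplete final batch is dropped
    return math.ceil(num_samples / batch_size)    # 'keep': the partial final batch is used

def safe_accuracy(num_correct, num_seen):
    # No samples seen -> NaN instead of a ZeroDivisionError
    return float('nan') if num_seen == 0 else num_correct / num_seen

print(num_batches(10, 32))                     # 0
print(num_batches(10, 32, last_batch='keep'))  # 1
print(safe_accuracy(0, 0))                     # nan
```

So for a small validation list, either add samples until there is at least one full batch or lower the batch size when testing (presumably via `--batch-size`, matching the `batch_size` field in the Namespace dump).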

But now it doesn't load the video samples. If I use the "ucf101" configuration in the code below, I get "RuntimeError: Could not load file /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4/img_00085.jpg starting at frame 85. Check data path."

If I use kinetics400 or hmdb51, I get "ValueError: kth(=-3) out of bounds (2)".

Which val_dataset configuration should I use for custom data? https://github.com/dmlc/gluon-cv/blob/master/scripts/action-recognition/test_recognizer.py#L361

inference.py works perfectly, but I want to use test_recognizer.py for the whole dataset. Thanks and greetings.

bryanyzhu commented 4 years ago

We support two ways of data loading: one loads videos directly, the other loads video frames. In our test_recognizer.py, ucf101 loads frames and kinetics400 loads videos (see the parameter video_loader for details).

I think the reason you get "RuntimeError: Could not load file /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4/img_00085.jpg starting at frame 85. Check data path." is that you are using frame loading on videos. If you are loading frames, the path should look like Trainpaket1_Montage41/img_00085.jpg; if you are loading videos, it should look like Trainpaket1_Montage41.MP4. Hope this is clear. So if you load videos directly, you need to set video_loader=True and use_decord=True.
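To make the two layouts concrete, here is a sketch assuming the frame loader looks for `<root>/<entry>/img_%05d.jpg` (the layout visible in the error message) and the video loader for `<root>/<entry>`, where `<entry>` is the first column of the list file. `frame_path` and `video_path` are hypothetical helpers; GluonCV builds these paths internally.

```python
import os

NAME_PATTERN = 'img_%05d.jpg'  # assumed frame-file naming, as in the error message

def frame_path(root, entry, frame_idx):
    """Frame loading: `entry` should be a directory of extracted frames."""
    return os.path.join(root, entry, NAME_PATTERN % frame_idx)

def video_path(root, entry):
    """Video loading: `entry` should be the video file itself."""
    return os.path.join(root, entry)

root = '/content/drive/My Drive/Datenbank/VideosTrain'
# A video file name used with the frame loader reproduces the broken path:
print(frame_path(root, 'Trainpaket1_Montage41.MP4', 85))
# /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4/img_00085.jpg
print(frame_path(root, 'Trainpaket1_Montage41', 85))  # frame layout
print(video_path(root, 'Trainpaket1_Montage41.MP4'))  # video layout
```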

When you use kinetics400, it loads videos, so you don't get that runtime error. I don't know why you get "ValueError: kth(=-3) out of bounds (2)"; I have never seen this error. I think it is still a data-dependent issue.

Tom-89p13 commented 4 years ago

OK, thanks for your help. I learned a lot from it.

The problem was in the definition of the number of classes, so you're right, it was a data/parameter-dependent issue.

If I use "--model I3D_resnet50_v1_kinetics400", the error doesn't occur. If I use my own trained model with "--model I3D_resnet50_v1_custom --num-classes 2", the error occurs. I have only 2 classes, so top-5 accuracy can't be calculated, which raises the error. So I need either more classes or to hide the top-5 accuracy. :-)
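The ValueError is consistent with computing top-5 accuracy via NumPy's `argpartition` on only 2 classes: a negative kth of -5 is normalized to -5 + 2 = -3, which is still out of bounds for an axis of size 2. A minimal sketch of the workaround, clamping k to the number of classes; `topk_accuracy` is a hypothetical helper, not MXNet's metric.

```python
import numpy as np

def topk_accuracy(scores, labels, k=5):
    """Top-k accuracy with k clamped to the number of classes."""
    scores = np.asarray(scores, dtype=np.float32)
    labels = np.asarray(labels)
    k = min(k, scores.shape[1])  # with 2 classes, top-5 becomes top-2
    # Indices of the k highest scores per row (order within the top k does not matter)
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]
    return float(np.mean([label in row for label, row in zip(labels, topk)]))

scores = np.array([[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]])
labels = np.array([1, 0, 1])
print(topk_accuracy(scores, labels, k=1))  # top-1 accuracy
print(topk_accuracy(scores, labels, k=5))  # clamped to k=2, so always 1.0

try:
    np.argpartition(scores, -5, axis=1)  # unclamped: the same kind of error as above
except ValueError as err:
    print('unclamped:', err)
```

With only 2 classes a clamped "top-5" is trivially 1.0, so reporting top-1 alone carries the same information.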