Tom-89p13 closed this issue 4 years ago.
PS: The videos in my dataset are 60 FPS. Is that a problem, or do they need to be 25 FPS?
I don't think fps is an issue, but let me look into it.
I tried using 2 classes from the UCF101 dataset as an example, and everything looks normal on my end. Maybe this is data-dependent.
Could you try the TSN model with resnet50 (https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/action_recognition/actionrec_resnetv1b.py#L651)? You can change dropout to a larger value (like 0.9) and init_std to 0.001. For the dataloader, just set new_length=1 and num_segments=3 (or a larger value if your videos are long). I usually use TSN as a quick way to debug.
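To make the two dataloader parameters concrete: in TSN-style sampling the video is split into num_segments equal chunks, and one clip of new_length consecutive frames is taken from each chunk. The sketch below is a simplified illustration of that idea, not GluonCV's implementation; the helper name tsn_segment_indices is hypothetical, and VideoClsCustom computes its own offsets.

```python
import random


def tsn_segment_indices(num_frames, num_segments=3, new_length=1, train=True, seed=None):
    """Return the start frame of the clip taken from each segment (hypothetical helper).

    The video is split into `num_segments` equal chunks. During training the clip
    start is jittered randomly inside its chunk; at test time the chunk centre is used.
    """
    if num_frames < num_segments * new_length:
        raise ValueError("video too short for the requested num_segments/new_length")
    rng = random.Random(seed)
    seg_len = num_frames // num_segments
    max_offset = seg_len - new_length  # last start that keeps the clip inside its chunk
    starts = []
    for s in range(num_segments):
        offset = rng.randint(0, max_offset) if train else max_offset // 2
        starts.append(s * seg_len + offset)
    return starts


# A 200-frame video, 3 segments, 1 frame per segment, deterministic test-time sampling.
print(tsn_segment_indices(200, num_segments=3, new_length=1, train=False))  # [32, 98, 164]
```

With new_length=1 each segment contributes a single frame, which is why TSN is cheap enough to serve as a quick debugging baseline.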
First, thanks for your help @bryanyzhu, I really appreciate it.
I changed the dataloader parameter new_length=1, but num_segments=3 always crashes; it has to be 1 for my dataset. Then I changed the model to TSN resnet50 with the code below:
net = get_model(name='resnet50_v1b_custom', nclass=2, dropout=0.9, init_std=0.001)
net.collect_params().reset_ctx(ctx)
Afterwards I trained the model and got the following results:
[Epoch 0] train=0.477778 loss=0.108409 time: 35.025032
[Epoch 1] train=0.500000 loss=0.000349 time: 35.650740
[Epoch 2] train=0.500000 loss=0.000173 time: 37.308505
[Epoch 3] train=0.500000 loss=0.000137 time: 36.339093
[Epoch 4] train=0.500000 loss=0.000151 time: 37.180737
Shouldn't the train accuracy be close to 1 rather than 0.5?
Again, if I run the command below, I get NaN as the result:
!python test_recognizer.py --model resnet50_v1b_custom --resume-params net2.params --num-classes 2 --num-gpus 1
Result:
Namespace(batch_norm=False, batch_size=32, benchmark=False, calib_mode='naive', calibration=False, crop_ratio=0.875, data_aug='v1', data_dir='/content/drive/My Drive/Datenbank/Trainpaket1', dataset='ucf101', deploy=False, dtype='float32', eval=False, fast_temporal_stride=2, hard_weight=0.5, hashtag='', input_5d=False, input_size=224, label_smoothing=False, last_gamma=False, log_interval=50, logging_file='train.log', lr=0.1, lr_decay=0.1, lr_decay_epoch='40,60', lr_decay_period=0, lr_mode='step', mixup=False, mixup_alpha=0.2, mixup_off_epoch=0, mode=None, model='resnet50_v1b_custom', model_prefix=None, momentum=0.9, new_height=256, new_length=1, new_step=1, new_width=340, no_wd=False, num_calib_batches=5, num_classes=2, num_crop=1, num_epochs=3, num_gpus=1, num_iterations=100, num_segments=1, num_workers=4, prefetch_ratio=2.0, quantized=False, quantized_dtype='auto', resume_epoch=0, resume_params='net2.params', resume_states='', save_dir='params', save_frequency=10, slow_temporal_stride=16, slowfast=False, teacher=None, temperature=20, ten_crop=False, three_crop=False, train_list='/content/drive/My Drive/Datenbank/Train.txt', use_amp=False, use_decord=False, use_gn=False, use_pretrained=False, use_se=False, use_softmax=False, use_tsn=False, val_data_dir='/content/drive/My Drive/Datenbank/Trainpaket1', val_list='/content/drive/My Drive/Datenbank/Val.txt', video_loader=False, warmup_epochs=0, warmup_lr=0.0, wd=0.0001)
Total batch size is set to 32 on 1 GPUs
Pre-trained model net2.params is successfully loaded.
Successfully loaded model resnet50_v1b_custom
Load 10 test samples in 0 iterations.
Test accuracy: acc-top1=nan acc-top5=nan
Total evaluation time is 0.00 minutes
Val.txt looks like:
Montage1.MP4 200 1
Montage2.MP4 200 1
Montage3.MP4 200 1
Montage4.MP4 200 1
Montage5.MP4 200 1
Demontage1.MP4 200 2
Demontage2.MP4 200 2
Demontage3.MP4 200 2
Demontage4.MP4 200 2
Demontage5.MP4 200 2
Train.txt looks like:
Montage6.MP4 200 1
Montage7.MP4 200 1
Montage8.MP4 200 1
Montage9.MP4 200 1
Montage10.MP4 200 1
Montage11.MP4 200 1
Montage12.MP4 200 1
Montage13.MP4 200 1
Montage14.MP4 200 1
and more ...
Greetings, Tom
Hi, I haven't looked into the details yet, but it seems your labels are 1 and 2? If so, please change them to 0 and 1, because labels should start from 0.
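One way to relabel is to rewrite every line of Train.txt and Val.txt with the label shifted down by one. With nclass=2 the network only has outputs for class indices 0 and 1, so a label of 2 does not match any output, which fits the train accuracy stuck at 0.5 above. This is a minimal sketch: shift_labels is a hypothetical helper, and it assumes whitespace-separated lines whose last field is the label, as in the lists above.

```python
def shift_labels(lines, offset=-1):
    """Shift the class label (last field) of each list-file line by `offset`.

    Hypothetical helper: assumes lines like 'Montage1.MP4 200 1'
    (video name, number of frames, label) separated by whitespace.
    """
    out = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        label = int(fields[-1]) + offset
        if label < 0:
            raise ValueError(f"negative label after shift: {line!r}")
        fields[-1] = str(label)
        out.append(" ".join(fields))
    return out


print(shift_labels(["Montage1.MP4 200 1", "Demontage1.MP4 200 2"]))
# ['Montage1.MP4 200 0', 'Demontage1.MP4 200 1']
```

To apply it to a real list file, read the lines of Val.txt, pass them through shift_labels, and write the result to a new file before replacing the original.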
Hello Yi Zhu, oh my god, such an easy solution... The first problem is solved, it works with labels 0 and 1! :-)
[Epoch 0] train=0.488889 loss=0.687800 time: 950.778338
[Epoch 1] train=0.733333 loss=0.488430 time: 88.430673
[Epoch 2] train=0.922222 loss=0.196962 time: 48.234062
[Epoch 3] train=0.966667 loss=0.093875 time: 44.105297
[Epoch 4] train=0.977778 loss=0.049598 time: 44.734219
But test_recognizer.py still returns NaN.
And inference.py predicted every video as class 1 :-( -> fixed that by setting use-pretrained to false 👍
PS:
I'm using MXNet 1.6.0 (cu101) and GluonCV 0.8.0b20200629.
If I use GluonCV 0.7.0, I get this error while running test_recognizer.py:
TypeError: __init__() got an unexpected keyword argument 'data_aug'
Greetings, Tom
Great to know the first problem is solved. Let me work on the NaN problem.
Any updates, sir?
I've got 3 additional questions:
-> I've got 2 classes, but if the input video contains NEITHER of these 2 classes, it should output e.g. "none". Is that possible? Right now every video is predicted as class 0 or 1, even if it shows neither class.
-> Can I output the probability of the predictions? (and change the threshold?)
-> Can I output a validation loss, or is there no validation?
Thank you in advance.
Greetings.
Hi, sorry about the delay, I haven't had time to look into this yet.
For your new question: yes, you can achieve your goal by working with the class probabilities. The network outputs a score for each class, and applying a softmax to these scores gives the class probabilities.
https://github.com/dmlc/gluon-cv/blob/master/scripts/action-recognition/test_recognizer.py#L203
In your case, which is a binary classification problem, the output pred will be a 2-dim vector of class scores (after a softmax, the probabilities for the two classes). You can set a threshold to achieve your goal. For example, set the threshold to 0.7: if the first class's probability is larger than 0.7, predict class 0; if the second class's probability is larger than 0.7, predict class 1; if both probabilities are close (like 0.5 and 0.5), the model is uncertain, so you can predict None. You can adjust the threshold depending on your use case.
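Here is a minimal sketch of that thresholding rule in plain Python, so it runs without MXNet. It assumes scores holds the raw 2-element network output for one video; the 0.7 threshold and the None result follow the example above, and predict_with_reject is a hypothetical name. It also returns the probabilities, which covers the question about printing them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_reject(scores, threshold=0.7):
    """Return (class index or None, probabilities) for one video's scores."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return best, probs
    return None, probs  # no class is confident enough


for scores in ([2.0, 0.0], [0.1, 0.0], [0.0, 3.0]):
    label, probs = predict_with_reject(scores)
    print(label, [round(p, 3) for p in probs])
```

The first and third inputs clear the threshold (about 0.88 and 0.95), while the second gives roughly 0.52 vs 0.48 and is rejected as None. Since the two probabilities always sum to 1, a video showing neither activity can still get a high score for one class, so it helps to tune the threshold on a few such videos.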
Hey, no problem.
Thank you very much for your super-fast answer. I will work on it and let you know about my results.
Just to let you know, I checked test.py on my end. I tried testing on both the UCF101 and Kinetics400 datasets and didn't see the NaN problem. Without reproducing your error, I can't debug it.
So please either (1) try the UCF101 dataset and see whether the code actually has bugs, or (2) try to find out where the NaN comes from. (3) You can also try using inference.py to make predictions for each video and see what you get. Thank you.
OK, thanks. I'll try the UCF101 dataset and try to find out where the NaN comes from.
Well, I figured out where the problem was: I only had 10 test samples in val.txt, but batch_size was 32.
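This matches the log line "Load 10 test samples in 0 iterations": if the evaluation loader discards an incomplete last batch, 10 samples with batch_size=32 give 10 // 32 = 0 batches, so the accuracy metric never sees a sample and reports 0 / 0, i.e. NaN. The sketch below reproduces that arithmetic in plain Python; the drop-last behaviour and the helper names are assumptions for illustration, not code taken from test_recognizer.py.

```python
def num_batches(num_samples, batch_size, drop_last=True):
    """How many batches a loader yields for `num_samples` items.

    drop_last=True models a loader that discards the incomplete final batch.
    """
    if drop_last:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division keeps the partial batch


def accuracy(num_correct, num_seen):
    """Accuracy as correct / seen; NaN if no sample was evaluated (0 / 0)."""
    return num_correct / num_seen if num_seen else float("nan")


# The situation from the log: 10 test samples, batch_size=32.
batches = num_batches(10, 32)
print(batches, accuracy(0, batches * 32))  # 0 nan

# With batch_size=5, both batches are full and all 10 samples are evaluated.
print(num_batches(10, 5) * 5)  # 10
```

In practice, keeping the test batch size no larger than the number of test samples (or adding more test videos) avoids the empty evaluation.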
But now it doesn't load the video samples. If I use the "ucf101" configuration in the code linked below, it causes "RuntimeError: Could not load file /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4/img_00085.jpg starting at frame 85. Check data path."
If I use kinetics400 or hmdb51, it causes "ValueError: kth(=-3) out of bounds (2)".
Which configuration in val_dataset should I use for custom data? https://github.com/dmlc/gluon-cv/blob/master/scripts/action-recognition/test_recognizer.py#L361
Inference.py works perfectly, but I want to use test_recognizer.py for the whole dataset. Thanks and greetings.
We support two ways of data loading: one loads videos directly, the other loads video frames. In our test_recognizer.py, ucf101 loads frames and kinetics400 loads videos (see the parameter video_loader for more details).
I think the reason you get "RuntimeError: Could not load file /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4/img_00085.jpg starting at frame 85. Check data path." is that you are using frame loading to load videos. If you are loading frames, the path should look like Trainpaket1_Montage41/img_00085.jpg. If you are loading videos, it should look like Trainpaket1_Montage41.MP4. Hope this is clear. So if you are loading videos directly, you need to set video_loader=True and use_decord=True.
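To illustrate the difference, here is a hypothetical helper that builds the path a loader would look for in each mode. The img_%05d.jpg frame pattern is taken from the error message above; the function name and its arguments are assumptions, not GluonCV API.

```python
import posixpath


def expected_path(root, name, video_loader, frame_idx=1, name_pattern="img_%05d.jpg"):
    """Path a loader would read for one list entry (hypothetical helper)."""
    if video_loader:
        # Video mode: the list entry names a single video file.
        return posixpath.join(root, name)
    # Frame mode: the list entry names a directory of extracted frames.
    return posixpath.join(root, name, name_pattern % frame_idx)


root = "/content/drive/My Drive/Datenbank/VideosTrain"
# A video file name used in frame mode reproduces the failing path from the error:
print(expected_path(root, "Trainpaket1_Montage41.MP4", video_loader=False, frame_idx=85))
# /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4/img_00085.jpg
print(expected_path(root, "Trainpaket1_Montage41.MP4", video_loader=True))
# /content/drive/My Drive/Datenbank/VideosTrain/Trainpaket1_Montage41.MP4
```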
When you use kinetics400, it loads videos, so you didn't get that runtime error. I don't know why you get "ValueError: kth(=-3) out of bounds (2)"; I have never seen this error. I think it is still a data-dependent issue.
OK, thanks for your help. I learned a lot from it.
The problem was in the number of classes, so you're right, it was a data/parameter-dependent issue.
If I use "--model i3d_resnet50_v1_kinetics400", the error doesn't occur. If I use my own trained model "--model i3d_resnet50_v1_custom --num-classes 2", the error occurs. I only have 2 classes, so top-5 accuracy can't be calculated and raises the error. So I need either more classes or to hide top-5 accuracy. :-)
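One way to avoid the error without removing the metric entirely is to clamp k to the number of classes. This is a plain-Python sketch of top-k accuracy (topk_accuracy is a hypothetical helper); in test_recognizer.py itself you would instead remove or skip the top-5 metric, whose exact code is not shown in this thread.

```python
def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores.

    k is clamped to the number of classes, so top-5 on a 2-class problem
    becomes top-2 instead of failing.
    """
    if not scores:
        return float("nan")  # nothing evaluated
    num_classes = len(scores[0])
    k = min(k, num_classes)
    hits = 0
    for row, label in zip(scores, labels):
        # class indices sorted by descending score; keep the first k
        ranked = sorted(range(num_classes), key=row.__getitem__, reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
print(topk_accuracy(scores, labels, k=1))  # 2 of 3 correct
print(topk_accuracy(scores, labels, k=5))  # 1.0 (clamped to top-2)
```

With only 2 classes, top-2 accuracy is always 1.0, so the clamped value carries no information; simply not reporting top-5 is just as reasonable.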
Hello,
I'm trying to train action recognition models on my own dataset, following the fine-tuning tutorial on your site, but I ran into some issues. I have only 2 classes, with around 100 videos per class in .mp4 format (each about 10 seconds long), and I created Train.txt. My code is below.
My train accuracy doesn't increase and stays at 0.477... (just for testing I set epochs to 2, but it still doesn't increase with more than 10 epochs):
[Epoch 0] train=0.477273 loss=0.138171 time: 388.277470
[Epoch 1] train=0.477273 loss=0.002957 time: 104.294971
If I try to test my params, I get the following output. Command line:
!python test_recognizer.py --model i3d_resnet50_v1_custom --resume-params net2.params --num-classes 2 --num-gpus 1
Output:
Pre-trained model net2.params is successfully loaded.
Successfully loaded model i3d_resnet50_v1_custom
Load 10 test samples in 0 iterations.
Test accuracy: acc-top1=nan acc-top5=nan
Total evaluation time is 0.00 minutes
Thank you very much for your help !
Greetings,
# Imports
import os
import time

import mxnet as mx
from mxnet import gluon
from gluoncv.data.transforms import video
from gluoncv.data import VideoClsCustom
from gluoncv.model_zoo import get_model
from gluoncv.utils import TrainingHistory

# Video loading
num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]
transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8],
                                                 mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus

train_dataset = VideoClsCustom(root=os.path.expanduser('/content/drive/My Drive/Datenbank/Train'),  # directory with the videos
                               setting=os.path.expanduser('/content/drive/My Drive/Datenbank/Train.txt'),  # path to Train.txt
                               train=True,
                               new_length=32,
                               slowfast=True,
                               # ...
                               )
print('Load %d training samples.' % len(train_dataset))
train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                   shuffle=True, num_workers=num_workers)
# Loading Model
net = get_model(name='i3d_resnet50_v1_custom', nclass=2)
net.collect_params().reset_ctx(ctx)
print(net)
# Some configurations
lr_decay = 0.1
lr_decay_epoch = [40, 80, 100]
optimizer = 'sgd'
optimizer_params = {'learning_rate': 0.001, 'wd': 0.0001, 'momentum': 0.9}
trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

train_metric = mx.metric.Accuracy()
train_history = TrainingHistory(['training-acc'])
# Train Model
epochs = 2
lr_decay_count = 1

for epoch in range(epochs):
    tic = time.time()
    train_metric.reset()
    train_loss = 0
    # ...

train_history.plot()
# Saving params
file_name = "net2.params"
net.save_parameters(file_name)