fabiozappo / SkeletonGroupActivityRecognition

Learning Group Activities from Skeletons without Individual Action Labels
Apache License 2.0

Reproducing ICPR paper results #4

Closed daniel-richter closed 3 years ago

daniel-richter commented 3 years ago

First: Thanks for sharing the code of your paper and for dockerizing the solution so it's easy to run the algorithms myself!

Running train.py gives me the following output before finishing:

    Training complete in 58m 17s
    Best val Acc: 0.879581
    Confusion Matrix:
    tensor([[155,   9,  21,   0,   0,   5,   0,   2],
            [  7, 153,   8,   1,   0,   2,   2,   0],
            [ 15,   2, 185,   1,   1,   3,   3,   0],
            [  0,   0,   0,  82,   5,   0,   0,   0],
            [  0,   0,   0,   5,  97,   0,   0,   0],
            [  0,   3,   5,   0,   0, 203,   4,  11],
            [  1,   2,   2,   0,   0,   7, 158,   9],
            [  1,   1,   2,   0,   0,  18,   3, 143]])
    Class accuracies: tensor([0.8073, 0.8844, 0.8810, 0.9425, 0.9510, 0.8982, 0.8827, 0.8512])
    MCA: tensor(0.8796)
    MPCA: tensor(0.8873)
    Best Acc Persons: 0.29981965734896304
    Best Acc Groups: 0.8795811518324607

How should I interpret these numbers, and how do they relate to the findings in your ICPR paper (e.g. Table 1 and Table 2)?
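As a reading aid for the log above, the printed metrics can be recomputed from the confusion matrix alone. The following is a minimal sketch in plain Python, not code from the repository. It assumes that rows are ground-truth classes and columns are predictions, which is consistent with the log because normalizing each row reproduces the printed class accuracies. Under that assumption, the logged MCA coincides with the overall accuracy (correct samples over all samples, the same value as Best val Acc), and MPCA is the unweighted mean of the per-class accuracies. The function names are illustrative.

```python
# Confusion matrix copied from the training log.
# Assumption: rows = ground-truth classes, columns = predicted classes.
confusion = [
    [155, 9, 21, 0, 0, 5, 0, 2],
    [7, 153, 8, 1, 0, 2, 2, 0],
    [15, 2, 185, 1, 1, 3, 3, 0],
    [0, 0, 0, 82, 5, 0, 0, 0],
    [0, 0, 0, 5, 97, 0, 0, 0],
    [0, 3, 5, 0, 0, 203, 4, 11],
    [1, 2, 2, 0, 0, 7, 158, 9],
    [1, 1, 2, 0, 0, 18, 3, 143],
]


def class_accuracies(matrix):
    """Diagonal entry divided by the row sum, one value per class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


per_class = class_accuracies(confusion)
mca = overall_accuracy(confusion)        # matches the logged MCA / Best val Acc
mpca = sum(per_class) / len(per_class)   # matches the logged MPCA

print("Class accuracies:", [round(a, 4) for a in per_class])
print("MCA: ", round(mca, 4))
print("MPCA:", round(mpca, 4))
```

Because the classes have different sample counts (row sums range from 87 to 226), MCA and MPCA differ slightly: MPCA weights every class equally, while MCA weights every sample equally.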

fabiozappo commented 3 years ago

If you run python train.py --help inside the container, you can see all the options available for reproducing the paper's results.

For example, suppose you want to reproduce the following table: [image]

Group activity labels only with data augmentation: python train.py --augment --pivot --loss_balancer 0

Pseudo action labels from 3D-Resnet with augmentation: python train.py --augment --pivot --pseudo_labels

Supervised with data augmentation: python train.py --augment --pivot

daniel-richter commented 3 years ago

Hm, it seems there is no argument loss_factor...

usage: train.py [-h] [--pseudo_labels] [--num_clusters NUM_CLUSTERS] [--augment] [--epochs EPOCHS] [--batch_size BATCH_SIZE] [--workers WORKERS] [--loss_balancer LOSS_BALANCER] [--pivot_distances]
train.py: error: unrecognized arguments: --loss_factor 0
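The commands in this thread pass `--pivot` although the usage message lists `--pivot_distances`. That works because Python's `argparse` accepts any unambiguous prefix of a long option by default, while an unknown name such as `--loss_factor` is rejected. The sketch below rebuilds a parser from the usage message to illustrate this. It is not the repository's actual parser: the option names come from the usage line, but the argument types are assumptions, and the defaults are left unset.

```python
import argparse


def build_parser():
    """Parser mirroring the option names in the usage message (types assumed)."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--pseudo_labels", action="store_true")
    parser.add_argument("--num_clusters", type=int)      # type is an assumption
    parser.add_argument("--augment", action="store_true")
    parser.add_argument("--epochs", type=int)            # type is an assumption
    parser.add_argument("--batch_size", type=int)        # type is an assumption
    parser.add_argument("--workers", type=int)           # type is an assumption
    parser.add_argument("--loss_balancer", type=float)   # type is an assumption
    parser.add_argument("--pivot_distances", action="store_true")
    return parser


parser = build_parser()

# "--pivot" is an unambiguous prefix of "--pivot_distances", so it is accepted.
args = parser.parse_args(["--augment", "--pivot", "--loss_balancer", "0"])
print(args.augment, args.pivot_distances, args.loss_balancer)

# "--loss_factor" matches no option; parse_known_args returns it as leftover
# instead of exiting with "unrecognized arguments" like parse_args does.
_, unknown = parser.parse_known_args(["--loss_factor", "0"])
print(unknown)
```

A shorter prefix such as `--p` would be ambiguous here, since it matches both `--pseudo_labels` and `--pivot_distances`.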

fabiozappo commented 3 years ago

Sorry for the mistake, I have corrected my previous comment.

daniel-richter commented 3 years ago

Thanks a lot! The values in your table are taken from Best val Acc? What about 2D-Vgg16?

fabiozappo commented 3 years ago

> Thanks a lot! The values in your table are taken from Best val Acc?

Yes

> What about 2D-Vgg16?

In this repo I only pushed the end-to-end code and the 3D-Resnet feature extraction.

daniel-richter commented 3 years ago

I had the chance to run the scripts.

Results from ICPR paper: [image]

Our results:

| Method | No Data Aug. | With Data Aug. |
| --- | --- | --- |
| Group activity labels only | 84.74 | 86.01 |
| Pseudo action labels from 3D-Resnet | 86.46 | 86.99 |
| Supervised | 87.36 | 88.56 |

Do the numbers look reasonable?

fabiozappo commented 3 years ago

The numbers look reasonable because, as you can see, clustering visual features from the actors and using them as pseudo-labels helps the model learn the group activity, as explained in the paper. The differences from the paper's results are probably caused by a different version and different settings of OpenPose: in the container I'm using an older version of OpenPose that may be less accurate at skeleton prediction than the one used in the paper's experiments.

If you're able to create a container that uses the latest OpenPose version, let me know!

daniel-richter commented 2 years ago

With cuda:10.0-cudnn7 and the latest OpenPose version (https://github.com/fabiozappo/SkeletonGroupActivityRecognition/pull/6) I get the following results for the first run I tried:

| Method | No Data Aug. | With Data Aug. |
| --- | --- | --- |
| Group activity labels only | ~84.74~ 85.48 | ~86.01~ 87.43 |
| Pseudo action labels from 3D-Resnet | ~86.46~ 86.68 | ~86.99~ 87.35 |
| Supervised | ~87.36~ 87.58 | ~88.56~ 89.45 |

But the numbers aren't that stable if I execute the scripts multiple times:

| Method | Data aug. | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Group activity labels only | w/o | 84.1 | 83.5 | 77.5 | 84.0 | 84.1 | 80.3 | 81.9 |
| Group activity labels only | with | 85.2 | 83.6 | 82.6 | 86.0 | 83.1 | 83.4 | 84.7 |
| Pseudo action labels from 3D-Resnet | w/o | 84.3 | 82.8 | 80.9 | 77.8 | 82.0 | 80.5 | |
| Pseudo action labels from 3D-Resnet | with | 77.7 | 85.3 | 85.0 | 83.8 | 83.5 | 83.8 | 82.6 |
| Supervised | w/o | 86.2 | 84.0 | 84.1 | 85.6 | 87.9 | 86.5 | |
| Supervised | with | 86.8 | 86.5 | 87.8 | 87.2 | 87.6 | 84.7 | |
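Since single runs vary by several points, comparing settings by mean and spread over the repeated runs is more informative than comparing individual numbers. The sketch below summarizes the accuracies from the multi-run table using only the standard library. The values are copied from the table; settings with six reported values are used as they are, with nothing filled in. The helper name `summarize` is illustrative.

```python
from statistics import mean, stdev

# Validation accuracies (%) copied from the multi-run table.
# Settings with only six reported values keep six entries.
runs = {
    ("Group activity labels only", "w/o"): [84.1, 83.5, 77.5, 84.0, 84.1, 80.3, 81.9],
    ("Group activity labels only", "with"): [85.2, 83.6, 82.6, 86.0, 83.1, 83.4, 84.7],
    ("Pseudo action labels from 3D-Resnet", "w/o"): [84.3, 82.8, 80.9, 77.8, 82.0, 80.5],
    ("Pseudo action labels from 3D-Resnet", "with"): [77.7, 85.3, 85.0, 83.8, 83.5, 83.8, 82.6],
    ("Supervised", "w/o"): [86.2, 84.0, 84.1, 85.6, 87.9, 86.5],
    ("Supervised", "with"): [86.8, 86.5, 87.8, 87.2, 87.6, 84.7],
}


def summarize(values):
    """Return (mean, sample standard deviation, min, max) for one setting."""
    return mean(values), stdev(values), min(values), max(values)


summary = {key: summarize(values) for key, values in runs.items()}

for (method, aug), (avg, std, low, high) in summary.items():
    n = len(runs[(method, aug)])
    print(f"{method:<36} {aug:<4} n={n} mean={avg:.2f} std={std:.2f} "
          f"min={low:.1f} max={high:.1f}")
```

With six or seven runs per setting the standard deviations are themselves rough estimates, so small differences between settings should be read with caution.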