anita-hu / MSAF

Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
MIT License

Eval accuracy slightly different #1

Closed. cheolhwanyoo closed this issue 3 years ago

cheolhwanyoo commented 3 years ago

Hi. Thank you so much for providing useful code.

When I tested on the NTU RGB+D dataset with the provided command 'python main_msaf.py --datadir <path/to/NTU> --checkpointdir checkpoints --test_cp msaf_ntu_epoch12_92.24.checkpoint --no_bad_skel', the accuracy was slightly different: 91.38 instead of 92.24. Am I missing something? Thank you.

kevinsu628 commented 3 years ago

Hello,

I redownloaded the checkpoint and evaluated it, and I am able to get 92.24%. Could you double-check that the dataset was processed correctly and make sure it is "NTU RGB+D" (60 classes) instead of "NTU RGB+D 120"?
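One quick way to run that check is to look at the action IDs in the sample file names. NTU RGB+D names follow the pattern SsssCcccPpppRrrrAaaa, where the A field is the action class: the 60-class release stops at A060, while NTU RGB+D 120 goes up to A120. The sketch below is only an illustration; the helper names and the assumption that every file under the data directory keeps its original name are hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# NTU RGB+D sample names look like S001C001P001R001A001 (setup, camera,
# performer, replication, action). Only the action field is inspected here.
NAME_PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


def action_ids(names):
    """Return the set of action class IDs found in the given file names."""
    ids = set()
    for name in names:
        match = NAME_PATTERN.search(name)
        if match:
            ids.add(int(match.group(5)))
    return ids


def looks_like_ntu60(names):
    """True if at least one sample was found and no action ID exceeds 60."""
    ids = action_ids(names)
    return bool(ids) and max(ids) <= 60


def names_in(datadir):
    """Hypothetical helper: collect file names below a dataset directory."""
    return [p.name for p in Path(datadir).rglob("*") if p.is_file()]


# Small self-contained demonstration with made-up file lists.
sample_60 = ["S001C001P001R001A001_rgb.avi", "S017C003P020R002A060.skeleton"]
sample_120 = sample_60 + ["S018C001P008R001A061_rgb.avi"]
print(looks_like_ntu60(sample_60), looks_like_ntu60(sample_120))
```

On a real copy of the data, looks_like_ntu60(names_in("<path/to/NTU>")) would perform the same check over the whole directory tree.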

cheolhwanyoo commented 3 years ago

Hi. Thanks for the reply. I re-downloaded the NTU dataset and followed the bash command you suggested to process the avi files. I tried several times, but the performance is still different.

Acc Multimodal: 0.9115, Acc Visual: 0.8740, Acc Skeleton: 0.8441

The only change I made in the code is at line 117 of MSAF.py: sx_chunk *= (att.view(list(att.size()) + ns * [1])) -> sx_chunk = sx_chunk * (att.view(list(att.size()) + ns * [1]))

because the original code raised the following error.

RuntimeError: diff_view_meta->outputnr == 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.

Could this code change have any effect on performance?

My working environment and test args are as follows:

PyTorch version: 1.7.1+cu110
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 3090
GPU 1: GeForce RTX 3090

Nvidia driver version: 455.38
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1+cu110
[pip3] torchaudio==0.7.2
[pip3] torchgeometry==0.1.2
[pip3] torchvision==0.8.2+cu110
[conda] blas 1.0
[conda] cudatoolkit 11.0.221
[conda] mkl 2020.2
[conda] mkl-service 2.3.0
[conda] mkl_fft 1.2.0
[conda] mkl_random 1.1.1
[conda] numpy 1.19.2
[conda] numpy-base 1.19.2
[conda] torch 1.7.1+cu110
[conda] torchaudio 0.7.2
[conda] torchgeometry 0.1.2
[conda] torchvision 0.8.2+cu110

Namespace(batchsize=4, checkpointdir='checkpoints', datadir='dataset', drpt=0.0, epochs=20, fc_final_preds=False, modality='both', multitask=True, no_bad_skel=True, no_norm=False, num_outputs=60, num_workers=16, rgb_cp='i3d_32frames_85.63.checkpoint', rgb_net='i3d', ske_cp='skeleton_32frames_85.24.checkpoint', test_cp='', train=False, use_dataparallel=False, verbose=True, vid_len=(32, 32))

Thank you.

anita-hu commented 3 years ago

Hi,

It looks like your environment is quite different from the one we used. Try setting up a conda environment from the provided environment.yml file with: conda env create -f environment.yml

cheolhwanyoo commented 3 years ago

Hi,

Because the RTX 3090 is not compatible with CUDA 10.1, I cannot use the environment in environment.yml. May I ask you to make the code change mentioned above and see if the performance is the same?

sx_chunk *= (att.view(list(att.size()) + ns * [1])) -> sx_chunk = sx_chunk * (att.view(list(att.size()) + ns * [1]))

I ask because the in-place operation seems to have some issues with gradient computation: https://github.com/pytorch/pytorch/issues/46820

anita-hu commented 3 years ago

Hi,

The observed accuracy drop comes from the code change, since the model was trained with the in-place implementation (similar to the issue you linked). We will replace the in-place operations to support PyTorch 1.7 and provide new model weights in our next update.
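As an illustration of how the two forms can diverge even though they compute the same product: if the multiplied tensor is a view of a larger tensor, the in-place form writes the result back into that larger tensor, while the out-of-place form only rebinds a local name. The sketch below assumes this view relationship; the thread does not show how sx_chunk is created, and the tensors features and att here are hypothetical stand-ins rather than the actual MSAF variables.

```python
import torch

# Hypothetical stand-ins: `features` plays the role of a feature map,
# `att` the role of attention weights for one chunk of it.
features = torch.ones(2, 4)
att = torch.full((2, 2), 0.5)

# Out-of-place: `chunk` is rebound to a new tensor; the base tensor `a`
# it was taken from is left untouched.
a = features.clone()
chunk = torch.chunk(a, 2, dim=1)[0]
chunk = chunk * att

# In-place: torch.chunk returns views, so `*=` writes into the base tensor `b`.
b = features.clone()
chunk_view = torch.chunk(b, 2, dim=1)[0]
chunk_view *= att

same_chunk_values = torch.equal(chunk, chunk_view)
base_changed_out_of_place = not torch.equal(a, features)
base_changed_in_place = not torch.equal(b, features)
print(same_chunk_values, base_changed_out_of_place, base_changed_in_place)
# prints: True False True
```

Under that assumption, any later code that reads the base tensor sees attention-weighted values only in the in-place case, which is one plausible way for weights trained with one form to score differently with the other.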

cheolhwanyoo commented 3 years ago

Thanks for the reply. Your paper and code really help with understanding this field.

But I found something odd while training and testing. For the NTU dataset, split_block in the MSAF module is set to 1, and in the code below the input feature X and the output of the MSAF module, 'ret', always seem to be identical.

if self.split_block == 1:
    ret = self.blocks[0](X)
    return ret

Did I misunderstand something or do something wrong? Thank you.

kevinsu628 commented 3 years ago

Hi,

split_block is a parameter we proposed for enhancing sequential features. It is the parameter q in Figure 2 of our paper. We only set q for sentiment analysis (it defaults to 1), since both action recognition and emotion recognition use CNNs. This is explained in more detail in section 3.2 of our paper.

If split_block is 1, the init creates a single time-wise split, so we can simply use the first MSAF block.

Otherwise, we split the feature map time-wise and pass each segment to a different MSAF block, which is what the code after line 137 does.
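The dispatch described above can be sketched roughly as follows. This is a simplified illustration, not the repository's implementation: it takes a single tensor instead of a list of modality features, and Scale and SplitBlockSketch are hypothetical names, with Scale standing in for a real MSAF block.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical placeholder for an MSAF block: scales its input."""

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor


class SplitBlockSketch(nn.Module):
    """Routes time-wise segments of the input to separate blocks."""

    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.split_block = len(blocks)  # plays the role of q

    def forward(self, x):
        # x: [batch, channels, time]
        if self.split_block == 1:
            # A single split: the first (and only) block sees the whole input.
            return self.blocks[0](x)
        # Otherwise split along the time axis and give each segment its own block.
        segments = torch.chunk(x, self.split_block, dim=-1)
        outputs = [block(seg) for block, seg in zip(self.blocks, segments)]
        return torch.cat(outputs, dim=-1)


x = torch.ones(1, 2, 6)
y_single = SplitBlockSketch([Scale(2.0)])(x)
y_multi = SplitBlockSketch([Scale(1.0), Scale(2.0), Scale(3.0)])(x)
print(y_single[0, 0].tolist())  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(y_multi[0, 0].tolist())   # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With q = 1 the whole sequence goes through one block, which matches the CNN-based action and emotion recognition settings; with q > 1 each time segment gets its own block.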

Please don't hesitate to ask any other questions.

anita-hu commented 3 years ago

@cheolhwanyoo I've updated the code and was able to obtain the same accuracy results for NTU using the previous model weights. Try using the branch in the PR.

cheolhwanyoo commented 3 years ago

Thank you. I'll try it. Happy New Year!