DirtyHarryLYL / Transferable-Interactiveness-Network

Code for Transferable Interactiveness Knowledge for Human-Object Interaction Detection. (CVPR'19, TPAMI'21)

Retrained the code and got a bad result. Did I do something wrong? #23

Closed · BestSongEver closed 5 years ago

BestSongEver commented 5 years ago

@HuangOwen

I retrained and tested your code on VCOCO with my GPU, but got a bad result: Average Role [scenario_1] AP = 30.39, while the result in your paper is AP = 47.8 (RPdCd). Would you mind telling me if I did something wrong? Here are the commands I used:

python tools/Train_TIN_VCOCO.py --num_iteration 300000 --model TIN_VCOCO_test
python tools/Test_TIN_VCOCO.py --num_iteration 300000 --model TIN_VCOCO

Thanks again.

HuangOwen commented 5 years ago

Hi @BestSongEver

I'm not sure whether you changed any hyperparameters in ult.py. Make sure the learning rate, dropout rate, and cosine learning rate decay are the same as in our paper. If you just want to use the model for inference, you can download it via the Google Drive link.
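
For concreteness, cosine decay anneals the learning rate from its base value down toward a minimum over the course of training; a minimal sketch for sanity-checking a local edit to ult.py (the base_lr and total_steps values below are placeholders, not the repository's actual settings):

```python
import math

def cosine_decay_lr(base_lr, step, total_steps, min_lr=0.0):
    """Anneal the learning rate from base_lr down to min_lr over total_steps."""
    progress = min(step, total_steps) / float(total_steps)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: placeholder values, not the paper's hyperparameters.
for step in (0, 100000, 200000, 300000):
    print(step, cosine_decay_lr(base_lr=0.001, step=step, total_steps=300000))
```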

BestSongEver commented 5 years ago

@HuangOwen Well, I didn't change any hyperparameters; I only modified num_iteration from 20000 to 300000 (following iCAN). By the way, do 20000 iterations perform well with your code? 300000 is far more than 20000.

I will try again and update the issue if I solve the problem.

HuangOwen commented 5 years ago

@BestSongEver I see your point. We actually do not train our model from scratch. Instead, we fine-tune it with weights initialized from the best iCAN model; you should download that checkpoint and make sure you are initializing from the right weights. 20000 iterations is enough for fine-tuning but apparently insufficient for state-of-the-art performance if you are training from scratch. :)
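
In case it helps others hitting the same issue, here is a minimal sketch of that kind of initialization in TensorFlow 1.x (the framework this repository uses); the checkpoint path is a placeholder and restore_pretrained is a hypothetical helper, not the repo's actual code:

```python
import tensorflow as tf  # TensorFlow 1.x, matching this repository

# Hypothetical checkpoint location; point it at the iCAN weights you downloaded.
CKPT_PATH = 'Weights/iCAN_ResNet50_VCOCO/model.ckpt'

def restore_pretrained(sess, ckpt_path):
    """Restore every graph variable that also exists in the checkpoint,
    leaving any new TIN-specific layers at their fresh initialization."""
    ckpt_vars = {name for name, _ in tf.train.list_variables(ckpt_path)}
    to_restore = [v for v in tf.global_variables() if v.op.name in ckpt_vars]
    if to_restore:
        tf.train.Saver(var_list=to_restore).restore(sess, ckpt_path)

# Usage, after the TIN graph has been built:
#   sess.run(tf.global_variables_initializer())
#   restore_pretrained(sess, CKPT_PATH)
```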

BestSongEver commented 5 years ago

@HuangOwen Thanks for your reply. I tried initializing with the best iCAN model and got a better result. Problem solved! Thanks again.

BestSongEver commented 5 years ago

Hi @HuangOwen, hi again. I am going to cite your paper in my work. Before that, I am trying to reproduce your result in "RPdCd" mode on my GPU, but I have some questions:

1. In my understanding, taking VCOCO as an example, "60000_TIN_VCOCO_D.pkl" is the output of the binary discriminator in RPt2Cd mode, i.e., the "Interactiveness Knowledge" in your paper. Is that right? If so, where in your code does this file come from?
2. With your "60000_TIN_VCOCO_D.pkl" I can get the result in RPt2Cd mode, but how can I get the ".pkl" result for RPdCd mode?
3. Besides, in the transfer learning modes, P can learn interactiveness knowledge across datasets. Could you please tell me how the algorithm works "across datasets", i.e., how the knowledge from Dataset 1 and Dataset 2 is combined during training?

Thank you very much!

HuangOwen commented 5 years ago

Hi @BestSongEver, sorry for the late reply.

1. This TIN_VCOCO_D is trained with another file using a different network architecture; we are still sorting out that code.

2/3. Our core insight is that we train P on multiple datasets because P is transferable while C is not (the action definitions of VCOCO and HICO are different). Training P "across datasets" simply means enlarging the training data to more datasets, not restricting it to VCOCO, when training P.
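
In other words, because P only predicts interactive vs. non-interactive, binary labels from different datasets are directly compatible and can be pooled into one training stream. A minimal sketch of that pooling (load_vcoco_pairs and load_hico_pairs are hypothetical helpers standing in for the repository's real data pipeline):

```python
import random

def load_vcoco_pairs():
    # each item: (human_box, object_box, spatial_feature, is_interactive)
    return []

def load_hico_pairs():
    return []

def interactiveness_batches(batch_size=32):
    """Pool human-object pairs from both datasets into one binary-label
    stream for the interactiveness discriminator P."""
    pool = load_vcoco_pairs() + load_hico_pairs()
    random.shuffle(pool)
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]
```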

BestSongEver commented 5 years ago

@HuangOwen Got it. So for now I can only get the result in RPt2Cd mode, not in RPdCd mode. Will you release your "TIN_VCOCO_D" training code within the next few days? I am excited and looking forward to reproducing the result with your code. Thanks again.