doc-doc / NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

How to get bert_ft.h5 for my own dataset #3

Open wangbq18 opened 3 years ago

wangbq18 commented 3 years ago

Hello, I want to ask another question: how can I get bert_ft.h5 for my own dataset? How should the question and answer be encoded with BERT? Are they encoded separately or together? Thanks!

doc-doc commented 3 years ago

Hi, please refer to the link given in the README and our paper. Answers are appended behind the corresponding question for multi-choice QA.
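
A minimal sketch of what this could look like, assuming the Hugging Face transformers API: each candidate answer is appended behind its question, the pair is encoded with BERT, and the token-level features are written to an HDF5 file. The dataset names, padding length, and file layout below are illustrative assumptions, not the official bert_ft.h5 format.

```python
# Sketch only: encode question + appended answer with BERT and save
# token features to HDF5. Layout/names are assumptions, not the
# official bert_ft.h5 format.
import h5py
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

def encode_qa(question, answers, max_len=40):  # max_len is arbitrary here
    """Return token features of shape (num_answers, max_len, 768)."""
    feats = []
    for ans in answers:
        enc = tokenizer(question + " " + ans, padding="max_length",
                        truncation=True, max_length=max_len,
                        return_tensors="pt")
        with torch.no_grad():
            out = model(**enc).last_hidden_state  # (1, max_len, 768)
        feats.append(out.squeeze(0))
    return torch.stack(feats)  # (num_answers, max_len, 768)

# Example: one question with 5 candidate answers, one HDF5 dataset per QA.
qas = [("why did the boy pick up the ball",
        ["to play", "to throw it", "to hide it", "to give it away", "to kick it"])]
with h5py.File("bert_ft.h5", "w") as f:
    for i, (q, answers) in enumerate(qas):
        f.create_dataset(str(i), data=encode_qa(q, answers).numpy())
```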

wangbq18 commented 3 years ago

OK, I see. Another question: how do I get motion features with shape (16, 2048)? With the code provided by [HCRN], the motion feature shape is (8, 2048) with 8 clips. Does that mean I should set clips=16? Also, your paper says the best performance comes from using ResNet appearance features together with I3D (ResNeXt) motion features (Res+I3D). How do I get the I3D features? Can you share the code?

doc-doc commented 3 years ago

Hi, we use I3D with a ResNeXt backbone to capture motion information. The code can also be found in HCRN. The number of sampled clips depends on your dataset; it usually ranges from 8 to 32.
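
A minimal sketch of the clip-sampling step, assuming the video is already decoded into a frame array: 16 uniformly spaced clips are cut from the video and each is passed through a clip-level extractor to get a (16, 2048) matrix. The I3D (ResNeXt backbone) extractor itself comes from the HCRN preprocessing code, so `i3d_extractor` below is a placeholder rather than a real model.

```python
# Sketch only: uniform clip sampling for motion features.
# `i3d_extractor` stands in for the I3D/ResNeXt model from HCRN's
# preprocessing code and is not defined here.
import numpy as np

def sample_clips(frames, num_clips=16, frames_per_clip=16):
    """Split `frames` (T, H, W, 3) into `num_clips` uniformly spaced clips."""
    total = len(frames)
    # Centre of each clip, spread uniformly over the video.
    centers = np.linspace(frames_per_clip // 2,
                          total - frames_per_clip // 2 - 1,
                          num_clips).astype(int)
    clips = []
    for c in centers:
        start = max(0, c - frames_per_clip // 2)
        idx = np.clip(np.arange(start, start + frames_per_clip), 0, total - 1)
        clips.append(frames[idx])
    return np.stack(clips)  # (num_clips, frames_per_clip, H, W, 3)

def extract_motion_features(frames, i3d_extractor, num_clips=16):
    """Run a clip-level extractor over each clip -> (num_clips, 2048)."""
    clips = sample_clips(frames, num_clips=num_clips)
    return np.stack([i3d_extractor(clip) for clip in clips])
```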

wangbq18 commented 3 years ago

Thanks a lot, I have solved the problem above. There is no BERT-based implementation of the HCRN model; I tried to implement one, but when I replace GloVe with BERT, it doesn't converge. Can you share the code?

doc-doc commented 3 years ago

You need to fine-tune BERT on your own dataset and then extract token representations for the sentences. Afterwards, you can use the extracted BERT features to replace the GloVe embedding layer in HCRN. You can learn from NExT-QA (this repo) how to replace GloVe with BERT features. We are not planning to release this part of the work for now.
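
A minimal sketch of the "replace the GloVe embedding layer with BERT features" idea: instead of looking up trainable word embeddings from token ids, the encoder takes precomputed BERT token features and projects them into the dimension the rest of the model expects. The class and dimension names below are illustrative and do not mirror the actual HCRN module structure.

```python
# Sketch only: swapping a GloVe embedding layer for precomputed BERT
# token features. Names and sizes are illustrative, not HCRN's code.
import torch.nn as nn

class GloveQuestionEncoder(nn.Module):
    """Original style: trainable word embedding initialised from GloVe."""
    def __init__(self, vocab_size, module_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 300)
        self.rnn = nn.LSTM(300, module_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                 # token_ids: (B, L)
        return self.rnn(self.embedding(token_ids))[0]

class BertQuestionEncoder(nn.Module):
    """BERT variant: consume precomputed token features (B, L, 768)."""
    def __init__(self, bert_dim=768, module_dim=512):
        super().__init__()
        self.proj = nn.Linear(bert_dim, 300)      # map to the old embedding size
        self.rnn = nn.LSTM(300, module_dim, batch_first=True, bidirectional=True)

    def forward(self, bert_feats):                # bert_feats: (B, L, 768)
        return self.rnn(self.proj(bert_feats))[0]
```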

doc-doc commented 2 years ago

Hi, the HCRN-BERT implementation is available here.

doc-doc commented 2 years ago

Hi, we have released the edited code for fine-tuning BERT on NExT-QA here. You can also use the code to fine-tune on other datasets.
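
For orientation, a minimal sketch of the general multi-choice fine-tuning recipe using Hugging Face's BertForMultipleChoice; this is an illustration under that assumption, not the released fine-tuning code.

```python
# Sketch only: one training step of 5-way multi-choice fine-tuning
# with Hugging Face's BertForMultipleChoice (not the released code).
import torch
from transformers import BertTokenizer, BertForMultipleChoice

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMultipleChoice.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

question = "why did the boy pick up the ball"
answers = ["to play", "to throw it", "to hide it", "to give it away", "to kick it"]
label = torch.tensor([0])  # index of the correct answer

# One (question, answer) pair per choice, padded to a common length.
enc = tokenizer([question] * len(answers), answers, padding=True,
                truncation=True, return_tensors="pt")
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # (1, num_choices, seq_len)

model.train()
out = model(**inputs, labels=label)  # cross-entropy over the 5 choices
out.loss.backward()
optimizer.step()
```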

PolarisHsu commented 1 year ago

Hi, this link has expired. Can you provide it again?

doc-doc commented 1 year ago

Yes. Please download it via this link.