Thank you very much for the publicly available source code and dataset.
I have two questions that I hope you can answer:
In NeXt-clip-bbox-features.zip, each h5 file has shape (64, 2, 10, 768). What do the dimensions of size 2 and 10 represent? In model.py, I see the line rFeature = item_dict['bbox_features'][:, :, 0, :, :]. Could you explain what the slices (64, 0, 10, 768) and (64, 1, 10, 768) each refer to?
The NExT-QA dataset seems to have a total of 5,440 videos, but there are 9,454 h5 files in each of NeXt-clip-features and NeXt-clip-bbox-features. Why do the counts differ?
I look forward to your reply, and thank you very much!