Hi.
Thanks for your great work!!

I understand that the ViLD network consists of a text head and an image head.
I tried to figure out how this is implemented in the code, and found that it is in mmdet/models/roi_heads/standard_roi_head_text.py.
In the README, the ViLD and DetPro configs both point to the same file, detpro_ens_20e.py. I don't think this config builds both StandardRoIHeadTEXT and StandardRoIHeadTEXTPrompt into the model, which would be needed to train the text head and the image head.
To do that, I think we need to load detpro_text_promt.py from the config folder.
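For context on why the config matters: mmdetection-style code builds each module from the `type` key of its config dict, so the head that actually gets trained depends only on `model.roi_head.type` in the config passed to the training script. Below is a minimal, self-contained sketch of that registry dispatch. It is a simplified stand-in, not mmdet's real registry. The two head classes are empty placeholders that only borrow the real names, and `build_roi_head` and the config fragment are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for an mmdet-style registry: class name -> class.
ROI_HEADS: Dict[str, Callable[..., Any]] = {}


def register(cls):
    """Register a class under its own name, mimicking a registry decorator."""
    ROI_HEADS[cls.__name__] = cls
    return cls


@register
class StandardRoIHeadTEXT:  # empty placeholder, not the real implementation
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


@register
class StandardRoIHeadTEXTPrompt:  # empty placeholder, not the real implementation
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


def build_roi_head(cfg: Dict[str, Any]) -> Any:
    """Instantiate the class named by cfg['type'], passing the remaining keys."""
    args = dict(cfg)  # shallow copy so the caller's config is not mutated
    cls = ROI_HEADS[args.pop('type')]
    return cls(**args)


# Hypothetical config fragment: only the class named in `type` is constructed.
model_cfg = dict(roi_head=dict(type='StandardRoIHeadTEXT'))
head = build_roi_head(model_cfg['roi_head'])
print(type(head).__name__)  # StandardRoIHeadTEXT
```

To check a real config instead of a hypothetical one, mmcv 1.x provides `Config.fromfile(path)`, which resolves `_base_` inheritance before you read `cfg.model.roi_head.type`. Comparing that field for detpro_ens_20e.py and detpro_text_promt.py would show which head class each one builds.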
To sum up my questions:
1. Does your training script (vild_detpro.sh) properly construct both the text head and the image head in the model? I think the config file detpro_text_promt.py should be loaded for your method.
2. How should I train the ViLD* baseline? Could you specify the command?

I am not sure whether my understanding is right, and I would really appreciate it if you could correct me if I am wrong.
Thank you for reading and thanks in advance.