Yuqifan1117 / CaCao

This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (Accepted by ICCV 2023)

Are the trained model weights and some files missing? #8

Closed. Yassin-fan closed this issue 10 months ago

Yassin-fan commented 10 months ago

Thank you for the interesting work that advances open vocabulary and unbiased prediction of scene graphs.

I am interested in your work and tried running the code. After a quick look through the repository, some files appear to be missing.

For example, I could not find 'bert-base-uncased/prompt.txt' or 'pre_trained_visually_prompted_model/gqa_fine/gqa_model_VPT_LPT_ASCL_threshold07.pkl', which are loaded at lines 41 and 46 of fine_grained_mapping.py.
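A quick way to confirm which of these files are absent from a local checkout is sketched below. It only uses the two paths quoted above; the assumption that they are resolved relative to the repository root, and the helper name `find_missing`, are mine and not taken from the repository.

```python
from pathlib import Path

# The two paths quoted in this issue. Resolving them relative to the
# repository root is an assumption, not something the repository states.
EXPECTED_FILES = [
    "bert-base-uncased/prompt.txt",
    "pre_trained_visually_prompted_model/gqa_fine/gqa_model_VPT_LPT_ASCL_threshold07.pkl",
]


def find_missing(root=".", expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).is_file()]


if __name__ == "__main__":
    # Run from the directory you believe is the repository root.
    for path in find_missing():
        print(f"missing: {path}")
```

Running this from the checkout directory prints one line per file that is not present, which makes it easy to report exactly what is missing.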

Could you provide the trained model weights and the missing files, so that I can run the test set directly without training locally and obtain results and metrics similar to those reported in the paper?

If I simply overlooked where these files are located, please forgive me and point me to them. Thank you!

Yassin-fan commented 10 months ago

In addition, I still have some questions about the code.

In line 31 of fine_grained_mapping.py, the predicate list of the VG data is loaded as target_words, but it is not used afterwards.

In line 64, the predicate labels of vg_1800 are loaded into vg_1800_predicate_label, but this variable is not used either.

Line 71 of the code reads: mapping_dict[raw_word] = model_own.mapping_target(raw_word, raw_predicates, prep_words, device)

I believe the target_words parameter of this function should be the original 50+ categories, but the argument passed is raw_predicates, which seems to be the newly generated fine-grained predicates. Is this an error?
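To make the concern concrete, here is a toy sketch of why the choice of candidate list matters. It is not the repository's `mapping_target`: the function `map_to_target`, the string-similarity scoring, and both word lists are hypothetical stand-ins chosen only for illustration.

```python
from difflib import SequenceMatcher


def map_to_target(raw_word, candidates):
    """Toy stand-in: pick the candidate with the highest string similarity to raw_word."""
    return max(candidates, key=lambda c: SequenceMatcher(None, raw_word, c).ratio())


# Illustrative lists only; not the actual VG categories or generated predicates.
target_words = ["on", "sitting on", "holding"]
raw_predicates = ["sitting on top of", "resting on", "gripping"]

raw_word = "sitting on top of"

# With the base categories as candidates, the fine-grained word maps to a base category.
mapped_to_base = map_to_target(raw_word, target_words)

# With the fine-grained list as candidates, the word simply maps to itself.
mapped_to_fine = map_to_target(raw_word, raw_predicates)

if __name__ == "__main__":
    print(mapped_to_base)
    print(mapped_to_fine)
```

In this toy setting, passing the fine-grained list makes the mapping trivial, because the word being mapped is already one of the candidates. Whether the real function behaves the same way depends on its implementation.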

Yuqifan1117 commented 10 months ago

Thank you very much for your attention! First of all, I'm sorry for some confusing naming in the code (left over from some extra experiments); we've completed some updates. In addition, because the trained-model files are large, we are first uploading the extended dataset to help the community. Our experiments are mainly implemented on top of FGPL, so the trained model needs to be run on that architecture if necessary. As for the code questions, the different predicate labels are used to construct different ablation studies. By the way, the target_words of this function should be the original 50 categories for VG :).

Yassin-fan commented 10 months ago

Thank you for your answers and corrections.

I will try to understand your ideas and to study and run your code, and I may have more questions in the future. Thank you in advance for your efforts and answers!