Closed ZHUXUHAN closed 10 months ago
Directly mapping the predicates and retraining the baseline (Motifs) model in this way requires learning 588 predicate features, which hurts the base classes. To maintain base-class performance, during training we mapped the VG+CaCao dataset to the 50 target classes (indices 1-50, 15 of which are novel) and trained the baseline model on that basis (training on base classes plus the extra unseen classes, 50 in total, not 588). (R@50: 0.1748) Also, check whether you are training incorrectly: if you train the Motifs model directly instead of with contrastive learning, there is no gradient signal on the novel classes, so their recall is 0. (R@50: 0.1122)
Thank you for your patient answer. Another question: if some unseen classes are introduced during training, can they still be called unseen or novel classes?
Our main purpose is to verify the additional effect of the extra data from CaCao. We have only narrowed the scope of the CaCao pseudo-predicates; the ground truth of the novel classes remains invisible (this is our uniform setting).
My understanding is that the labels of these novel-class training samples come from pseudo labels, not from the ground truth, so they can still be called unseen. Is that right?
Yes (they can still be called unseen), but sampling from pseudo labels may leak knowledge of the target categories, so it is probably easier than a strictly unseen setting.
Yes, this is weakly unseen compared to the traditional open-vocabulary or zero-shot settings. I have been running your dataset recently, and I hope the results of this latest experiment are in line with expectations. Thank you for your answer.
I just ran into another problem: is the test dataset the original VG dataset (with 50 predicates) or your provided VG+CaCao dataset (with 50 predicates as in VG)?
It should be the latter. Have you solved the problem of performance reproduction?
OK. The training will finish later today, but it is not done yet.
The base recall is 11.3, the novel recall is 6.9, and the overall recall is 9.9. Are these results reasonable?
Do you use Motifs' frequency-bias trick?
Also, I find that the number of mapped test images is 2,262. Is that right?
These results might be reasonable (they reflect the improvement from CaCao). However, the final performance is affected by the quality of the mapped predicates. Besides, we consider the influence of frequency (without using any information from the unseen ground truth), so higher performance can be achieved. Finally, we did not apply additional filtering to the test images, only partitioning the predicate categories for testing (which looks like more than 2,262 images).
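For reference, the Motifs frequency bias mentioned here is usually implemented as a log empirical prior p(predicate | subject class, object class) added to the predicate logits. The following is a minimal sketch under that assumption, not the authors' exact code; `build_freq_bias` and the `eps` smoothing are illustrative names:

```python
import numpy as np

def build_freq_bias(triplets, num_obj, num_pred, eps=1e-3):
    """Empirical log-prior log p(predicate | subj_class, obj_class),
    estimated from training triplets (subj_class, obj_class, predicate).
    Returns a (num_obj, num_obj, num_pred) array of log-probabilities."""
    counts = np.full((num_obj, num_obj, num_pred), eps)
    for s, o, p in triplets:
        counts[s, o, p] += 1
    return np.log(counts / counts.sum(axis=2, keepdims=True))

def apply_freq_bias(logits, bias, subj, obj):
    """Add the log-prior for this object-class pair to the model's
    predicate logits before taking the softmax/argmax."""
    return logits + bias[subj, obj]
```

Because the prior comes only from training annotations, it uses no information from the unseen ground truth, consistent with the setting described above.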
Another question: do you only consider the labeled pairs, or also the other pairs (usually N*N pairs for N objects)?
We consider all N*N object pairs, but the objects are known in PredCls.
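Since PredCls gives the ground-truth boxes and object classes, candidate generation reduces to enumerating ordered object pairs. A minimal sketch (self-pairs are excluded here, giving N*(N-1) ordered candidates; whether to keep self-pairs is a design choice the thread does not specify):

```python
def candidate_pairs(num_objects):
    """All ordered (subject, object) index pairs among the given
    objects, excluding self-pairs: N*(N-1) candidates for N objects."""
    return [(s, o) for s in range(num_objects)
                   for o in range(num_objects) if s != o]
```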
My reproduced Motifs performance is Top-100: Base 0.1267, Novel 0.0813, All 0.1138, which is worse than your provided baseline. I think maybe my test images are not right; the filtered set is 2,262 images, which is a small amount.
You might consider the influence of frequency as well as your test images; that may help.
Yes, I use exactly this trick.
This is my filtering code; if an image's annotations do not contain any of the wanted classes, the image is filtered out.

```python
import numpy as np

novel_classes = [311, 400, 348, 3, 128, 542, 321, 299, 149, 555,
                 260, 9, 104, 13, 331]
base_classes = [2, 7, 37, 39, 42, 45, 50, 56, 70, 72, 134, 136, 153,
                158, 164, 171, 173, 286, 301, 314, 318, 320, 328, 343,
                360, 393, 448, 478, 513, 535, 559, 570, 571, 572, 581]
all_classes = novel_classes + base_classes
# map raw predicate indices to contiguous ids 1..50
all_classes_map = {v: i + 1 for i, v in enumerate(all_classes)}

for i, relations in enumerate(original_relationships):
    new_relation = []
    for rels in relations:
        if rels[2] in all_classes:
            new_relation.append(
                np.array([rels[0], rels[1], all_classes_map[rels[2]]]))
```
In general, test images are not extended and should not contain additional categories of predicates.
But the VG+CaCao test dataset contains other categories of predicates (not only 50 categories). Maybe you mean the original VG test dataset?
I remember we filtered unneeded triplets at the predicate level instead of at the image level during inference; I hope this helps.
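Predicate-level filtering keeps every image and drops only the triplets whose predicate falls outside the target set. A minimal sketch, assuming `target_predicates` is the 50-class set and `class_map` remaps raw predicate indices to contiguous evaluation ids (both names are illustrative, not from the actual repo):

```python
def filter_triplets(image_relations, target_predicates, class_map):
    """Keep only triplets whose predicate is in the target set and
    remap each kept predicate to its contiguous evaluation id.
    image_relations: list of (subj_idx, obj_idx, predicate_idx).
    Images whose list comes back empty can be skipped at eval time."""
    return [(s, o, class_map[p]) for s, o, p in image_relations
            if p in target_predicates]
```

This matches the behavior described in the thread: an image is effectively used whenever at least one of its triplets survives the filter.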
I think we do the same: if an image's annotations contain at least one wanted predicate, the image is used; if it contains none, it is not used.
Is the test dataset "./open-world/VG-SGG-zs-random-EXPANDED-with-attri.h5"?
Sorry, I seem to have evaluated the VG+CaCao dataset, and the number of valid images (those with predicates in the target predicate classes) is more than 2,262, so it is not such a small amount. We found that the corresponding dictionary in 'zs-random' seems to be contaminated; I have updated the indices of the base classes and novel classes as follows:
```python
base_classes = [2, 7, 16, 18, 21, 23, 26, 32, 45, 47, 109, 111, 128, 132, 137, 143, 145, 182, 197, 210, 213, 215, 223, 237, 254, 284, 335, 362, 396, 415, 439, 450, 451, 452, 454]
novel_classes = [207, 291, 242, 3, 103, 422, 216, 195, 124, 435, 156, 9, 79, 13, 226]
```
I will check and update the 'idx_to_predicate' and 'predicate_to_idx' information in the future.
OK, I will retrain the baseline model with this mapping.
Hi, my Motifs baseline's recall is about 20 on base and about 10 on novel, so the novel recall still seems a little low. Did you upload the epic model? It seems to be much better than the baseline. By the way, you said earlier that you use N*N pairs, but according to the paper, epic seems to perform prompt learning at the triplet level. I think the amount of computation would be huge for some samples, because the number of pairs can be very large. How do you solve this problem?
It seems reasonable; the slightly lower number could be caused by the quality of the CaCao triplet mapping. Together with follow-up works and other collaborators, we will sort out this part of the code later. Thank you for your understanding.
For some samples the number of pairs is huge; how do you solve this problem?
For training, we use the labeled pairs; at inference, we perform prompt learning on pseudo-label pairs, removing some candidate samples by confidence (the overhead is acceptable). Also, we apply the triplet level only to image-aware prompts and the predicate level to text-aware prompts, so the overhead stays acceptable.
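The confidence-based candidate removal described here could look like the following sketch; `pair_scores` and `top_k` are assumed inputs for illustration, not names from the actual code:

```python
def prune_pairs(pairs, pair_scores, top_k):
    """Keep only the top_k candidate (subject, object) pairs by
    confidence, so triplet-level prompting stays tractable even
    when the raw N*N candidate set is large."""
    order = sorted(range(len(pairs)), key=lambda i: pair_scores[i],
                   reverse=True)
    return [pairs[i] for i in order[:top_k]]
```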
I just mapped the open-world classes to base and novel classes as:

```python
novel_classes = [311, 400, 348, 3, 128, 542, 321, 299, 149, 555, 260, 9, 104, 13, 331]
base_classes = [2, 7, 37, 39, 42, 45, 50, 56, 70, 72, 134, 136, 153, 158, 164, 171, 173, 286, 301, 314, 318, 320, 328, 343, 360, 393, 448, 478, 513, 535, 559, 570, 571, 572, 581]
```

and trained the baseline (Motifs) model on the VG+CaCao dataset, but the baseline's base-class recall is about 10 lower than your reported results and the novel-class recall is 0. Can you provide some suggestions for reproducing the performance?