SHTUPLUS / PySGG

The toolkit for scene graph generation

bug: instance resampling happened in evaluation #9

Closed rafa-cxg closed 2 years ago

rafa-cxg commented 2 years ago

Both image-level and instance-level resampling shouldn't be applied during evaluation. I noticed that image-level resampling is disabled by if cfg.MODEL.ROI_RELATION_HEAD.DATA_RESAMPLING and self.split == 'train' in pysgg/data/datasets/visual_genome.py; however, instance-level resampling is not filtered by it.

You should change it in get_groundtruth() (see the attached screenshot for the suggested code).
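
For reference, here is a minimal standalone sketch of the kind of split-aware guard being suggested. The function name, argument names, and the per-predicate repeat factors are hypothetical illustrations based on this thread, not the code in the screenshot:

```python
def maybe_resample_relations(relations, repeat_dict, split):
    """Apply instance-level resampling only on the training split.

    relations: list of (subj_idx, obj_idx, predicate) tuples for one image.
    repeat_dict: per-predicate repeat factors (e.g. loaded from repeat_dict.pkl),
                 or None when resampling is disabled.
    split: 'train', 'val' or 'test'.
    """
    if repeat_dict is None or split != 'train':
        # evaluation path: return the ground truth untouched
        return relations
    resampled = []
    for subj, obj, pred in relations:
        # repeat each relation instance according to its predicate's factor
        resampled.extend([(subj, obj, pred)] * repeat_dict.get(pred, 1))
    return resampled
```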

rafa-cxg commented 2 years ago

@Scarecrow0

Scarecrow0 commented 2 years ago

In validation or test mode, the member repeat_dict of VGDataset is already left at its initial value of None (see line 124 of __init__), so we do not need to add a check in get_groundtruth. However, you can add this check for safety. In any case, resampling is only applied in the training phase in our implementation.
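
For context, here is a condensed sketch of the gating described above; the helper name and signature are hypothetical, and only the config flag and split check follow the thread:

```python
import pickle
from typing import Dict, Optional


def init_repeat_dict(data_resampling: bool, split: str,
                     repeat_dict_path: str) -> Optional[Dict[int, int]]:
    """Only the training split ever gets a non-None repeat dict,
    so evaluation never triggers instance-level resampling."""
    if not (data_resampling and split == 'train'):
        return None  # val/test: repeat_dict stays None
    with open(repeat_dict_path, 'rb') as f:
        return pickle.load(f)
```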

rafa-cxg commented 2 years ago

Thanks for your reply. However, during training, if you choose one of the resampling methods, it generates repeat_dict.pkl in the output path, so it could cause resampling in the validation process. (I have checked this: instance sampling is applied in val_dataset.)

Scarecrow0 commented 2 years ago

The repeat_dict.pkl is generated only for the training set. It won't be loaded and can't be used in validation or test; self.repeat_dict remains None during the initialization of VGDataset in test or validation mode. Please show the details of your checking process.

rafa-cxg commented 2 years ago

I think your explanation is right. Maybe it is caused by the randomly chosen annotations of the VG dataset? As you can see in the attached screenshot, gt_rels has many identical relations with the same subject/object IDs.

I can also show you the returned GT triplet boxes; the coordinates are the same (see the attached screenshot).
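
To make this easy to verify, here is a small standalone check for repeated ground-truth triplets in one image; gt_rels is assumed to be an (N, 3) array of (subject index, object index, predicate) rows, which is a common SGG convention rather than a quote of this codebase:

```python
from collections import Counter

import numpy as np


def count_repeated_triplets(gt_rels: np.ndarray) -> Counter:
    """Count how often each (subject, object, predicate) triplet appears,
    keeping only the triplets that occur more than once."""
    counts = Counter(map(tuple, gt_rels.tolist()))
    return Counter({t: c for t, c in counts.items() if c > 1})


# Example with a duplicated annotation, as reported in this thread:
gt_rels = np.array([[0, 1, 20],
                    [0, 1, 20],   # exact duplicate
                    [2, 3, 31]])
print(count_repeated_triplets(gt_rels))  # Counter({(0, 1, 20): 2})
```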

rafa-cxg commented 2 years ago

By the way, I would like to ask why the calculation of mean recall and the other metrics is performed per image. Shouldn't the top-k be computed over the whole validation dataset? (see attached screenshot) One possible situation: what if the number of relation predictions in one image is less than k? That's not good.

Scarecrow0 commented 2 years ago
rafa-cxg commented 2 years ago

Hello, have you seen the first and second pictures I posted? It seems like there are some repeated triplets in val...

Scarecrow0 commented 2 years ago
  • The dataset itself, the Visual Genome Stanford split, does have this issue of repeated relationship annotations. You can directly load the annotations and check this. Most SGG works keep these repeats at val/test time; for a fair comparison with previous works, we also keep the repeated annotations in our implementation.
  • In the evaluation metrics of visual relationship detection, top-k recall means we only take at most 100 relationship predictions per image for matching against the GT relationships. To this end, the number of predictions a model can generate has no relevance to evaluation and is only a hyper-parameter for model design. For the evaluation protocol, please refer to the previous works that proposed the Visual Genome Stanford split. They have detailed descriptions and discussions, and we follow the previous evaluation pipeline identically in this codebase.

: ) Please read my reply carefully. I re-wrote it for more clarity.
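
To make the per-image protocol concrete, here is a simplified sketch of how Recall@K is conventionally computed in SGG evaluation; it assumes predictions are already sorted by confidence and is an illustration, not this codebase's evaluator:

```python
import numpy as np


def per_image_recall_at_k(pred_triplets: np.ndarray,
                          gt_triplets: np.ndarray,
                          k: int = 100) -> float:
    """Fraction of GT triplets matched by the top-k predictions of ONE image.

    Both arrays hold (subject, object, predicate) rows. Taking the first k
    rows implements the "at most k predictions per image" rule; if an image
    has fewer than k predictions, all of them are used.
    """
    topk = {tuple(t) for t in pred_triplets[:k].tolist()}
    gt = [tuple(t) for t in gt_triplets.tolist()]
    if not gt:
        return 0.0
    return sum(t in topk for t in gt) / len(gt)


# Dataset-level Recall@K is then the mean over images:
# recall_k = np.mean([per_image_recall_at_k(p, g, 100) for p, g in samples])
```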

rafa-cxg commented 2 years ago

Hi, I really appreciate your explanation! But I haven't found the code that "keeps this repeat in val/test time". Where is it? Thanks!

Scarecrow0 commented 2 years ago

The repeats are already in the annotations of the dataset. We leave them untouched at val/test time. The duplicated-relation filtering only happens at training time, as shown in the function get_groundtruth:

https://github.com/SHTUPLUS/PySGG/blob/main/pysgg/data/datasets/visual_genome.py#L299-315
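
For readers who do not follow the link, the pattern there is roughly the following; this is a paraphrased sketch, and the exact names and selection rule in visual_genome.py may differ:

```python
from collections import defaultdict

import numpy as np


def filter_duplicate_rels(relation: np.ndarray) -> np.ndarray:
    """Train-time de-duplication: for each (subject, object) pair that carries
    several annotated predicates, keep one predicate chosen at random.
    At val/test time this function is simply not called, so the repeated
    annotations stay in the ground truth."""
    rel_sets = defaultdict(list)
    for subj, obj, pred in relation:
        rel_sets[(subj, obj)].append(pred)
    deduped = [(s, o, np.random.choice(preds)) for (s, o), preds in rel_sets.items()]
    return np.array(deduped, dtype=np.int64)
```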

rafa-cxg commented 2 years ago

Got it! Thanks again for your dedicated work!