facebookresearch / Detic

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Apache License 2.0
1.88k stars, 210 forks

Possible data leak in lvis_v1_train_cat_info.json #79

Open wusize opened 2 years ago

wusize commented 2 years ago

Hi, Xingyi!

I loaded the file "lvis_v1_train_cat_info.json", and it seems to contain nonzero image_count values for rare classes. This may lead to a data leak in the open-vocabulary setting when using the federated loss (FedLoss).
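
A quick way to check this is sketched below. It assumes the file is a JSON list of category dicts with "name", "frequency" (with "r" marking rare classes, as in LVIS) and "image_count" keys; the helper names `load_cat_info` and `rare_with_counts` are hypothetical and not part of the Detic codebase.

```python
import json


def load_cat_info(path):
    """Load the category info file (assumed to be a JSON list of dicts)."""
    with open(path) as f:
        return json.load(f)


def rare_with_counts(cat_info):
    """Return (name, image_count) pairs for rare categories whose count is nonzero.

    Assumes each entry has "name", "frequency" and "image_count" keys,
    with frequency == "r" marking rare (novel) categories.
    """
    return [
        (cat["name"], cat["image_count"])
        for cat in cat_info
        if cat.get("frequency") == "r" and cat.get("image_count", 0) > 0
    ]
```

Calling `rare_with_counts(load_cat_info("lvis_v1_train_cat_info.json"))` would list the affected categories; an empty list would mean no counts are recorded for rare classes.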

xingyizhou commented 2 years ago

Hi Size,

Thank you for bringing this up. I remember noticing this issue, but I thought we could ignore it because the zero-shot embeddings in the FedLoss will never receive a positive loss. They will receive negative losses (and thus have a negative impact on performance), but these are rare due to the design of the FedLoss. I should have tried removing them from the loss, but it was likely not better, since I used this in the final version. Please feel free to try the corrected version and post numbers if you find a considerable difference. Thanks!
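
To illustrate why only negative losses are affected, here is a minimal sketch of how negative classes might be weighted for sampling in a federated loss. It assumes the sampling weight of each class is its image_count raised to a power (0.5 here) and that classes present in the batch are excluded from negative sampling; `negative_sampling_probs` is a hypothetical helper, not the function used in this repository.

```python
def negative_sampling_probs(image_counts, positives, power=0.5):
    """Return per-class probabilities of being sampled as a negative.

    image_counts: list of per-class image counts (index = class id).
    positives: set of class ids that appear in the current batch.
    power: exponent applied to the counts (0.5 is an assumption).
    """
    weights = [
        0.0 if idx in positives else count ** power
        for idx, count in enumerate(image_counts)
    ]
    total = sum(weights)
    if total == 0:
        return weights  # nothing left to sample
    return [w / total for w in weights]
```

Under these assumptions, a novel class with a nonzero recorded count can be drawn as a negative, while a class with a recorded count of 0 never is. Novel classes have no training annotations, so they never appear in `positives`.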

Best, Xingyi

wusize commented 2 years ago

OK. Thanks for your response! It seems unlikely to influence the FedLoss. But I still recommend pushing a corrected version of lvis_v1_train_cat_info.json that records 0 for novel classes. Otherwise, there would be a problem when setting ignore_zero_class = True for the CE and BCE losses.
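
A corrected file could be produced along these lines. This is only a sketch: it assumes the file is a JSON list of category dicts, that novel classes are the ones with "frequency" equal to "r", and that only "image_count" needs to be reset. The function name `zero_novel_counts` and the output filename are hypothetical.

```python
import copy
import json


def zero_novel_counts(cat_info, novel_frequency="r"):
    """Return a copy of cat_info with image_count set to 0 for novel classes.

    Assumes novel classes are those whose "frequency" equals novel_frequency.
    The input list is left unchanged.
    """
    fixed = copy.deepcopy(cat_info)
    for cat in fixed:
        if cat.get("frequency") == novel_frequency:
            cat["image_count"] = 0
    return fixed


def rewrite_file(src_path, dst_path):
    """Read src_path, zero the novel-class counts, and write to dst_path."""
    with open(src_path) as f:
        cat_info = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(zero_novel_counts(cat_info), f)
```

For example, `rewrite_file("lvis_v1_train_cat_info.json", "lvis_v1_train_cat_info_fixed.json")` would write the zeroed version to a new file and leave the original untouched.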