About LVIS dataset preprocessing and more detailed hyper-parameters

DongSky commented 2 years ago

Great work, and sincere thanks for the open-source code. Using the code, I was able to reproduce the performance of P-ASL on the COCO dataset. However, when I tried to reproduce the results on the LVIS dataset with the hyper-parameters stated in Appendix A, I could not obtain similar results (e.g., 78.57 mAP(C) in Table 4). Would you be willing to release the corresponding LVIS dataloader, as well as more detailed hyper-parameters?

ebenbaruch commented 2 years ago

Hey, thanks for the kind words.

Can you please detail the hyper-parameters you used for training on LVIS (learning rate, batch size, epochs, focusing parameters)? Did you use an image size of 448, and was the architecture MTResNet?

Also, how did you compute the mAP score? Note that the mAP should be computed using the annotated labels only.

DongSky commented 2 years ago

Hi, the details of my P-ASL reimplementation for LVIS are as follows (I also uploaded my reproduction code at https://github.com/DongSky/P_ASL_LVIS_reproduce; it may be helpful for debugging):

backbone: TResNet-M with corresponding ImageNet-22K pretrained parameters
learning rate: 6e-4 (from Appendix A)
batch size: 64 (from the original code)
epochs: 30 (from the original code)
weight decay: 3e-4 (from Appendix A)
gamma: 1 for unknown, 0 for pos and neg (from Appendix A)
loss type: selective
likelihood top-k: 5 (from the original code; Appendix A does not give the exact value)
prior threshold: 0.5
EMA: used, EMA decay is 0.9997 (from the original code)
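The loss settings listed above (a focusing parameter per label type, the selective mode, the likelihood top-k and the prior threshold) can be read as the following simplified sketch. It is only one reading of the Appendix A description, not the released loss code: it assumes an unannotated label is ignored when its class prior exceeds prior_threshold or when it is among the top_k highest-probability unannotated labels of the sample (the repository may rank across the whole batch instead), and that every other unannotated label is treated as a negative with its own focusing parameter gamma_unann. ASL's probability clipping and class weights are omitted, and the function names and the prior input are hypothetical.

```python
import math
from typing import List, Sequence


def label_weights(
    probs: Sequence[float],
    targets: Sequence[int],
    prior: Sequence[float],
    mode: str = "selective",
    top_k: int = 5,
    prior_threshold: float = 0.5,
) -> List[float]:
    """Return 1.0 for labels that enter the loss, 0.0 for ignored ones.

    targets: 1 = positive, 0 = negative, -1 = unannotated (unknown).
    """
    weights = [1.0] * len(targets)
    unknown = [c for c, t in enumerate(targets) if t == -1]
    if mode == "negative":
        # Every unknown label is kept and treated as a negative.
        return weights
    if mode == "ignore":
        # Every unknown label is dropped from the loss.
        for c in unknown:
            weights[c] = 0.0
        return weights
    if mode != "selective":
        raise ValueError(f"unknown mode: {mode}")
    # Assumed rule 1: ignore unknowns whose (hypothetical) class prior is high.
    for c in unknown:
        if prior[c] > prior_threshold:
            weights[c] = 0.0
    # Assumed rule 2: ignore the top_k remaining unknowns with the highest
    # predicted probability (label likelihood), ranked per sample.
    remaining = [c for c in unknown if weights[c] == 1.0]
    for c in sorted(remaining, key=lambda c: probs[c], reverse=True)[:top_k]:
        weights[c] = 0.0
    return weights


def selective_focal_loss(
    logits: Sequence[float],
    targets: Sequence[int],
    prior: Sequence[float],
    gamma_pos: float = 0.0,
    gamma_neg: float = 0.0,
    gamma_unann: float = 1.0,
    mode: str = "selective",
    top_k: int = 5,
    prior_threshold: float = 0.5,
    eps: float = 1e-8,
) -> float:
    """Sum of focal-weighted BCE terms for one sample (no clipping, no alpha)."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    weights = label_weights(probs, targets, prior, mode, top_k, prior_threshold)
    loss = 0.0
    for p, t, w in zip(probs, targets, weights):
        if w == 0.0:
            continue
        if t == 1:
            loss -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Annotated negatives and kept unknowns are both pushed toward 0,
            # each with its own focusing parameter.
            gamma = gamma_neg if t == 0 else gamma_unann
            loss -= p ** gamma * math.log(max(1.0 - p, eps))
    return loss
```

With the defaults (gamma 0 for positives and negatives, 1 for unknowns, top-k 5, prior threshold 0.5) this mirrors the listed settings; switching mode to "negative" gives the simpler baseline in which every unannotated label is counted as a negative.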

During testing, we only use the positive labels (annotated as 1) and negative labels (annotated as 0) to calculate mAP, i.e., we select annotations greater than -1.
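The evaluation rule described above (score only entries annotated as 1 or 0, i.e. target > -1) can be sketched as a per-class AP computation in plain Python. This is a hypothetical helper, not the partial-mAP function in helper_functions.py; it assumes targets use 1/0/-1 for positive/negative/unannotated and skips classes with no annotated positive, which the repository code may handle differently.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class from scores and binary labels (1 = positive, 0 = negative)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        return float("nan")
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at each positive
    return precision_sum / num_pos


def partial_map(scores: List[List[float]], targets: List[List[int]]) -> float:
    """mAP (in %) over classes, using only annotated entries (target > -1)."""
    aps = []
    for c in range(len(targets[0])):
        kept = [n for n in range(len(targets)) if targets[n][c] > -1]
        labels = [targets[n][c] for n in kept]
        if 1 in labels:  # assumption: skip classes without an annotated positive
            aps.append(average_precision([scores[n][c] for n in kept], labels))
    if not aps:
        return float("nan")
    return 100.0 * sum(aps) / len(aps)


if __name__ == "__main__":
    scores = [[0.9, 0.9], [0.8, 0.7], [0.1, 0.6]]
    targets = [[1, -1], [0, 1], [-1, 0]]
    print(partial_map(scores, targets))  # 100.0
    naive = [[max(t, 0) for t in row] for row in targets]
    print(partial_map(scores, naive))  # 75.0
```

In the toy example, the unannotated entry of class 1 gets the highest score; counting it as a negative (the naive variant) turns it into a false positive and drops the mAP, which is why the metric has to be restricted to annotated labels.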

To speed up training, we simply added torch.distributed code to the training script to support distributed training. With two GPUs we obtained 64.23 mAP; the log is tresnetm_lvis_two_gpus.log.

I also tried single-GPU training and obtained 65.98 mAP; the log is tresnetm_lvis_single_gpu.log.

ebenbaruch commented 2 years ago

The gap is too large. I would recommend starting with a simpler mode, such as 'negative', and making sure the metric is computed correctly. For simplicity, I will add the partial-mAP metric to the code (in helper_functions.py).
We may publish more data and utilities (such as the LVIS annotation file) soon, but it will take some time. Stay tuned.

rookietyper commented 1 year ago

Could you please upload the code for the LVIS dataset? Also, your email address emanuel.benbaruch@alibaba-inc.com seems to be wrong.

Alibaba-MIIL / PartialLabelingCSL, issue #3