Open · DongSky opened this issue 2 years ago

Great work, and sincere thanks for the open-source code. Based on it, I was able to reproduce the performance of P-ASL on the COCO dataset. However, when I tried to reproduce the results on the LVIS dataset with the hyper-parameters stated in Appendix A, I was not able to obtain similar results (e.g., 78.57 mAP(C) in Table 4). Would you be willing to release the corresponding LVIS dataloader as well as more detailed hyper-parameters?
Hey, thanks for the kind words.
Can you please detail the hyper-parameters you used for training on LVIS (learning rate, batch size, epochs, focusing parameters)? Did you use an image size of 448, and is the architecture MTResNet?
Also, how did you compute the mAP score? Note that the mAP should be computed using the annotated labels only.
Hi, the reimplementation details of P-ASL for LVIS are as follows (I also uploaded my reproduction code at https://github.com/DongSky/P_ASL_LVIS_reproduce, which may be helpful for debugging):
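As a rough sketch of how the partial image-level labels could be built from the LVIS v1 annotation file (the 1/0/-1 encoding follows the description below; the function name `build_partial_labels` and the reliance on the `neg_category_ids` field are my assumptions, not necessarily what the reproduce repository does):

```python
import json
import numpy as np

def build_partial_labels(lvis_json_path, num_classes=1203):
    """Build image-level partial labels from LVIS v1 annotations.

    Assumed encoding: 1 = class present, 0 = verified absent
    (listed in neg_category_ids), -1 = unannotated/unknown.
    """
    with open(lvis_json_path) as f:
        data = json.load(f)

    # Start with every class marked as unannotated (-1).
    labels = {img["id"]: -np.ones(num_classes, dtype=np.int64)
              for img in data["images"]}

    # Categories verified absent for an image become negatives (0).
    for img in data["images"]:
        for cat_id in img.get("neg_category_ids", []):
            labels[img["id"]][cat_id - 1] = 0  # category ids are 1-based

    # Any instance annotation marks its category as positive (1).
    for ann in data["annotations"]:
        labels[ann["image_id"]][ann["category_id"] - 1] = 1

    return labels
```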
During testing, we only use the positive labels (annotated as 1) and negative labels (annotated as 0) to calculate mAP, i.e., by selecting annotations greater than -1.
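For clarity, a minimal sketch of such a partial-mAP computation (the helper name `partial_map` is hypothetical; it only illustrates masking out the unannotated -1 entries, not the exact evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def partial_map(targets, scores):
    """mAP over annotated labels only.

    targets: (N, C) array with 1 = positive, 0 = negative, -1 = unannotated.
    scores:  (N, C) array of predicted confidences.
    """
    aps = []
    for c in range(targets.shape[1]):
        annotated = targets[:, c] > -1  # keep only labeled entries
        if annotated.sum() == 0 or targets[annotated, c].max() == 0:
            continue  # no annotated positives -> AP undefined, skip class
        aps.append(average_precision_score(targets[annotated, c],
                                           scores[annotated, c]))
    return float(np.mean(aps))
```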
To speed up training, we simply added torch.distributed support to the training code for distributed learning, and we ultimately obtained 64.23 mAP; the log is attached as tresnetm_lvis_two_gpus.log.
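Roughly, the distributed changes amount to something like the following (a sketch assuming a standard torchrun launch with LOCAL_RANK set; the function `setup_distributed` is illustrative and not taken from the actual training code):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def setup_distributed(model, train_dataset, batch_size):
    # Initialize the default process group (assumes launch via torchrun).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Shard the dataset across processes and wrap the model for gradient sync.
    sampler = DistributedSampler(train_dataset, shuffle=True)
    loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler,
                        num_workers=8, pin_memory=True)
    model = DDP(model.cuda(), device_ids=[local_rank])
    return model, loader, sampler  # call sampler.set_epoch(epoch) every epoch
```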
Note that I also tried single-GPU training and obtained 65.98 mAP, shown in tresnetm_lvis_single_gpu.log.
The gap is too large. I would recommend starting with a simpler mode such as 'negative', and making sure the metric is used correctly. For simplicity, I will add the partial-mAP metric to the code (in helper_functions.py).
We may publish more data and utilities (such as the annotation file for LVIS) soon, but it will take some time. Stay tuned.
Can you please upload the code for the LVIS dataset? Also, your email address emanuel.benbaruch@alibaba-inc.com seems to be wrong.