First, I want to thank you for your excellent work on your recent papers.
I'm trying to reproduce the results using your version of OpenImages as linked here, and I have two questions:
The way I'm loading the dataset is by reading the CSV file you provided and storing, for each image, the indices of its positive and negative annotations. When a sample is retrieved, I fill the label array with 0's (in "negative" mode) or -1's (in "ignore" mode) and then overwrite the entries at that sample's indices with their annotated values. For example, if an image has indices [12, 35, 100] with corresponding values [1.0, 0.0, 1.0], I set entries 12, 35, and 100 to 1, 0, and 1 respectively, leaving the rest of the array at 0 (assuming "negative" mode). Based on the code from this repo and the one for ML_Decode, I'm almost sure this is correct, but could you confirm that this is the case?
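To be concrete, here is a minimal sketch of the loading scheme I described (the function and argument names are my own, not from your code; `num_classes` is a placeholder value):

```python
import numpy as np

def build_label_vector(annotated_indices, annotated_values, num_classes,
                       unannotated_value=0.0):
    """Build a dense per-image label vector from sparse annotations.

    annotated_indices / annotated_values: the class indices that carry an
    explicit annotation and their labels (1.0 positive, 0.0 negative).
    unannotated_value: 0.0 in "negative" mode, -1.0 in "ignore" mode.
    """
    labels = np.full(num_classes, unannotated_value, dtype=np.float32)
    labels[annotated_indices] = annotated_values
    return labels

# The example from the question: indices [12, 35, 100], values [1.0, 0.0, 1.0],
# "negative" mode, so all other entries stay 0.
vec = build_label_vector([12, 35, 100], [1.0, 0.0, 1.0], num_classes=9605)
```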
When running the validation code, I'm using the exact same code you shared, but my mAP comes out very low (~20%). If I remove the unknown annotations from the output and label tensors (by setting them to 0), the mAP rises to 44%, but that is still far below the 86% you report in the paper. Am I missing something in the way I'm calculating mAP?
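For reference, this is how I understand the metric should handle the unknown entries: a plain per-class AP that drops samples labelled -1 before ranking, rather than treating them as negatives. This is my own implementation, not necessarily the one in your repo:

```python
import numpy as np

def mean_average_precision(scores, targets, ignore_value=-1.0):
    """Mean of per-class AP, skipping samples whose target is ignore_value.

    scores:  (N, C) array of model outputs.
    targets: (N, C) array of labels in {1.0, 0.0, ignore_value}.
    """
    aps = []
    for c in range(targets.shape[1]):
        mask = targets[:, c] != ignore_value      # drop unknown annotations
        t, s = targets[mask, c], scores[mask, c]
        if t.sum() == 0:                          # no positives: class skipped
            continue
        order = np.argsort(-s)                    # rank by descending score
        t = t[order]
        cum_pos = np.cumsum(t)
        precision = cum_pos / (np.arange(len(t)) + 1)
        aps.append((precision * t).sum() / t.sum())
    return float(np.mean(aps))
```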
For reference, here's how I'm training:
Training set is resized to 224×224, augmented with CutoutPIL(0.5) (taken from your code) and RandAugment(), then converted with ToTensor().
Validation set is resized to 224×224 and converted with ToTensor().
True weight decay (as implemented in add_weight_decay) set to 3e-4
Adam optimizer with lr=2e-4, weight_decay=0
OneCycleLR scheduler with max_lr=2e-4, pct_start=0.2
ASL loss with gamma_neg=7, gamma_pos=0, clip=0.05, disable_torch_grad_focal_loss=True
Training for 30 epochs.
Network is ResNet-50, pretrained on ImageNet using the torchvision weights (I'm using it for simplicity; I'll switch to TResNet once this works).