alirezazareian / ovr-cnn

A new framework for open-vocabulary object detection, based on maskrcnn-benchmark
MIT License

Questions about Table 1 result #14

Closed · yechenzhi closed this 2 years ago

yechenzhi commented 2 years ago

❓ Questions and Help

Under your setting, for a photo containing both seen and unseen objects, you only remove the unseen objects' annotations, whereas under the previous zero-shot-detection setting, every image that contains unseen objects is removed entirely. So under your setting the training set contains more images, and your model can see unseen objects, just not their annotations. My question is about the Table 1 results: did you re-implement the ZSD methods you compare against under your setting?
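
For readers unfamiliar with the distinction, here is a minimal sketch (not the repo's actual data pipeline) contrasting the two training-set constructions on COCO-style annotations; the file name and `UNSEEN_IDS` are hypothetical:

```python
import json

UNSEEN_IDS = {5, 17, 23}  # hypothetical unseen category ids

with open("instances_train.json") as f:
    coco = json.load(f)

# Open-vocabulary setting: keep every image, drop only the unseen annotations.
ovr_anns = [a for a in coco["annotations"]
            if a["category_id"] not in UNSEEN_IDS]

# Classic ZSD setting: drop every image that contains any unseen object.
tainted = {a["image_id"] for a in coco["annotations"]
           if a["category_id"] in UNSEEN_IDS}
zsd_images = [im for im in coco["images"] if im["id"] not in tainted]
zsd_anns = [a for a in ovr_anns if a["image_id"] not in tainted]
```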

alirezazareian commented 2 years ago

The problem is that, in order to remove images that contain unseen objects, we would need annotations for the unseen objects! In real-world settings, we don't know what other objects exist in each training image, so we cannot discard the images that contain them. This is not only a more realistic setting, it is actually a more difficult one, because the model may learn to classify all other objects that appear as background, making it harder to generalize to those classes. That is exactly why we had to multiply the loss of the background class by a small weight (alpha) and tune it. We therefore believe our setting is the correct way to conduct zero-shot learning experiments.
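
In case it helps, a minimal sketch of that background down-weighting idea, assuming class 0 is background and `alpha` is the tuned weight (the actual loss in this repo may be structured differently):

```python
import torch
import torch.nn.functional as F

def box_cls_loss(logits, labels, num_classes, alpha=0.2):
    """Classification loss with the background class down-weighted.

    Proposals overlapping unlabeled (possibly unseen) objects are thereby
    pushed less hard toward the background class. `alpha=0.2` is an
    illustrative default, not the paper's tuned value.
    """
    weights = torch.ones(num_classes, device=logits.device)
    weights[0] = alpha  # class 0 assumed to be background
    return F.cross_entropy(logits, labels, weight=weights)
```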

Nevertheless, it is true that some other works remove images containing unseen classes from their training data, and hence that those models are trained on less data, so comparing directly against their numbers is not entirely fair. We tried our best to make fair comparisons at least in our ablations, but it was not feasible to re-implement every baseline under exactly the same settings as ours. That said, the substantial performance improvement of our method is very unlikely to be due to the larger training set alone.