Open chchshshhh opened 2 years ago
Thanks for the question. In general, there is no reason not to add more cross-scale terms (e.g., A1-A2, A2-A3, ...). Adding terms would slightly increase training time, but it could possibly improve performance. In practice, we opted for cross-scale terms whose features have a somewhat larger scale difference than A1-A2 (x2), namely A1-A3 (x4) and A1-A4 (x8). The motivation was to better capture local vs. global properties of the classes in the contrastive loss. We have not systematically tested all other possible cross-scale term options, as we found that A1-A3 and A1-A4 worked well for many models/datasets.
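To make the setup concrete, here is a minimal sketch of how such cross-scale terms and per-scale weights could be combined into a total loss. The function names, the pair selection (A1-A3, A1-A4), and the weights [1.0, 0.7, 0.4, 0.1] mentioned in this thread are illustrative; the actual implementation in the repository may differ.

```python
# Hypothetical sketch: total loss as a weighted sum of per-scale
# contrastive terms plus selected cross-scale terms. The per-pair
# loss values are assumed to be precomputed scalars here.

def total_contrastive_loss(per_scale_losses,
                           cross_scale_losses,
                           scale_weights=(1.0, 0.7, 0.4, 0.1)):
    """Combine per-scale losses (A1..A4) with cross-scale losses.

    per_scale_losses:  list of 4 scalars, one per scale A1..A4
    cross_scale_losses: dict mapping scale-index pairs to scalars,
                        e.g. {(0, 2): ..., (0, 3): ...} for A1-A3, A1-A4
    """
    assert len(per_scale_losses) == len(scale_weights)
    # Per-scale terms, weighted so shallower scales contribute more.
    loss = sum(w * l for w, l in zip(scale_weights, per_scale_losses))
    # Cross-scale terms (unweighted in this sketch).
    loss += sum(cross_scale_losses.values())
    return loss

# Example with dummy per-pair loss values:
per_scale = [0.5, 0.4, 0.3, 0.2]
cross_scale = {(0, 2): 0.25, (0, 3): 0.15}  # A1-A3 and A1-A4
print(total_contrastive_loss(per_scale, cross_scale))
```

In a real training loop each scalar above would come from a contrastive loss (e.g., InfoNCE) computed on the feature maps at the corresponding scales, with the cross-scale pairs contrasting features across resolutions.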
Thank you very much for your answer. I have another question about the loss weights: why does the setting [1.0, 0.7, 0.4, 0.1] work well? Is there a basis for it? Or should I understand that, in cross-scale learning, shallow contrastive learning is more important and can drive deeper semantic feature learning?
Very interesting work, but why are only two cross-scale losses used, and why is an A1-A2 cross-scale loss term not included?