Open ez2rok opened 1 year ago
To clarify, I was specifically referring to the `entail_weight` (λ) parameter of the MERU model. However, I see that the authors did experiment with different values of λ. To quote from the paper:
> Some λ > 0 is necessary to induce partial order structure, however, quantitative performance is less sensitive to the choice of λ ∈ [0.01, 0.3]; Higher values of λ strongly regularize against the contrastive loss and hurt performance.
It seems that the authors did not have the model learn λ (the `entail_weight`) because λ > 0.3 generally hurt performance, while λ <= 0.3 had a qualitative rather than quantitative effect on performance and would therefore be difficult to learn.
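A related point, sketched below under stated assumptions: if the total loss has the form contrastive + λ · entailment and the entailment term is non-negative (both are assumptions here, not something checked against the MERU code), then the gradient of the total loss with respect to λ is just the entailment loss itself. Plain gradient descent on an unconstrained learnable λ would therefore only ever push it toward zero. The function names and numbers are hypothetical and purely illustrative.

```python
def total_loss(contrastive: float, entailment: float, entail_weight: float) -> float:
    # Assumed form of the objective: contrastive + lambda * entailment.
    return contrastive + entail_weight * entailment


def learn_entail_weight(entailment: float, entail_weight: float,
                        lr: float = 0.1, steps: int = 50) -> float:
    # Toy gradient descent on lambda alone, holding the entailment loss fixed.
    # d(total_loss) / d(entail_weight) = entailment, which is >= 0 by assumption,
    # so every update decreases lambda until it is clamped at zero.
    for _ in range(steps):
        grad = entailment
        entail_weight = max(0.0, entail_weight - lr * grad)
    return entail_weight


final_weight = learn_entail_weight(entailment=0.5, entail_weight=0.2)
print(final_weight)
```

In this toy setting λ collapses to zero, which would remove the partial order structure the quote says requires some λ > 0. Learning λ would presumably need an extra constraint or a separate objective, which may be another reason to fix it by hand.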
Hello! Great work on this paper!
I was wondering whether you considered making the entailment loss learnable, similar to the curvature or the visual / textual alphas? What went into your decision to choose the entailment loss manually?
Cheers!