Making entailment loss learnable?

To clarify, I was specifically referring to the entail_weight (λ) parameter of the MERU model. However, I see that the authors did experiment with different values of λ. To quote from the paper:

Some λ > 0 is necessary to induce partial order structure, however, quantitative performance is less sensitive to the choice of λ ∈ [0.01, 0.3]; Higher values of λ strongly regularize against the contrastive loss and hurt performance.

It seems the authors did not make λ (the entail_weight) learnable because λ > 0.3 generally hurt performance, while the choice of λ <= 0.3 had a qualitative rather than quantitative effect on performance, which would make it difficult to learn.
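One further reason a learnable λ is awkward, sketched here under my own assumptions rather than taken from the paper or the MERU code: if the total loss has the form contrastive + λ * entailment and the entailment term is non-negative, then the gradient of the total loss with respect to λ is just the entailment loss itself, so plain gradient descent can only push λ down. The function names below are hypothetical, and holding the entailment loss fixed across steps is a simplification for illustration only.

```python
def total_loss(contrastive_loss: float, entail_loss: float, entail_weight: float) -> float:
    """Weighted sum of the two terms (assumed form: contrastive + lambda * entailment)."""
    return contrastive_loss + entail_weight * entail_loss


def naive_lambda_update(entail_weight: float, entail_loss: float, lr: float = 0.1) -> float:
    """One gradient-descent step on lambda alone.

    d(total_loss)/d(lambda) = entail_loss, which is >= 0 for a hinge-style
    entailment term, so lambda never increases under this update.
    """
    return entail_weight - lr * entail_loss


def simulate(initial_weight: float = 0.2, entail_loss: float = 0.5,
             steps: int = 10, lr: float = 0.1) -> list:
    """Track lambda over several steps, with entail_loss held fixed (assumption)."""
    weight = initial_weight
    history = [weight]
    for _ in range(steps):
        weight = naive_lambda_update(weight, entail_loss, lr)
        history.append(weight)
    return history


if __name__ == "__main__":
    for step, weight in enumerate(simulate()):
        print(f"step {step}: lambda = {weight:.3f}")
```

In this toy setting λ shrinks every step and eventually goes negative, which would switch the entailment term off or even reward violating it. Making λ learnable would therefore probably need some extra constraint or regularizer on λ, which is itself another hyperparameter to tune.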
