Hi,
Thanks for your questions.
"For the seen class, why you give out-vocab-cls more weight while give in-vocab-cls less weight? And the same question for the unseen class, why you give in-vocab-cls more weight? This appears counterintuitive, since for the unseen class, we should use out-vocab-cls more, and for the seen class, we should use in-vocab-cls more."
I think there may be some misunderstanding. The code you pasted here puts a weight of (1 - alpha) = 0.6 on in-vocab and alpha = 0.4 on out-vocab for seen classes, and (1 - beta) = 0.2 on in-vocab and beta = 0.8 on out-vocab for unseen classes, which matches the intuition.
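For reference, that step looks roughly like the following, assuming softmaxed class scores and a boolean mask of seen categories (a simplified sketch with illustrative names, not the verbatim repository code):

```python
import torch

alpha, beta = 0.4, 0.8  # out-vocab exponents for seen / unseen classes

def geometric_ensemble(in_vocab_cls, out_vocab_cls, seen_mask):
    # in_vocab_cls, out_vocab_cls: softmax scores, shape (num_queries, num_classes)
    # seen_mask: bool tensor of shape (num_classes,), True for training (seen) categories
    # Seen classes: exponent 1 - alpha = 0.6 on in-vocab, alpha = 0.4 on out-vocab
    seen = in_vocab_cls ** (1 - alpha) * out_vocab_cls ** alpha
    # Unseen classes: exponent 1 - beta = 0.2 on in-vocab, beta = 0.8 on out-vocab
    unseen = in_vocab_cls ** (1 - beta) * out_vocab_cls ** beta
    return torch.where(seen_mask, seen, unseen)
```

Since this is a weighted geometric mean, the classifier with the larger exponent dominates the final ranking, even though raising a score in (0, 1) to a larger power makes that individual factor numerically smaller.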
Hi @cornettoyu, thank you for your timely response, but I am still confused.
Before the ensemble code runs, both `in_vocab_cls_results` and `out_vocab_cls_results` have already undergone a softmax, so their values range from 0 to 1.
And if both the base and the exponent lie in the range 0 to 1, then the value actually decreases as the exponent increases. For example, if in_vocab and out_vocab both equal 0.7, then for the seen classes the out_vocab factor ends up larger than the in_vocab factor:
```python
>>> in_vocab = 0.7
>>> out_vocab = 0.7
>>> in_vocab ** 0.6
0.8073443754472972
>>> out_vocab ** 0.4
0.8670401643811234
```
This is quite counterintuitive: for the seen classes it actually gives out_vocab_cls more weight and in_vocab_cls less weight, and the same applies to the unseen classes. Could you explain it?
Thanks again.
Hi,
I'd like to illustrate with a simple 2-class example:
in-vocab = [0.6, 0.4], out-vocab = [0.4, 0.6]
in-vocab ** 0.6 = [0.7360219228178333, 0.5770799623628855]
out-vocab ** 0.4 = [0.6931448431551464, 0.8151931096059227]
(in-vocab ** 0.6) * (out-vocab ** 0.4) = [0.5101698002503163, 0.47043160900986947]
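The same arithmetic as a runnable check (NumPy, purely illustrative):

```python
import numpy as np

in_vocab = np.array([0.6, 0.4])   # in-vocab softmax scores for the two classes
out_vocab = np.array([0.4, 0.6])  # out-vocab softmax scores for the two classes

# Seen-class exponents: 1 - alpha = 0.6 on in-vocab, alpha = 0.4 on out-vocab
ensemble = in_vocab ** 0.6 * out_vocab ** 0.4
print(ensemble)           # [0.5101698  0.47043161]
print(ensemble.argmax())  # 0 -> the class preferred by in-vocab wins
```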
As it shows, the final prediction is biased toward in-vocab. Feel free to let me know if you have other questions :)
Thanks for your response! I misunderstood this before. It really helped me.
Hi, thanks for your great work.
I am confused about the ensemble code.
`alpha` = 0.4, `beta` = 0.8, and both `in_vocab_cls_results` and `out_vocab_cls_results` range between 0 and 1.
For the seen classes, why do you give out_vocab_cls more weight while giving in_vocab_cls less weight? And the same question for the unseen classes: why do you give in_vocab_cls more weight? This appears counterintuitive, since for the unseen classes we should rely more on out_vocab_cls, and for the seen classes more on in_vocab_cls.
This really confuses me. I tried swapping alpha and beta to 0.6 and 0.2 (the reverse), but the results were much worse than the original. Could you give me some insight into this?
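Concretely, the swap I tried corresponds to something like this (illustrative names and placeholder inputs, not the exact repository code):

```python
import torch

# Placeholder softmaxed scores just to make the snippet self-contained
in_vocab_cls_results = torch.softmax(torch.randn(100, 150), dim=-1)
out_vocab_cls_results = torch.softmax(torch.randn(100, 150), dim=-1)

# Original setting: alpha = 0.4 (seen classes), beta = 0.8 (unseen classes).
# What I tried instead (the reverse), which gave much worse results:
alpha, beta = 0.6, 0.2
seen_probs = in_vocab_cls_results ** (1 - alpha) * out_vocab_cls_results ** alpha
unseen_probs = in_vocab_cls_results ** (1 - beta) * out_vocab_cls_results ** beta
```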