Closed lovodkin93 closed 3 years ago
Hi, you just set the mean to the corresponding ".5" position.
For example, if pos(pre)=2
and pos(train)=3
, then middle=2.5
.
You can see what this means in terms of distribution in Figure 3 in the paper (e.g., if there are only two subwords, both get the same weight).
great, thanks!
Hello, In your paper, you have stated that p_t is the middle position of the t-th token's dependency parent. My question is, what if the dependency parent is segmented into an even number of subwords? For example, if the word "pretrain" is divided into "pre" and "train", should the middle be "pre" or "train"? Thanks!