Could you explain why you use `pe_scaler` and `ple_scaler` in the forward pass of the `Encoder` class in `kingcrab.py`?
In particular, why did you choose the forms `pe_scaler = 2**(1-self.pos_scaler)**2` and `ple_scaler = 2**(1-self.pos_scaler_log)**2`?
I don't really understand why these two scalers (and also `self.emb_scaler`) are needed in the first place, or why you chose these exponential forms for them.
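For reference, this is how I read the expression itself. It is a minimal sketch that assumes `pos_scaler` is a plain float; in `kingcrab.py` it may be a learnable tensor, but operator precedence is decided by Python's parser either way. The helper names `scaler` and `scaler_left_assoc` are hypothetical. Because `**` is right-associative in Python, `2**(1-p)**2` parses as `2**((1-p)**2)`, not `(2**(1-p))**2`:

```python
def scaler(p: float) -> float:
    # Same form as pe_scaler / ple_scaler; ** is right-associative,
    # so this is 2 ** ((1 - p) ** 2).
    return 2**(1 - p)**2


def scaler_left_assoc(p: float) -> float:
    # Hypothetical alternative reading, shown only for contrast.
    return (2 ** (1 - p)) ** 2


for p in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"p={p}: 2**(1-p)**2 = {scaler(p):.4f}, (2**(1-p))**2 = {scaler_left_assoc(p):.4f}")
```

If I read this correctly, the scaler is always at least 1, equals 1 exactly when `pos_scaler` is 1, and grows symmetrically as `pos_scaler` moves away from 1.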