Reason to use pe_scaler and ple_scaler

Could you explain why you use pe_scaler and ple_scaler in the forward pass of the Encoder class in kingcrab.py? In particular, why did you choose the forms pe_scaler = 2**(1-self.pos_scaler)**2 and ple_scaler = 2**(1-self.pos_scaler_log)**2? I don't understand why these two scalers (and self.emb_scaler) are needed in the first place, or why you chose these exponential forms for them.
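To make the question concrete, here is a small sketch of how the expression evaluates as written. It only reproduces the arithmetic of the formula quoted above. It does not use the actual Encoder class, and the helper names `scaler` and `alt_scaler` are hypothetical. One detail matters: Python's `**` is right-associative, so `2**(1-s)**2` means `2**((1-s)**2)`, not `(2**(1-s))**2`.

```python
def scaler(s: float) -> float:
    """Hypothetical helper: evaluates 2**(1-s)**2 exactly as written.

    Because ** is right-associative, this is 2**((1 - s)**2).
    """
    return 2 ** (1 - s) ** 2


def alt_scaler(s: float) -> float:
    """Hypothetical helper: the other possible reading, (2**(1 - s))**2 == 4**(1 - s)."""
    return (2 ** (1 - s)) ** 2


# Compare both readings over a few values of the learnable scalar s
for s in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"s = {s:.1f}  2**((1-s)**2) = {scaler(s):.4f}  (2**(1-s))**2 = {alt_scaler(s):.4f}")
```

Read as written, the scaler is symmetric around s = 1, never drops below 1, and equals exactly 1 at s = 1 (where its derivative with respect to s is also zero). So under this form, changing pos_scaler or pos_scaler_log in either direction away from 1 can only amplify the positional terms, never shrink them. The alternative reading would instead decay monotonically toward 0 as s grows.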

anthony-wang / CrabNet, issue #37