soumilinandi closed this issue 3 years ago
@soumilinandi: Yes, you could certainly do that, though I haven't tried it myself. I'm not sure whether they do anything special to initialise the additional parameters; I had a quick glance at the paper and didn't see it mentioned. I would suggest trying something like Kaiming uniform (the default in PyTorch AFAIK; I'm not sure what TF does now) and contacting the authors if you can't get it to work :)
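For reference, here is a rough, dependency-free sketch of what Kaiming uniform initialisation computes, applied to the two matrices from the question. The shapes, the sizes `hidden_size` and `vocab_size`, and the helper name `kaiming_uniform` are my own assumptions for illustration; the paper may define U and W (lm) differently. In PyTorch the corresponding call is `torch.nn.init.kaiming_uniform_`, and as far as I know `nn.Linear` uses it with `a=sqrt(5)` by default.

```python
import math
import random


def kaiming_uniform(fan_out, fan_in, a=0.0, rng=None):
    """Return a fan_out x fan_in matrix (list of rows) sampled from U(-bound, bound).

    bound = gain * sqrt(3 / fan_in), with gain = sqrt(2 / (1 + a^2)),
    i.e. the leaky-ReLU gain with negative slope `a`.
    """
    rng = rng or random.Random()
    gain = math.sqrt(2.0 / (1.0 + a * a))
    bound = gain * math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


# Hypothetical sizes, chosen only for this example.
hidden_size, vocab_size = 8, 20
rng = random.Random(0)  # fixed seed so the draw is reproducible

# Assumed shapes: U maps hidden -> hidden, W_lm maps hidden -> vocabulary.
U = kaiming_uniform(hidden_size, hidden_size, a=math.sqrt(5), rng=rng)
W_lm = kaiming_uniform(vocab_size, hidden_size, a=math.sqrt(5), rng=rng)
```

With `a=sqrt(5)` the bound simplifies to `1 / sqrt(fan_in)`, so every entry here lies within about ±0.354. If training is unstable, trying `a=0` (a wider range) or a different scheme would be a reasonable next experiment before contacting the authors.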
Can this code be extended to a language model as described in the paper? How do we initialize the values of U and the weight matrix W (lm) in that case?