…er initialization. See PDF page 251 in https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf. With the seed set as before, the bias may be initialized to a large negative value, leading to a negative input to the ReLU. This prevents any training, since all derivatives downstream of the ReLU are then zero (the "dying ReLU" problem).
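A minimal sketch of the failure mode, assuming a single `nn.Linear` layer followed by a ReLU; the bias value of `-10.0` is a hypothetical stand-in for an unluckily large negative initialization:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 4)
with torch.no_grad():
    # Simulate an unlucky initialization: a large negative bias
    # (illustrative value, not from the original report).
    layer.bias.fill_(-10.0)

x = torch.randn(8, 4)  # typical inputs, far too small to offset the bias
out = torch.relu(layer(x))
out.sum().backward()

# The pre-activation is negative everywhere, so the ReLU outputs all zeros
# and passes back a zero gradient: the weights can never recover.
print(out.abs().max().item())                 # 0.0
print(layer.weight.grad.abs().max().item())   # 0.0
```

Because ReLU's derivative is zero for negative inputs, no gradient ever reaches the weights or the bias, so training is stuck regardless of the learning rate.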
See downstream issue here: https://github.com/pytorch/benchmark/pull/1927