I reimplemented the fastText code from scratch and used gradient ascent when computing the gradient; training completes fine on small datasets. However, when training on large datasets, I update the embedding matrix by adding lr*grad to each word vector, and after several epochs the embedding matrix blows up to NaN. I would like to know how the underlying embedding matrix is updated.
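For reference, here is a minimal Python sketch of the kind of negative-sampling, gradient-ascent update fastText performs, assuming a skip-gram objective. The function and variable names (`update`, `inp`, `out`) are mine, not the library's. Two details matter for stability: the log-sigmoid gradient scale `(label - score)` is bounded by 1 in magnitude, and the learning rate decays linearly toward zero over training, so a large constant lr on a big corpus is one plausible cause of the NaN you are seeing.

```python
import numpy as np

# Illustrative sketch, not fastText's actual C++ code: one negative-sampling
# update per (input word, target word) pair, with a linearly decaying lr.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update(inp, out, wid, tid, label, lr):
    """One gradient-ascent step on log-likelihood of sigmoid(inp . out)."""
    score = sigmoid(inp[wid] @ out[tid])
    # Bounded step: |label - score| <= 1, so lr alone controls the magnitude.
    g = lr * (label - score)
    grad_in = g * out[tid]          # save before mutating the output row
    out[tid] += g * inp[wid]
    inp[wid] += grad_in

rng = np.random.default_rng(0)
vocab, dim, base_lr, total_tokens = 1000, 100, 0.05, 10_000
inp = rng.uniform(-1.0 / dim, 1.0 / dim, (vocab, dim))  # small uniform init
out = np.zeros((vocab, dim))

for t in range(total_tokens):
    lr = base_lr * (1.0 - t / total_tokens)  # linear decay toward 0
    wid, tid = rng.integers(vocab, size=2)
    update(inp, out, wid, tid, label=1.0, lr=lr)  # positive pair
    neg = int(rng.integers(vocab))
    update(inp, out, wid, neg, label=0.0, lr=lr)  # one negative sample
```

If your reimplementation keeps lr fixed, or uses an unbounded gradient (e.g. an unnormalized score instead of a sigmoid), the per-step updates never shrink and the word vectors can diverge on a large corpus even though small-sample training looks fine.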