Closed ZiweiHou closed 2 years ago
We need negative samples to compute the cross-entropy loss, which is a standard training approach for FM models. See our loss function:
```python
def fit_nll_neg(self, input_batch, epsilon=1e-9):
    # input_batch holds, per row group, one positive sample (column 0)
    # followed by its negative samples (columns 1:).
    preds = torch.sigmoid(self.forward(input_batch))
    # Maximize the positive's predicted probability and push the
    # negatives' probabilities toward zero; epsilon guards log(0).
    cost = - torch.log(preds[:, 0] + epsilon).sum() \
           - torch.log(1 - preds[:, 1:] + epsilon).sum()
    return cost / preds.shape[0]
```
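For illustration, here is a minimal standalone sketch of the same loss applied to a batch of logits (the function name `nll_neg_loss` and the shapes are illustrative, not from the repo):

```python
import torch

def nll_neg_loss(logits, epsilon=1e-9):
    # logits: shape (batch, 1 + neg_num); column 0 is the positive's score.
    preds = torch.sigmoid(logits)
    cost = - torch.log(preds[:, 0] + epsilon).sum() \
           - torch.log(1 - preds[:, 1:] + epsilon).sum()
    return cost / preds.shape[0]

# Two groups, each with 1 positive and 3 negatives, all logits 0,
# so sigmoid gives 0.5 everywhere.
logits = torch.zeros(2, 4)
loss = nll_neg_loss(logits)
# Each group contributes -log(0.5) for the positive plus 3 * -log(0.5)
# for the negatives; averaged over the batch this is 4 * log(2) ≈ 2.7726.
```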
In my data file, each positive sample occupies one line and its negative samples follow it; then comes the second positive sample, and so on. So the raw data has shape (n_samples * (1 + neg_num) $\times$ n_features). We need to reshape it to (n_samples $\times$ (1 + neg_num) $\times$ n_features), and the dataloader reads rows from it.
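The grouping described above can be sketched as follows (the sizes are made up for illustration):

```python
import numpy as np

n_samples, neg_num, n_features = 4, 3, 5

# Raw file layout: each positive row is immediately followed by its
# neg_num negative rows, so there are n_samples * (1 + neg_num) rows.
raw = np.arange(n_samples * (1 + neg_num) * n_features, dtype=np.float32)
raw = raw.reshape(-1, n_features)

# Group each positive with its negatives:
# (n_samples, 1 + neg_num, n_features).
grouped = raw.reshape(n_samples, 1 + neg_num, n_features)

# Column 0 of each group is the positive; columns 1: are its negatives.
assert grouped.shape == (4, 4, 5)
assert (grouped[0, 0] == raw[0]).all()  # first positive
assert (grouped[0, 1] == raw[1]).all()  # its first negative
```

The dataloader then yields one (1 + neg_num, n_features) group per index, which is exactly what the loss above expects per sample.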
Hi @MogicianXD,
Thanks for your reply. Is there any particular reason for using such a data format? Can you release an example of your data?
No particular reason. The data format is determined by your data preprocessing. I've pushed the frappe dataset now. The code was written two years ago, and I haven't tested whether it still works.
Thank you soooo much!
Hi,
in line 10 of the dataset.py file, it reshapes the feature to (1+neg_num, feature.shape[-1]). What is neg_num, and why is there a need to reshape the feature?