reshape data in dataset.py

ZiweiHou commented 2 years ago

Hi,

In line 10 of dataset.py, the feature is reshaped to (1+neg_num, feature.shape[-1]). What is neg_num, and why does the feature need to be reshaped?

MogicianXD commented 2 years ago

We need negative samples to compute the cross-entropy loss, which is a standard way to train FM (factorization machine) models; neg_num is the number of negative samples paired with each positive sample. See our loss function:

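    import torch

    def fit_nll_neg(self, input_batch, epsilon=1e-9):
        # preds: (batch, 1 + neg_num); column 0 is the positive sample, columns 1: are its negatives
        preds = torch.sigmoid(self.forward(input_batch))
        # -log(p) for each positive and -log(1 - p) for each negative (binary cross-entropy)
        cost = - torch.log(preds[:, 0] + epsilon).sum() - torch.log(1 - preds[:, 1:] + epsilon).sum()
        return cost / preds.shape[0]  # average over the positive samples in the batch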
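To check the loss outside the model class, here is a minimal standalone sketch. It assumes, as the indexing in fit_nll_neg implies, that the model returns raw scores (logits) of shape (batch, 1 + neg_num), and it takes those logits directly instead of calling self.forward; nll_neg is a hypothetical name, not a function in the repository. The demo also checks that the loss is ordinary binary cross-entropy with label 1 for column 0 and label 0 for the negatives, summed and divided by the number of positive samples.

```python
import torch
import torch.nn.functional as F


def nll_neg(logits: torch.Tensor, epsilon: float = 1e-9) -> torch.Tensor:
    """Hypothetical standalone version of fit_nll_neg.

    logits: (batch, 1 + neg_num); column 0 scores the positive sample,
    columns 1: score its negative samples.
    """
    preds = torch.sigmoid(logits)
    pos_term = -torch.log(preds[:, 0] + epsilon).sum()       # push positives toward 1
    neg_term = -torch.log(1 - preds[:, 1:] + epsilon).sum()  # push negatives toward 0
    return (pos_term + neg_term) / preds.shape[0]             # mean over positive groups


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, neg_num = 4, 3
    logits = torch.randn(batch, 1 + neg_num)

    # Same quantity via PyTorch's BCE: label 1 for column 0, label 0 elsewhere.
    labels = torch.zeros_like(logits)
    labels[:, 0] = 1.0
    reference = F.binary_cross_entropy_with_logits(logits, labels, reduction="sum") / batch

    print(nll_neg(logits).item(), reference.item())
    assert torch.allclose(nll_neg(logits), reference, atol=1e-5)
```

The epsilon only guards against log(0); for moderate logits the two values agree to float32 precision.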

In my data file, each positive sample is on one line and is followed by its neg_num negative samples; then comes the second positive sample with its negatives, and so on. So the raw data has shape (n_samples * (1 + neg_num), n_features). We reshape it to (n_samples, 1 + neg_num, n_features), and the dataloader then reads it row by row, where each row is one positive sample together with its negatives.
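A minimal sketch of that reshape and a matching PyTorch Dataset follows, assuming the flat file has already been loaded into a NumPy array. GroupedNegDataset is a hypothetical name, not the class in dataset.py.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class GroupedNegDataset(Dataset):
    """Hypothetical dataset: each item is one positive row plus its neg_num negatives.

    `features` is the flat array read from the data file, shaped
    (n_samples * (1 + neg_num), n_features), where every positive row is
    immediately followed by its neg_num negative rows.
    """

    def __init__(self, features: np.ndarray, neg_num: int):
        group = 1 + neg_num
        if features.shape[0] % group != 0:
            raise ValueError("row count is not a multiple of 1 + neg_num")
        # (n_samples * group, n_features) -> (n_samples, group, n_features)
        self.features = torch.as_tensor(features).reshape(-1, group, features.shape[-1])

    def __len__(self) -> int:
        return self.features.shape[0]

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Shape (1 + neg_num, n_features): row 0 is the positive, rows 1: its negatives
        return self.features[idx]


if __name__ == "__main__":
    neg_num, n_features, n_samples = 2, 5, 4
    flat = np.arange(n_samples * (1 + neg_num) * n_features).reshape(-1, n_features)
    ds = GroupedNegDataset(flat, neg_num)
    loader = DataLoader(ds, batch_size=2, shuffle=True)
    batch = next(iter(loader))
    print(len(ds), tuple(batch.shape))  # 4 (2, 3, 5)
```

Because the reshape happens before batching, shuffle=True permutes whole groups, so each positive stays paired with its own negatives. A batch then has shape (batch, 1 + neg_num, n_features); a model that scores each row independently would return the (batch, 1 + neg_num) scores that fit_nll_neg indexes.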

ZiweiHou commented 2 years ago

Hi @MogicianXD,

Thanks for your reply. Is there any particular reason for using such a data format? Can you release an example of your data?

MogicianXD commented 2 years ago

No particular reason; the data format is determined by your data preprocessing. I've pushed the frappe dataset now. The code was written two years ago, and I haven't tested whether it still works.
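Since the layout comes from preprocessing, here is one possible way to produce it, under loud assumptions: each sample is a row of integer feature indices, and negatives are made by replacing a single field (for example an item field) with a random value. interleave_negatives and corrupt_field are hypothetical helpers, and this is not necessarily how the frappe file in the repository was built.

```python
import numpy as np


def interleave_negatives(positives: np.ndarray, negatives: np.ndarray) -> np.ndarray:
    """Hypothetical helper: build the flat 'positive, then its negatives' layout.

    positives: (n_samples, n_features)
    negatives: (n_samples, neg_num, n_features); negatives[i] belong to positives[i]
    Returns an array of shape (n_samples * (1 + neg_num), n_features).
    """
    grouped = np.concatenate([positives[:, None, :], negatives], axis=1)
    return grouped.reshape(-1, positives.shape[-1])


def corrupt_field(positives, field, n_values, neg_num, rng):
    """Hypothetical sampler: copy each positive neg_num times, randomize one field.

    Simplification: a random value may coincide with the positive's own value
    (a false negative); a real pipeline would usually filter those out.
    """
    negatives = np.repeat(positives[:, None, :], neg_num, axis=1)
    negatives[:, :, field] = rng.integers(0, n_values, size=(len(positives), neg_num))
    return negatives


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = np.array([[0, 10, 20], [1, 11, 21]])
    neg = corrupt_field(pos, field=1, n_values=50, neg_num=2, rng=rng)
    flat = interleave_negatives(pos, neg)
    print(flat.shape)  # (6, 3)
    # Reshaping back recovers the positives in column 0 of each group
    assert (flat.reshape(-1, 3, 3)[:, 0] == pos).all()
```

Writing flat to disk line by line then gives exactly the grouped file format described earlier in the thread.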

ZiweiHou commented 2 years ago

Thank you soooo much!

MogicianXD / FMRT, issue #3