Open HowardZJU opened 1 month ago
For example, to address the problem raised in this issue, would it be feasible to change the _set_label_by_threshold(self) function so that negative labels are set to -1 instead of 0?
def _set_label_by_threshold(self):
    """Generate 0/1 labels according to value of features.

    According to ``config['threshold']``, those rows with value lower than threshold will
    be given negative label, while the other will be given positive label.
    See :doc:`../user_guide/data/data_args` for detail arg setting.

    Note:
        Key of ``config['threshold']`` is a field name.
        This field will be dropped after label generation.
    """
    threshold = self.config["threshold"]
    if threshold is None:
        return

    self.logger.debug(f"Set label by {threshold}.")

    if len(threshold) != 1:
        raise ValueError("Threshold length should be 1.")

    self.set_field_property(
        self.label_field, FeatureType.FLOAT, FeatureSource.INTERACTION, 1
    )
    for field, value in threshold.items():
        if field in self.inter_feat:
            # Rows at or above the threshold get label 1, all others get 0.
            self.inter_feat[self.label_field] = (
                self.inter_feat[field] >= value
            ).astype(int)
        else:
            raise ValueError(f"Field [{field}] not in inter_feat.")
        if field != self.label_field:
            self._del_col(self.inter_feat, field)
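To make the proposal concrete, here is a minimal standalone sketch of the labeling rule outside RecBole. The helper `label_by_threshold` and its `negative_label` parameter are hypothetical names introduced for illustration, not part of RecBole, and the sketch works on a plain NumPy array rather than on `inter_feat`. It only shows the labeling step. Whether RecBole's samplers, losses, and metrics would handle -1 labels correctly is not established here and would need to be checked separately.

```python
import numpy as np


def label_by_threshold(ratings, threshold, negative_label=0):
    """Hypothetical helper (not RecBole API): binarize ratings around a threshold."""
    ratings = np.asarray(ratings, dtype=float)
    # Ratings >= threshold become 1; everything else becomes negative_label.
    return np.where(ratings >= threshold, 1, negative_label).astype(int)


ratings = [5, 2, 3, 1, 4]
current = label_by_threshold(ratings, threshold=3)                      # 0/1 labels
proposed = label_by_threshold(ratings, threshold=3, negative_label=-1)  # 1/-1 labels
print(current.tolist())
print(proposed.tolist())
```

With -1 as the negative label, a low rating stays distinguishable from a missing rating once the labels are placed in a structure whose default value is 0.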
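Describe the bug: Take the ML-1M dataset as an example, where ratings range from 1 to 5.
The generated sparse interaction matrix stores only the user-item pairs whose rating is greater than the threshold. User-item pairs rated below the threshold are set to 0, together with the unobserved user-item pairs.
This approach does not make effective use of explicit-feedback negative samples: it treats explicitly negative samples and unobserved samples alike as negatives.
Problem and request
How to reproduce: Steps to reproduce the bug: in the quick start, set a breakpoint at the following line and inspect the resulting data.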
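The conflation described above can be seen on a toy example. This is a sketch with made-up ratings and a dense NumPy array standing in for the sparse interaction matrix. It does not call RecBole, and the `signed` matrix is only one possible alternative encoding, assumed here for illustration.

```python
import numpy as np

# Toy explicit-feedback log: (user, item, rating). Values are made up.
interactions = [(0, 0, 5), (0, 1, 1), (1, 1, 4), (1, 2, 2)]
n_users, n_items, threshold = 2, 3, 3

# Binary matrix: only ratings that pass the threshold are stored as 1.
binary = np.zeros((n_users, n_items), dtype=int)
# Signed alternative: +1 liked, -1 explicitly disliked, 0 unobserved.
signed = np.zeros((n_users, n_items), dtype=int)

for user, item, rating in interactions:
    if rating >= threshold:
        binary[user, item] = 1
        signed[user, item] = 1
    else:
        signed[user, item] = -1

# Pair (0, 1) was rated 1 and pair (0, 2) was never rated,
# yet both are 0 in `binary`; `signed` keeps them apart.
print(binary)
print(signed)
```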
train_data, valid_data, test_data = data_preparation(config, dataset)
Environment: