HKUDS / XRec

[EMNLP'2024] "XRec: Large Language Models for Explainable Recommendation"
http://arxiv.org/abs/2406.02377
Apache License 2.0

Evaluation results #5

Open wangyu0627 opened 2 hours ago

wangyu0627 commented 2 hours ago

I re-ran training on the Yelp dataset exactly as instructed. Why does my local deployment of BERTScore produce results so different from yours?

In `evaluation/metric.py`:

```python
import evaluate
from bert_score.utils import model2layers

def BERT_score(predictions, references):
    bertscore = evaluate.load("bertscore.py")
    results = bertscore.compute(
        predictions=predictions,
        references=references,
        model_type="roberta-large",
        num_layers=model2layers["roberta-large"],
    )
```

[screenshot: local BERTScore results]

Martin-qyma commented 1 hour ago

Thank you for your interest in XRec! You can set `rescale_with_baseline=True` in `bertscore.compute`. For more details, please refer to the implementation in `evaluation/metric.py`.

This operation rescales the BERTScore to roughly the 0-to-1 range. It does not affect the ranking ability or the correlation with human judgments; it is only intended to make the scores more readable. We hope you find this helpful.
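For reference, here is a minimal, self-contained sketch of the rescaled call. This is not the repo's exact code: the example strings are made up, and `lang="en"` is added because the Hugging Face `evaluate` BERTScore metric needs a language (or an explicit `baseline_path`) to locate the baseline file when `rescale_with_baseline=True`.

```python
import evaluate
from bert_score.utils import model2layers

# Hypothetical example inputs, not from the XRec datasets
predictions = ["The food was great and the staff were friendly."]
references = ["Great food and friendly staff."]

bertscore = evaluate.load("bertscore")
results = bertscore.compute(
    predictions=predictions,
    references=references,
    model_type="roberta-large",
    num_layers=model2layers["roberta-large"],
    lang="en",                   # needed to find the English baseline file
    rescale_with_baseline=True,  # maps raw scores onto a roughly 0-to-1 scale
)
print(results["precision"], results["recall"], results["f1"])
```

Without the baseline rescaling, raw BERTScore values for `roberta-large` tend to cluster in a narrow high band, which is why unscaled local numbers can look very different from the reported ones.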

wangyu0627 commented 1 hour ago

Thanks, your response was very helpful!