Open myshelters opened 1 year ago
I am writing to express my appreciation for your paper titled "Alleviating Spurious Correlations in Knowledge-aware Recommendations through Counterfactual Generator". I found your paper interesting and was encouraged by the presentation of your methodology.
However, upon reviewing your experimental results, I encountered some confusion. Specifically, on the MovieLens-10m dataset, the recall results were lower than the NDCG results, while on other datasets, the recall results were higher than the NDCG results. This phenomenon is similar to one I am currently experiencing, where the recall on one dataset is lower than NDCG, but recall on the other two datasets is higher than NDCG. Thus, I would appreciate it if you could provide insight into this issue.
As a fellow researcher in this field, I am interested in knowing if such a phenomenon is common and why it occurs. Thank you in advance for your valuable input and consideration. I look forward to hearing from you soon.
Thanks for your interest in our work. On this question, I don't think the two metrics (Recall and NDCG) are directly comparable. In my opinion, Recall describes how many of the ground-truth items in the candidate set are hit in the ranking results, while NDCG describes how close the current ranking results are to the ideal ranking results. They measure different things, so neither one has to be larger than the other.
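To make the difference concrete, here is a minimal sketch of both metrics for a single user. It assumes binary relevance and an ideal DCG truncated at min(number of relevant items, K), which is a common convention but not necessarily the exact one used in the paper's evaluation code. The function names and the two toy users are hypothetical, chosen only for illustration.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k (assumed convention: IDCG uses min(|relevant|, k) hits)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg

K = 5

# Hypothetical "dense" user: 30 held-out items, 3 of them ranked at the very top.
dense_relevant = set(range(30))
dense_ranked = [0, 1, 2, 100, 101]
recall_dense = recall_at_k(dense_ranked, dense_relevant, K)
ndcg_dense = ndcg_at_k(dense_ranked, dense_relevant, K)

# Hypothetical "sparse" user: 1 held-out item, hit at the last position of the top-5.
sparse_relevant = {7}
sparse_ranked = [100, 101, 102, 103, 7]
recall_sparse = recall_at_k(sparse_ranked, sparse_relevant, K)
ndcg_sparse = ndcg_at_k(sparse_ranked, sparse_relevant, K)

print(f"dense user:  recall={recall_dense:.4f}  ndcg={ndcg_dense:.4f}")
print(f"sparse user: recall={recall_sparse:.4f}  ndcg={ndcg_sparse:.4f}")
```

Under these assumptions, the dense user gets a Recall@5 below NDCG@5, because Recall is divided by all 30 relevant items while the ideal DCG only counts the best 5 positions. The sparse user gets the opposite ordering. One plausible reading, then, is that a dataset with many held-out items per user can show Recall below NDCG while sparser datasets show the reverse, though this sketch does not confirm that this is what happens on any specific dataset.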