Problem of reproduction

YaooXu commented 7 months ago

Thanks for your great work! However, I have had trouble reproducing the experiments on the WebQSP dataset. I used the same commands as in the README to process the data and train the model, but the test accuracy is only 0.605, which is much lower than the 0.6732 ± 0.0076 reported in the paper. My command is as follows:

python train.py --dataset webqsp --model_name graph_llm

The results are as follows:

The validation loss and training loss are as follows:

One possible reason for this could be related to the cached graphs. The paper states that the average number of tokens in WebQSP is 18, whereas in my cached graphs it is only 8.1. Could this difference be the cause of the problem?

original graphs

cached graphs

XiaoxinHe commented 7 months ago

Hi,

Thank you for your interest in our work.

To reproduce our results, could you please uncomment the line in the retrieval.py file for the WebQSP dataset at the following URL? https://github.com/XiaoxinHe/G-Retriever/blob/ddff58cab0598967e0c47300a81bdc8e66033405/src/dataset/utils/retrieval.py#L41

The reason is that, in the WebQSP dataset, there are instances where many edges have the same attributes. This situation dilutes the prize assigned to the edges. In such cases, we need to adjust the cost of edges to ensure that at least one edge is chosen. For instance, there could be more than 50 edges with the same attribute "film.film.genre", which ranks first among the top-5 edges. A prize of 5 would be diluted to 0.1 (5/50). If the cost of an edge is 0.5, then no edges will be selected, and only a single node with the highest prize will be returned, which is undesirable.
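The arithmetic above can be checked with a small sketch. This is not the code in retrieval.py; the function names, the `margin` parameter, and the "an edge is worth taking only if its prize exceeds the cost" rule are assumptions made for illustration, and the real prize-collecting Steiner tree solver weighs node prizes and connectivity as well.

```python
def diluted_prize(rank_prize, num_tied_edges):
    """Split one rank's prize evenly across all edges that share the same attribute."""
    return rank_prize / num_tied_edges


def adjusted_edge_cost(edge_cost, edge_prizes, margin=0.01):
    """Cap the edge cost slightly below the largest edge prize (hypothetical rule)."""
    return min(edge_cost, max(edge_prizes) * (1 - margin))


def num_profitable_edges(edge_prizes, edge_cost):
    """Count edges whose prize exceeds the cost (a crude stand-in for PCST selection)."""
    return sum(1 for prize in edge_prizes if prize > edge_cost)


# 50 edges share the attribute ranked first among the top-5, so the prize of 5 is split 50 ways.
prizes = [diluted_prize(5, 50)] * 50
original_cost = 0.5

# With the original cost, no edge is worth selecting.
print(num_profitable_edges(prizes, original_cost))  # 0

# After lowering the cost below the largest prize, the tied edges become selectable.
new_cost = adjusted_edge_cost(original_cost, prizes)
print(num_profitable_edges(prizes, new_cost))  # 50
```

With the cost left at 0.5, every diluted prize of 0.1 falls below it, which matches the single-node result described above. Capping the cost under the largest prize is one way to guarantee that at least one edge can be chosen.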

Therefore, to reproduce our results, please uncomment this line. The updated code should yield a mean number of nodes of 15.06. The expected output of the experiment should resemble the following screenshot:

Screenshot 2024-03-11 at 19 44 29

I will adjust the code later so that this manual modification is no longer necessary. Thanks for your question.

YaooXu commented 7 months ago

Thanks for your timely reply! The performance improves after using the newest code and uncommenting the line (from 60.5 to 65.2), but there is still a small gap between my results and those in the paper (65.2 vs 67.3), which might be caused by differences in PyTorch and CUDA versions.
