Performance of LLM baselines too low

HKUDS / GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"

https://arxiv.org/abs/2310.13023

Apache License 2.0

Performance of LLM baselines too low #27

Closed by W-rudder 7 months ago

W-rudder commented 7 months ago

Excuse me, what are the experimental settings for the LLM baselines? Does the task template also include the title and abstract of the paper? I used vicuna-7B-v1.5 for prediction on the PubMed data and obtained an accuracy (ACC) of 0.86, which is much higher than the result reported in the paper.

tjb-tech commented 7 months ago

Thank you very much for your interest in our work! We ran all the experiments with the same evaluation scripts. Have you tried using your evaluation scripts to evaluate the model and its variants on the other datasets? We recommend running more experiments on the other two datasets. In our experiments, the results on PubMed were less stable than those on the other two datasets, so we varied the seed and the temperature to keep the comparison fair. One of our experimental results on PubMed is available at https://www.dropbox.com/scl/fo/n5dk2kntjsf4tzt18cnot/h?rlkey=iwtq3t7y3arhfaxrh6xkj4cgv&dl=0. Feel free to check it. I hope this answer is helpful.
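For readers comparing numbers across seeds and temperatures, one way to make the instability visible is to report the mean and spread over runs rather than a single figure. The following is a minimal sketch, not the authors' evaluation script: `summarize_runs` is a hypothetical helper, and the accuracy values are illustrative placeholders, not results from the paper or from this thread.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of accuracy over repeated runs."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(accuracies), stdev(accuracies)


# Hypothetical accuracies keyed by (seed, temperature).
# These are placeholder values for illustration only.
runs = {
    (0, 0.2): 0.70,
    (1, 0.2): 0.80,
    (2, 0.6): 0.90,
}

avg, spread = summarize_runs(list(runs.values()))
print(f"accuracy: {avg:.3f} +/- {spread:.3f} over {len(runs)} runs")
```

A large spread relative to the gap between two methods suggests that a single-run comparison on that dataset may not be reliable.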

W-rudder commented 7 months ago

Thanks for your reply!

linwhitehat commented 6 months ago

> Excuse me, what are the experimental settings for the LLM baselines? Does the task template also include the title and abstract of the paper? I used vicuna-7B-v1.5 for prediction on the PubMed data and obtained an accuracy (ACC) of 0.86, which is much higher than the result reported in the paper.

Hi, I'm not sure how to determine accuracy after completing an evaluation. Which part of the data do you take the label information from to make that determination? I hope you can help, thank you very much!
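One common way to score free-text LLM answers for node classification is to search each generated answer for a class name and compare it with the ground-truth class of that node. The sketch below illustrates that idea only; it is not the repository's evaluation code. The record layout (`prediction` and `label` keys), the helper names, and the class-name strings are all assumptions made for illustration, and the actual output format and label source should be checked against the evaluation files you produced.

```python
def extract_label(answer, label_names):
    """Return the one class name mentioned in the answer.

    Returns None when no class name, or more than one, appears in the text,
    so ambiguous answers are counted as wrong.
    """
    text = answer.lower()
    hits = [name for name in label_names if name.lower() in text]
    return hits[0] if len(hits) == 1 else None


def accuracy(records, label_names):
    """Fraction of records whose extracted class equals the ground-truth class."""
    if not records:
        raise ValueError("no records to score")
    correct = sum(
        extract_label(r["prediction"], label_names) == r["label"] for r in records
    )
    return correct / len(records)


# Assumed class names and records, for illustration only.
LABELS = ["Type 1 diabetes", "Type 2 diabetes", "Experimentally induced diabetes"]
records = [
    {"prediction": "This paper involves Type 2 diabetes.", "label": "Type 2 diabetes"},
    {"prediction": "It is about type 1 diabetes.", "label": "Type 2 diabetes"},
    {"prediction": "Type 1 diabetes or Type 2 diabetes.", "label": "Type 1 diabetes"},
    {"prediction": "Experimentally induced diabetes", "label": "Experimentally induced diabetes"},
]

print(accuracy(records, LABELS))
```

Counting ambiguous or unparseable answers as incorrect is a conservative choice; a more lenient matching rule can raise the reported accuracy noticeably, which may account for some differences between evaluation scripts.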