alibaba / GraphTranslator

GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks
BSD 3-Clause "New" or "Revised" License

About Quantitative Analysis and the Taobao Dataset. #5

Closed LITONG99 closed 4 months ago

LITONG99 commented 4 months ago

Dear authors and contributors, thank you very much for such remarkable work!

I noticed that the quantitative analysis of the GraphTranslator output was only conducted on the Taobao dataset. Could you make that dataset public? Since the Taobao dataset seems more complex and informative, even a partial release as a demo would be very helpful for understanding the model's behavior.

Thank you again for open-sourcing your valuable work. Looking forward to your reply!

LITONG99 commented 4 months ago

I would like to add that I conducted the quantitative analysis myself with the published model checkpoint_0.pth. For the 20 samples from data/arxiv/summary_embeddings.csv, none of the outputs corresponded to the original samples.

Taking paper 5151 (the 1st row in summary_embeddings.csv) as an example, the output is "The question is about a research study on \"The Impact of Time to Market on Customer Satisfaction.\" This paper examines how time-to-market (TTM) affects customer satisfaction, specifically for software companies that sell products or services with complex features. It uses data from customers who have purchased such items online over the past two years as part of their purchase decisions.\nThere are many factors impacting TTM including product complexity, communication quality, price transparency, seller reputation, and overall TTI (Time-To-Initiate). Each factor has been shown to be important in explaining why some customers choose one vendor's offering over another when given similar options. Additionally," which is irrelevant to paper 5151 ("deep weakly supervised anomaly detection"). The same happens for all 20 samples I examined.

Are these the expected results? Or what could be the possible issues here?

guyuisland commented 4 months ago

Hello, the file summary_embeddings.csv contains summaries obtained using the Producer, not the final predicted text. Note also that the file includes summaries for only 100 papers, provided as examples. If you want to train and obtain checkpoint_0.pth on your own, you must first use the Producer to generate summaries for all the papers, and then proceed with training the Translator.
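To make the two-stage flow above concrete, here is a minimal, hypothetical sketch of how per-node summaries are produced from a node plus its neighbors and collected into training rows. The function names and the toy graph are illustrative stand-ins, not the repo's actual API; the real Producer uses an LLM rather than string concatenation.

```python
# Hypothetical sketch of the Producer -> training-data flow; names are
# illustrative, not the actual GraphTranslator API.
import csv
import io

def producer_summarize(node_text, neighbor_texts):
    """Stand-in for the LLM-based Producer: builds a summary string
    from a node's own text plus its neighbors' text."""
    return node_text + " | neighbors: " + "; ".join(neighbor_texts)

def build_training_rows(graph):
    """Emit one (node_id, summary) row per node. The Translator would
    then be trained on rows for ALL nodes, not just a 100-row sample."""
    rows = []
    for node_id, (text, neighbors) in graph.items():
        neighbor_texts = [graph[n][0] for n in neighbors]
        rows.append((node_id, producer_summarize(text, neighbor_texts)))
    return rows

# Toy two-node citation graph: node -> (text, neighbor ids).
graph = {
    0: ("paper A on GNNs", [1]),
    1: ("paper B on LLMs", [0]),
}
rows = build_training_rows(graph)

# Analogous to writing out a summary CSV for training.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(len(rows))
```

The point of the sketch is only the shape of the pipeline: summaries must exist for every node before Translator training starts.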

LITONG99 commented 4 months ago

Thank you for your timely reply!

Please let me clarify. I am using the provided checkpoint_0.pth to test performance, so I can skip training according to the README. And I did not misinterpret summary_embeddings.csv (the Producer outputs, which are of good quality) as the prediction. I just sampled its first 20 rows, which provide the GraphSAGE pre-trained node embeddings of 20 nodes. It seems that in generate.py, the trained Translator takes in pre-trained node embeddings, translates them into the corresponding soft prompts, and generates responses.
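As a minimal mock of that inference flow (embedding → soft prompt → generated text): all classes below are illustrative stand-ins for the Translator and the ChatGLM2 components in generate.py, not the actual implementations.

```python
# Mock of the inference path described above; every class here is a
# hypothetical stand-in, not the real GraphTranslator code.

class MockTranslator:
    def __call__(self, node_embedding):
        # A real Translator projects the node embedding into soft
        # prompt vectors; here we just scale the values.
        return [x * 0.5 for x in node_embedding]

class MockLLM:
    def generate(self, soft_prompt):
        # A real LLM conditions generation on the soft prompt; here we
        # return dummy token ids, one per prompt element.
        return list(range(len(soft_prompt)))

class MockTokenizer:
    def decode(self, token_ids):
        return " ".join(f"tok{t}" for t in token_ids)

translator, llm, tokenizer = MockTranslator(), MockLLM(), MockTokenizer()

node_embedding = [0.2, -0.4, 0.9]         # one GraphSAGE embedding row
soft_prompt = translator(node_embedding)  # embedding -> soft prompts
outputs = llm.generate(soft_prompt)       # soft prompts -> token ids
response0 = tokenizer.decode(outputs)     # token ids -> text
print(response0)  # -> "tok0 tok1 tok2"
```

The structure mirrors the decode step quoted below from generate.py, with the Translator sitting between the frozen GNN embeddings and the frozen LLM.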

The attached text is the output `response0` from line 264 of generate.py:

```python
response0 = self.chatglm2_tokenizer.decode(outputs_i)
response0 = self.chatglm2_model.process_response(response0)
```

Since in eval.py the legality rate and accuracy are NOT assessed based on the direct output response0, I conducted a human evaluation of response0. I expected results similar to Figure 3 of the paper (human evaluation on the Taobao dataset), but my results on ArXiv are bad: the responses are irrelevant to the papers.

guyuisland commented 4 months ago

The output of response0 on the ArXiv dataset is indeed not quite as expected, an issue that was identified during the experimental stage. There might be two reasons for this:

1. In the ArXiv dataset, when training data is constructed using the Producer, the summary of the neighbors, which consists of cited references with diverse themes and fields, might not always be relevant to the original paper, leading to an unexpected output for response0.
2. The volume of training data differs: there is less training data for ArXiv than for Taobao, and the Taobao dataset contains a finite number of products with a higher prevalence of common broad categories, which allows for more accurate responses.

LITONG99 commented 4 months ago

Thank you for the insightful comments. I agree with the reasons.

Since the ArXiv dataset does not reflect the paper's main results, would you please consider making the other dataset, Taobao, public? At minimum, the trained model and some demo data would be much appreciated.

fs302 commented 4 months ago

@LITONG99 We unfortunately cannot share the Taobao-related dataset due to data privacy concerns. However, we are open to discussing research problems with the community. GraphTranslator is a framework that combines Graph Neural Networks (GNNs) and Large Language Models (LLMs) from a technical perspective. We eagerly anticipate further experimentation on text-attributed graphs.