HKUDS / GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"
https://arxiv.org/abs/2310.13023
Apache License 2.0

Why does the paper say that only the parameters of the alignment projector are optimized, when the tuned parameters in fact also include the language model's input embeddings? #6

Closed YerayL closed 11 months ago

YerayL commented 11 months ago

From the number of tuned parameters reported in the paper and from the code, I found that the tuned parameters also include the input embeddings of the language model, which confused me. Also, could changing the input embeddings affect LLaMA's own abilities and cause catastrophic forgetting? Thanks to the authors for the great work; please help me understand.
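For anyone who wants to reproduce this check, here is a minimal PyTorch sketch (a hypothetical helper, not the actual GraphGPT training script) that lists every parameter with `requires_grad=True` — running it on the stage-1 model would surface the projector and the input embedding table:

```python
# Hypothetical helper for inspecting trainable parameters; not GraphGPT code.
import torch.nn as nn

def report_trainable(model: nn.Module) -> int:
    """Print every trainable parameter and return the total element count."""
    total = 0
    for name, param in model.named_parameters():
        if param.requires_grad:
            total += param.numel()
            print(f"{name}: {tuple(param.shape)}")
    print(f"total trainable parameters: {total:,}")
    return total
```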

tjb-tech commented 11 months ago

Thanks for your attention to our GraphGPT! As for your question about the tuned parameters, we explored this in Sec 4.5, and we do reduce the number of tuned parameters by a factor of 50. Updating the parameters of the input embeddings is actually a common operation in multimodal language models. For example, in the first stage of LLaVA (https://arxiv.org/pdf/2304.08485.pdf, an oral paper at NeurIPS'23), the input embedding parameters are also updated, without further clarification in the paper. The rationale for this operation is to align the graph tokens with the natural language tokens. In Sec 4.3, we found that our GraphGPT can alleviate the catastrophic forgetting seen in traditional GNN models. And because we don't update the parameters of the base model, the effect on its abilities should be limited. But your question is very interesting and could be a future direction. I hope my answer is helpful! Thank you again for your support of our GraphGPT!
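To make the setup concrete, here is a hedged sketch of that freezing pattern, assuming a HuggingFace-style causal LM that exposes `get_input_embeddings()` and a separate `projector` module; the function and module names are illustrative, not the actual GraphGPT code:

```python
# Illustrative sketch of the freezing pattern described above, assuming a
# HuggingFace-style model; not the actual GraphGPT implementation.
import torch.nn as nn

def freeze_for_stage1(model: nn.Module, projector: nn.Module) -> None:
    # Freeze every parameter of the base language model...
    for param in model.parameters():
        param.requires_grad = False
    # ...then re-enable gradients for the input embedding table only,
    # so newly added graph tokens can be aligned with text tokens.
    model.get_input_embeddings().weight.requires_grad = True
    # The alignment projector (graph encoder output -> LLM token space)
    # is trained from scratch, so it stays fully trainable.
    for param in projector.parameters():
        param.requires_grad = True
```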

YerayL commented 11 months ago

Thank you for your prompt response! I may not have explained my second question clearly. What I meant was: does GraphGPT excel only at certain graph-related downstream tasks (such as node classification or link prediction) once the input embeddings have been changed? As you know, the original LLM could respond to any query in natural language. It would be wonderful if GraphGPT still possessed this capability.

tjb-tech commented 11 months ago

Thank you for the further explanation. Although the original LLM can respond to any query in natural language, describing graph-structural data purely in natural language is, in our view, not scalable. So we need a more effective and scalable method to align graphs with natural language, and that is what our GraphGPT does. Actually, for any domain-specific LLM, e.g., a code LLM or a visual LLM, the "effect" you mention may also occur. So your question is really a more general one: will a domain-specific LLM lose its capability on other NLP tasks? To date, we have run few experiments on general NLP tasks, but we believe that enabling GraphGPT to retain its capability on general NLP tasks could be an interesting opportunity. I hope my answer is helpful! Thank you again for your attention!