Nardien / KARD

Official Code Repository for the paper "Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks" (NeurIPS 2023).
MIT License

Problems when changing the T5 model to other causal models #1

Open AlbusChen opened 1 month ago

AlbusChen commented 1 month ago

Hi,

I am trying to use this framework with causal models such as Llama-based models and other LLMs. In my case, I replace the T5 model in the original pipeline with TinyLlama and Pythia (TinyLlama-1.1B-Chat-v1.0 and EleutherAI/pythia-1.4b).

However, after I replace the model and run through all the steps provided in the code (that is, using the reasoning generated by GPT to fine-tune a smaller model, in this case TinyLlama or Pythia, together with the external knowledge retrieved from the KB), the responses of the fine-tuned model are not readable and it performs badly. For example, on the MedQA dataset, using the Wikipedia KB and the provided reranker, the distilled model generates text like:

"A correct: that5). ( answer5C is the to of answer is the A is root ( A:). C, - A

also: for:. C, :) is of: isC.: 1 C a ThereforeC =:),,, C"

which is not a readable sentence and definitely fails on this task. I would like to know why this happens, and I hope you can give me some possible explanations.
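For reference, this is roughly how I swap the model in the training code (a minimal sketch of what I did; the actual loading code in the repository may look different):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The original pipeline loads a T5-style seq2seq model, e.g. via
# AutoModelForSeq2SeqLM.from_pretrained(...)

# My replacement: a decoder-only (causal) model
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # or "EleutherAI/pythia-1.4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Causal checkpoints often ship without a pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```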

Other details:

Nardien commented 1 month ago

Thank you for your interest in our work! Currently, our code supports only T5-based models, so causal models may not function correctly. Supporting causal models requires different data preprocessing and generation code, which is not yet included in our repository. We will update the repository as soon as possible to include these features. Thank you for your understanding and patience!
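To sketch the main difference (this is not the actual implementation in this repo, just an illustration of what would need to change): a T5-style seq2seq model receives the prompt in the encoder and the rationale as decoder labels, whereas a decoder-only model needs the prompt and the target concatenated into a single sequence, with the loss masked on the prompt tokens and left padding at generation time. Something along these lines would be required:

```python
import torch

def build_causal_example(tokenizer, prompt, target, max_length=1024):
    """Concatenate prompt + target for a decoder-only model and mask
    the prompt tokens out of the loss (a sketch, not this repo's code)."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    target_ids = target_ids + [tokenizer.eos_token_id]

    input_ids = (prompt_ids + target_ids)[:max_length]
    # -100 tells the cross-entropy loss to ignore the prompt positions
    labels = ([-100] * len(prompt_ids) + target_ids)[:max_length]

    return {
        "input_ids": torch.tensor(input_ids),
        "attention_mask": torch.ones(len(input_ids), dtype=torch.long),
        "labels": torch.tensor(labels),
    }

# At inference time, decoder-only models should be left-padded, and the
# prompt has to be stripped from the generated sequence, e.g.:
#   tokenizer.padding_side = "left"
#   out = model.generate(**batch, max_new_tokens=256)
#   completion = out[:, batch["input_ids"].shape[1]:]
```

Without changes of this kind, training a causal model with the current seq2seq-style preprocessing can easily produce the kind of unreadable output you observed.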