GeorgeVern / lmcor

Code for the EACL 2024 paper: "Small Language Models Improve Giants by Rewriting Their Outputs"

[Question] How is the training dataset constructed? #1

Closed zmsn-2077 closed 6 months ago

zmsn-2077 commented 6 months ago

Dear Giorgos @GeorgeVern, as shown in Equation 2, what do $y$ and $\hat{y}$ specifically represent, and how are they constructed in the training dataset? Does this merely represent in-context learning?

[screenshot of Equation 2 from the paper]
GeorgeVern commented 6 months ago

I appreciate your interest in our work! Equation 2 outlines the functionality of LMCor, which generates the most probable sequence $\hat{y}$ given both the input $x$ and a list of candidates $C$. As for the training set, it's constructed by sampling multiple candidates for each input. This is achieved by prompting an LLM for the task at hand. If you have any further questions or need clarification, feel free to ask!
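Spelled out, this is maximum-a-posteriori decoding conditioned on both the input and the candidate list. A plausible rendering of Equation 2 (the exact notation in the paper may differ slightly) is:

```latex
\hat{y} = \operatorname*{arg\,max}_{y} \; p_{\theta}\left(y \mid x, C\right)
```

where $p_{\theta}$ is the corrector's learned distribution, $x$ is the input, and $C$ is the set of LLM-generated candidates.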

zmsn-2077 commented 6 months ago

Thank you very much for answering my questions; this work is meaningful. I understand that LMCor takes the different candidate outputs of an upstream model (i.e., a list of candidates $C$) as context, and then obtains a better answer for an input $x$.

Does Equation 2 represent the inference process of LMCor, and is the prompt used during inference the one shown in Appendix A?

[screenshot from the paper]
GeorgeVern commented 6 months ago

Let me try to clarify the two-step process of our approach:

Step 1: Generating the Candidates

In this step, we interact with an LLM, typically through an API. Our goal is to acquire a set of candidate outputs, denoted as $C$, corresponding to a given input $x$. To achieve this, we prompt the LLM with a task description $d$ and, optionally, in-context examples (the examples can be omitted for zero-shot prompting). In our work, we use 5 in-context examples drawn from the development set of the respective task. To generate multiple candidates, we sample with a temperature, combined with other sampling techniques such as nucleus or top-k sampling.
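The candidate-generation step above can be sketched as follows. This is a hedged illustration, not the paper's code: the prompt template, the `sample_from_llm` stub, and the default sampling parameters are all assumptions standing in for a real LLM API call.

```python
# Sketch of Step 1: build a few-shot prompt and sample multiple candidates.

def build_prompt(task_description, examples, x):
    """Concatenate a task description, in-context examples, and the input.

    The "Input:/Output:" template is an illustrative assumption.
    """
    parts = [task_description]
    for ex_in, ex_out in examples:  # e.g. 5 pairs from the dev set
        parts.append(f"Input: {ex_in}\nOutput: {ex_out}")
    parts.append(f"Input: {x}\nOutput:")
    return "\n\n".join(parts)

def sample_from_llm(prompt, n_candidates=5, temperature=0.7, top_p=0.9):
    """Stub for an LLM API call; real code would pass the sampling
    parameters (temperature, nucleus/top-p or top-k) to the provider."""
    return [f"<candidate {i}>" for i in range(n_candidates)]

# Toy usage: 5 dev-set examples, then sample candidates for a new input.
examples = [("bonjour", "hello")] * 5
prompt = build_prompt("Translate French to English.", examples, "merci")
candidates = sample_from_llm(prompt)
```

Sampling (rather than greedy decoding) is what makes the candidates in $C$ diverse enough for the corrector to exploit.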

Step 2: Correcting the Candidates

In this next step, we provide the previously generated candidates $C$, along with the source sentence $x$, to the small LM-Corrector, aka LMCor. This model then refines the candidates, producing the final answer. The corrector does not perform in-context learning; instead, it is trained specifically to correct the outputs of an LLM on a particular task.

To train the LMCor, which is a fine-tuned T5 model, we augment the existing task-specific dataset with candidates obtained from the LLM. This entails repeating Step 1 for each training point. Subsequently, we train the corrector to produce the target sentence given the input and the candidates.
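A minimal sketch of how a training instance for the corrector could be serialized, assuming the input and candidates are concatenated into a single source string for a seq2seq model such as T5. The separator token and field order are my assumptions, not necessarily the paper's exact format.

```python
SEP = " <sep> "  # assumed separator token between input and candidates

def make_corrector_example(x, candidates, target):
    """Serialize (input, candidates) into one source string; the gold
    reference sentence is the target the corrector is trained to emit."""
    source = x + SEP + SEP.join(candidates)
    return {"source": source, "target": target}

# Toy training point: one input, three sampled LLM candidates, one gold target.
ex = make_corrector_example(
    "merci beaucoup",
    ["thanks a lot", "thank you much", "thanks much"],
    "thank you very much",
)
```

Fine-tuning then proceeds as ordinary seq2seq training on these (source, target) pairs.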

During inference, we feed the inputs to the LLM initially, obtain the candidates, and then forward them to the corrector for refinement.
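The two-step inference flow can be sketched end to end. Both stubs below are stand-ins: `generate_candidates` for the prompted LLM and `corrector` (here a toy longest-candidate rule) for the fine-tuned LMCor.

```python
def generate_candidates(x):
    """Stand-in for Step 1: sampling candidates from the LLM."""
    return [x.upper(), x.title(), x]

def corrector(x, candidates):
    """Stand-in for Step 2: the trained corrector refining the candidates.
    A toy rule (longest candidate) replaces the real T5 model here."""
    return max(candidates, key=len)

def correct_llm_output(x):
    candidates = generate_candidates(x)  # Step 1: prompt the LLM
    return corrector(x, candidates)      # Step 2: refine with LMCor

y_hat = correct_llm_output("hello world")
```

The key point is that the corrector sees both the input and all candidates, so it can copy, combine, or rewrite them rather than just rank them.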

I hope this explanation helps!

zmsn-2077 commented 6 months ago

I get it. Thank you very much for answering my questions. I now understand what the paper says:

> LMCOR learns to select the most promising among the generated outputs.

This is very interesting!