zmsn-2077 closed this issue 6 months ago
I appreciate your interest in our work! Equation 2 outlines the functionality of LMCor, which generates the most probable sequence $\hat{y}$ given both the input $x$ and a list of candidates $C$. As for the training set, it's constructed by sampling multiple candidates for each input. This is achieved by prompting an LLM for the task at hand. If you have any further questions or need clarification, feel free to ask!
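Read this way, Equation 2 is a conditional sequence-prediction objective. A sketch in generic notation (the parameterization $p_{\theta}$ is my shorthand, not necessarily the paper's exact symbols):

```latex
\hat{y} = \arg\max_{y} \; p_{\theta}\!\left(y \mid x, C\right),
\qquad C = \{c_1, \dots, c_k\}
```

Here $x$ is the input, $C$ is the set of $k$ sampled candidates, and $\hat{y}$ is the sequence the corrector assigns the highest probability.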
Thank you very much for answering my questions; this work is meaningful. I understand that LMCor uses the different candidate outputs of a front model (i.e., a list of candidates $C$) for in-context learning, and then obtains a better answer for an input $x$.
Does Equation 2 represent the inference process of LMCor, and is the prompt used in this process the one shown in Appendix A?
Let me try to clarify the two-step process of our approach:
In the first step, we interact with an LLM, typically through an API. Our goal is to acquire a set of candidate outputs, denoted as $C$, for a given input $x$. To achieve this, we prompt the LLM with a task description $d$ and, optionally, a set of in-context examples (the examples can be omitted for zero-shot prompting). In our work, we use 5 in-context examples drawn from the development set of the respective task. To generate multiple candidates, we sample with a non-zero temperature, optionally combined with other sampling techniques such as nucleus or top-k sampling.
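A minimal sketch of this step, assuming a hypothetical `llm_sample` callable in place of the actual API and illustrative sampling parameters:

```python
def build_prompt(task_description, examples, x):
    """Compose the task description, in-context examples, and the new input."""
    parts = [task_description]
    for src, tgt in examples:
        parts.append(f"Input: {src}\nOutput: {tgt}")
    parts.append(f"Input: {x}\nOutput:")
    return "\n\n".join(parts)

def sample_candidates(llm_sample, task_description, examples, x, k=5):
    """Query the LLM k times with sampling enabled to obtain diverse candidates."""
    prompt = build_prompt(task_description, examples, x)
    # Temperature and top_p values here are illustrative, not the paper's.
    return [llm_sample(prompt, temperature=0.8, top_p=0.95) for _ in range(k)]
```

Because each call samples independently, the `k` returned candidates will generally differ, which is what gives the corrector something to work with.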
In this next step, we provide the previously generated candidates $C$, along with the source sentence $x$, to the small LM-Corrector, aka LMCor. This model then refines the candidates, producing the final answer. The corrector does not perform in-context learning; instead, it is trained specifically to correct the outputs of an LLM on a particular task.
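The exact serialization of the corrector's input is not spelled out here; as one illustrative assumption, the source sentence and candidates could be concatenated into a single sequence:

```python
def corrector_input(x, candidates):
    """Serialize the source sentence and its candidates into one input string.

    The field labels ("source:", "candidate i:") are illustrative,
    not necessarily the paper's exact format.
    """
    parts = [f"source: {x}"]
    for i, c in enumerate(candidates, start=1):
        parts.append(f"candidate {i}: {c}")
    return " ".join(parts)
```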
To train the LMCor, which is a fine-tuned T5 model, we augment the existing task-specific dataset with candidates obtained from the LLM. This entails repeating Step 1 for each training point. Subsequently, we train the corrector to produce the target sentence given the input and the candidates.
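The training-set construction above can be sketched as follows, with `sample_fn` standing in for Step 1 and a simple concatenation format for the corrector's input (both are assumptions for illustration):

```python
def build_corrector_dataset(dataset, sample_fn, k=5):
    """Augment (input, target) pairs with k LLM candidates per input.

    dataset   -- iterable of (x, y) pairs from the task-specific dataset
    sample_fn -- stand-in for Step 1: returns k candidate strings for x
    """
    augmented = []
    for x, y in dataset:
        candidates = sample_fn(x, k)
        # Concatenate source and candidates into one sequence-to-sequence input.
        source = " ".join(
            [f"source: {x}"]
            + [f"candidate {i}: {c}" for i, c in enumerate(candidates, 1)]
        )
        augmented.append({"input": source, "target": y})
    return augmented
```

The resulting `{"input": ..., "target": ...}` pairs can then be used to fine-tune a seq2seq model such as T5 with a standard cross-entropy objective.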
During inference, we first feed the inputs to the LLM to obtain the candidates, and then forward both to the corrector for refinement.
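Putting the two steps together at inference time (again with hypothetical `sample_fn` and `corrector_fn` callables standing in for the LLM and the trained corrector):

```python
def lmcor_inference(x, sample_fn, corrector_fn, k=5):
    """Step 1: sample k candidates from the LLM; Step 2: let the corrector refine them."""
    candidates = sample_fn(x, k)
    return corrector_fn(x, candidates)
```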
I hope this explanation helps!
I get it. Thank you very much for answering my questions. I now understand what the paper says:
LMCOR learns to select the most promising among the generated outputs.
This is very interesting!
Dear Giorgos @GeorgeVern , in Equation 2, what do $y$ and $\hat{y}$ specifically represent? And how is the training dataset specifically constructed? Does the approach merely amount to in-context learning?