I have replied to your questions in the email. Thanks for your questions!
@StevenTang1998 Hello, I'd like to ask you some additional questions separately from the email.
What is learned in the PTG training process?
When I first read the paper, I thought it was 1,
but I'm confused because the code sets self.model.requires_grad_(True)
while the paper specifies a learning rate for "BART".
Could you tell me which one is correct?
Hi @minji-o-j, during prompt pre-training, we only train the query and keys. When fine-tuning on the downstream tasks, we tune the prompts and the BART model. More details can be found on Page 6 of our paper.
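(For illustration, a minimal sketch of the two regimes described in this reply; the prompt shapes, parameter names, and learning rates below are assumptions, not the actual PTG code.)

```python
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration

# Illustrative only: shapes, names, and learning rates are assumptions.
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
prompt_params = nn.ParameterDict({
    "keys": nn.Parameter(torch.randn(14, 1024)),          # one key per source task (assumed shape)
    "query_proj": nn.Parameter(torch.randn(1024, 1024)),  # projection producing the query (assumed)
})

# Prompt pre-training: BART is frozen, only the query/keys receive gradients.
bart.requires_grad_(False)
pretrain_optimizer = torch.optim.AdamW(prompt_params.parameters(), lr=1e-4)

# Downstream fine-tuning: the prompts and BART are tuned together.
bart.requires_grad_(True)
finetune_optimizer = torch.optim.AdamW(
    list(prompt_params.parameters()) + list(bart.parameters()), lr=3e-5
)
```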
Then, is the process of learning PTG for a specific task a two-stage process?
(1) In the process of obtaining tilde p, the query and keys are trained using the frozen BART model. (2) BART is fine-tuned using the tilde p obtained in stage (1).
Yes, and step (1) is optional if you use the existing trained prompts.
If so, is it correct that the paper's experimental results cannot be obtained immediately by executing the following command with the current code, and that reproducing them requires minor modifications?
python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large
(I used the command written here.)
The reason I think so is that, when the queries and keys are learned in PTG,
self.model.requires_grad_
is set to True (https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/model/ptg.py#L43).
As of now, BART training and query/key training are done simultaneously.
After changing this part (`self.model.requires_grad_(True)`) to False, training the query and keys, and saving tilde p, should I then fine-tune BART on the same target-task train set in a second run (setting self.model.requires_grad_(True) and using the fixed tilde p value instead of the prompt_embedding matrix)?
Please let me know if anything is wrong.
You can obtain the paper's experimental results immediately by executing the following command:
python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large
We have provided the pre-trained prompt source.
Then, is the process of learning PTG for a specific task a two-stage process?
(1) In the process of obtaining tilde p, the query and keys are trained using the frozen BART model. (2) BART is fine-tuned using the tilde p obtained in stage (1).
If so, is only (2) executed when this command is used?
yeah
Then, is the provided prompt source
not the source prompts for the source prompt pool,
but rather the tilde p for each of the 14 tasks, each already trained on the remaining 13 tasks (excluding itself)?
However, looking at the code, it appears that the provided prompts are fed in as the source-task prompts,
and I understood that the source-task prompts are used in the process of obtaining tilde p.
Please let me know if there is anything wrong with my understanding!!
You can download it and utilize torch to load it. It contains the learned prompt for each task (i.e., 14 tensors of shape [200, 1024]).
Taking the pc dataset as an example, the source prompts for the same target task (pc) are different in cross-task and cross-dataset experiments.
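(For reference, a minimal sketch of loading and inspecting the downloaded prompt file; the file name and the task-name-to-tensor layout are assumptions.)

```python
import torch

# The file name below is a placeholder for wherever you saved the downloaded prompt source.
prompts = torch.load("prompt_source.pt", map_location="cpu")

# Per the reply above, it should hold the learned prompt for each of the 14 tasks,
# i.e. 14 tensors of shape [200, 1024]; here we assume a task-name -> tensor mapping.
for task_name, prompt in prompts.items():
    print(task_name, tuple(prompt.shape))  # expect (200, 1024)
```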
In the case of the 14 prompts provided, are the tilde p that went through the process presented in the paper (over all 13 source tasks) and the tilde p used in the experiments separate?
Sorry, I may not understand your question. Maybe you can find a solution here. We have provided different options for source tasks.
Oh, if so:
the source prompts are derived using the frozen BART model (the multi-key memory network is not used), and isn't tilde p obtained by applying the adaptive attention mechanism to the "source prompts"?
Yes, the source prompts are derived using the frozen BART model (the multi-key memory network is not used), and tilde p is obtained by applying the adaptive attention mechanism to the "source prompts".
Also, my earlier statement was a mistake: the prompt source we provided is P = {p_1, ..., p_t, ..., p_T}.
Then, is the process of learning PTG for a specific task a two-stage process?
(1) In the process of obtaining tilde p, the query and keys are trained using the frozen BART model. (2) BART is fine-tuned using the tilde p obtained in stage (1).
If so, I guess I need to start from (1) to train PTG, since the provided prompt source is the source prompts.
python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large
However, this command seems to train both BART and the prompts (query and keys) at the same time.
If so, is it correct that the paper's experimental results cannot be obtained immediately by executing the following command with the current code, and that reproducing them requires minor modifications?
python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large
(I used the command written here.) The reason I think so is that, when the queries and keys are learned in PTG,
self.model.requires_grad_
is set to True (https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/model/ptg.py#L43). As of now, BART training and query/key training are done simultaneously. After changing this part (`self.model.requires_grad_(True)`) to False, training the query and keys, and saving tilde p, should I then fine-tune BART on the same target-task train set in a second run (setting self.model.requires_grad_(True) and using the fixed tilde p value instead of the prompt_embedding matrix)?
Please let me know if anything is wrong.
That is why I asked; is it right to proceed with training the way I described?
Any help would be appreciated.
If you want to conduct step (1), our provided code doesn't support that yet. You may need to modify the existing code to achieve your goal.
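(Purely as an illustration of the kind of modification discussed in this thread; the attribute and file names below are placeholders, and only self.model.requires_grad_ at ptg.py#L43 comes from the repository.)

```python
import torch

def run_stage_one(ptg_model, train_fn):
    """Stage (1): freeze BART, train the query/keys, and save the resulting tilde p."""
    ptg_model.model.requires_grad_(False)                   # flip of the flag at ptg.py#L43
    train_fn(ptg_model)                                     # your existing training loop
    torch.save(ptg_model.prompt_embedding, "tilde_p.pt")    # placeholder attribute/file name

def run_stage_two(ptg_model, train_fn):
    """Stage (2): load the fixed tilde p and fine-tune BART on the target task."""
    ptg_model.prompt_embedding = torch.load("tilde_p.pt")   # placeholder attribute/file name
    ptg_model.model.requires_grad_(True)
    train_fn(ptg_model)
```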
Thank you for the answer.
Also, in the current code, when an instance is fed in, the "task information" (e.g., summarization) is also included in the model input (prompt + task description + input sentence).
In the paper, the "Cluster" key and the "Prompt" key are used,
but in the current code the same key
is passed to both MHA calls. (link)
prompt_embeds = self.lam * self.MHA(task_query, key, value) + (1 - self.lam) * self.MHA(input_query, key, value)
Sorry for the late response; we utilize the same key in practice.
In the paper, k^c_z (the cluster key) and k^p_t (the prompt key) exist separately, but the code uses the same key.
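(For reference, a minimal sketch of the shared-key combination in the quoted line, assuming nn.MultiheadAttention semantics and made-up shapes and hyperparameters.)

```python
import torch
import torch.nn as nn

# All shapes and values below are assumptions for illustration only.
d_model, n_heads, n_source, prompt_len, lam = 1024, 8, 14, 200, 0.5
mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

task_query  = torch.randn(1, prompt_len, d_model)  # query built from the task information
input_query = torch.randn(1, prompt_len, d_model)  # query built from the input sentence
key   = torch.randn(1, n_source, d_model)          # a single key tensor reused for both calls
value = torch.randn(1, n_source, d_model)          # e.g. representations of the source prompts

# Mirrors: prompt_embeds = lam * MHA(task_query, key, value) + (1 - lam) * MHA(input_query, key, value)
prompt_embeds = lam * mha(task_query, key, value)[0] + (1 - lam) * mha(input_query, key, value)[0]
print(prompt_embeds.shape)  # torch.Size([1, 200, 1024])
```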