google-research / prompt-tuning

Original implementation of Prompt Tuning from Lester et al., 2021
Apache License 2.0

Question about your paper and possible research topic? #196

Closed ArEnSc closed 2 years ago

ArEnSc commented 2 years ago

@blester125 Does soft prompt tuning imply that we can avoid the catastrophic forgetting that occurs in multitask settings? Could this be done with a classifier, or a series of action codes, that predicts the next task (and thus which prompt) to use? Let me know what you think and what you have seen. I am still working out a way to do this with soft prompt tuning in PyTorch and experiment with it.

blester125 commented 2 years ago

Hi!

We have seen some successes in using Prompt Tuning as a way to mitigate catastrophic forgetting. For example, in this paper we find Prompt Tuning helps with cross-lingual zero-shot generation, where the model's tendency to output English (after training on English), even when the input is in a different language, is a type of over-fitting.

I'm not exactly sure what you mean by predicting the next task. Generally you know what task is being done, so you can decide which prompt to swap in. For example, in the paper above we have "factored prompts": a collection of language prompts and a collection of task prompts that are combined for new (language, task) pairs. That isn't OSS yet, but we have a similar method in the extended section of this codebase, where there are shared prompt parameters used for all tasks plus task-specific parameters. Which task-specific parameters to use is selected based on the current task, and they are trained jointly with the shared parameters.
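The shared-plus-task-specific idea above can be sketched roughly as follows. This is a minimal, illustrative numpy sketch, not the codebase's actual API; all names (`SharedTaskPrompt`, the shape choices) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedTaskPrompt:
    """Illustrative sketch: a shared prompt plus a bank of per-task prompts.

    The full prompt for a task is the shared block concatenated with that
    task's block; in training both would be learned jointly.
    """

    def __init__(self, num_tasks, shared_len, task_len, d_model):
        # Shared parameters, used for every task.
        self.shared = rng.normal(size=(shared_len, d_model))
        # One task-specific slice per task.
        self.task_bank = rng.normal(size=(num_tasks, task_len, d_model))

    def __call__(self, task_id):
        # Select the task-specific slice and prepend the shared prompt.
        return np.concatenate([self.shared, self.task_bank[task_id]], axis=0)

prompter = SharedTaskPrompt(num_tasks=3, shared_len=20, task_len=10, d_model=512)
prompt = prompter(task_id=1)
print(prompt.shape)  # (30, 512): 20 shared + 10 task-specific tokens
```

The resulting `(shared_len + task_len, d_model)` block would then be prepended to the input embeddings as in ordinary prompt tuning.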

If you don't want a hard, explicit distinction between the prompts used for different tasks, you could probably use an approach like this paper, where the prompt is a function of the input (although I think it would have made more sense if they had mixed an instance-specific prompt and a shared prompt together with a learned gate). You could also have a huge number of prompt tokens (e.g. 1000) and then do some sort of attention between the input and them to reduce/select them down to a reasonable number that is then used as the prompt. Something like this should let you have prompts that combine shared and task-specific information (without you needing to know the task).
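One way the "big pool of prompt tokens, reduced by attention against the input" idea could look is sketched below. This is a hypothetical numpy sketch under my own assumptions (mean-pooled input as the query, hard top-k selection); a trainable version would use soft attention so the selection stays differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# A large bank of candidate prompt tokens and one example's input embeddings.
pool = rng.normal(size=(1000, 64))  # 1000 candidate prompt tokens
x = rng.normal(size=(37, 64))       # input token embeddings for one example

def select_prompt(x, pool, prompt_len=20):
    # Score every pool token against the mean-pooled input representation,
    # then keep the prompt_len highest-scoring tokens, softly reweighted.
    # NOTE: argsort-based top-k is non-differentiable; during training you
    # would use full soft attention over the pool instead.
    query = x.mean(axis=0)                            # (d,)
    scores = pool @ query / np.sqrt(pool.shape[-1])   # (1000,)
    top = np.argsort(scores)[-prompt_len:]            # indices of best tokens
    weights = softmax(scores[top])                    # (prompt_len,)
    return pool[top] * weights[:, None]               # (prompt_len, d)

prompt = select_prompt(x, pool)
print(prompt.shape)  # (20, 64): input-conditioned prompt
```

Because the selection depends on the input, different examples pull different mixtures out of the same pool, which is what lets the pool hold both shared and task-specific information without an explicit task label.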

If you are having trouble implementing anything in this codebase let me know and I can try to help point you to how it would be done.

ArEnSc commented 2 years ago

@blester125 Thank you for the response, and for your and the team's amazing work. I am surprised there are not more frameworks implementing this as a staple of NLP! To clarify: I am trying to develop an autonomous agent that takes user queries and applies the correct prompt for each query. Generally, in this scenario there would be a higher-level classifier that guides the model to the correct prompt by predicting it from the query.
I am going to try this paper to see if it works; it seems like the kind of direction I was looking for: https://homes.cs.washington.edu/~akari/papers/attempt_preprint.pdf I will also take a look at the paper you referenced!
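The query-to-prompt routing described above might be sketched like this. Everything here is hypothetical: `embed` is a stand-in for any real sentence encoder, the task names are made up, and the nearest-centroid "classifier" is just a placeholder for whatever router you train.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64
# One tuned soft prompt per task (illustrative random values).
prompts = {
    "summarize": rng.normal(size=(10, d_model)),
    "translate": rng.normal(size=(10, d_model)),
    "qa": rng.normal(size=(10, d_model)),
}
# A learned representation per task the router compares queries against.
task_centroids = {name: rng.normal(size=d_model) for name in prompts}

def embed(query: str) -> np.ndarray:
    # Stand-in encoder: hash words into a bag-of-features vector.
    v = np.zeros(d_model)
    for tok in query.lower().split():
        v[hash(tok) % d_model] += 1.0
    return v

def route(query: str) -> str:
    # Nearest-centroid "classifier": pick the task whose centroid best
    # matches the query embedding.
    q = embed(query)
    return max(task_centroids, key=lambda name: task_centroids[name] @ q)

task = route("please summarize this article")
prompt = prompts[task]  # prepend this soft prompt to the model input
```

The appeal of this split is that the router and the prompts can be trained or swapped independently: adding a new task means tuning one new prompt and one new centroid, without touching the frozen base model.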

blester125 commented 2 years ago

Thanks for sharing that reference! I'm glad someone finally did that; I've been thinking along the lines of that paper for a while but never had the time!