KMnP / vpt

❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119

Training with concat-channel method #8

Closed kouyk closed 2 years ago

kouyk commented 2 years ago

Thank you for your good work. I like how the paper validated many ideas, particularly for transformers.

I was also intrigued by some of the ablation studies, especially the ones on prompt location. I am trying to replicate all of them, but it is not obvious to me how to train the variant in which an additional channel is concatenated to the input image and the embedding layer is left unfrozen. Could the team provide some advice on this?

KMnP commented 2 years ago

Hi, thank you for your interest in our work.

To add an additional channel, you can modify the function here: https://github.com/KMnP/vpt/blob/main/src/models/vit_prompt/vit.py#L75 More specifically, you can concatenate the prompt to x along the channel dimension before the call x = self.embeddings(x).

Remember to also change the embedding layer's input size to account for the added channel. You can use https://github.com/KMnP/vpt/blob/main/src/models/vit_models.py#L98 to leave the embedding layer unfrozen.
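
For anyone following along, here is a minimal PyTorch sketch of the steps above. It is not code from this repository. It assumes the patch embedding is a single `nn.Conv2d`, and the names `ConcatChannelPrompt`, `num_prompt_channels` and `freeze_except` are hypothetical, introduced only for illustration. The sketch shows only the patch projection, so the class token and position embeddings would still be added as in the existing embedding module.

```python
import torch
import torch.nn as nn


class ConcatChannelPrompt(nn.Module):
    """Hypothetical helper: appends learnable channel(s) to the image
    and projects the result with a widened patch-embedding conv."""

    def __init__(self, patch_embed: nn.Conv2d, image_size: int,
                 num_prompt_channels: int = 1):
        super().__init__()
        # Learnable pixel-space prompt, shared across all images in a batch.
        self.prompt = nn.Parameter(
            torch.empty(1, num_prompt_channels, image_size, image_size))
        nn.init.uniform_(self.prompt, -0.1, 0.1)

        # Widened copy of the patch embedding: in_channels grows by the
        # number of prompt channels, everything else stays the same.
        old_in = patch_embed.in_channels
        self.patch_embed = nn.Conv2d(
            old_in + num_prompt_channels,
            patch_embed.out_channels,
            kernel_size=patch_embed.kernel_size,
            stride=patch_embed.stride,
            padding=patch_embed.padding,
            bias=patch_embed.bias is not None,
        )
        with torch.no_grad():
            # Assumption: new input channels start at zero so the initial
            # output matches the pretrained embedding exactly.
            self.patch_embed.weight.zero_()
            self.patch_embed.weight[:, :old_in] = patch_embed.weight
            if patch_embed.bias is not None:
                self.patch_embed.bias.copy_(patch_embed.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the prompt along the channel dimension, then embed.
        prompt = self.prompt.expand(x.shape[0], -1, -1, -1)
        return self.patch_embed(torch.cat([x, prompt], dim=1))


def freeze_except(model: nn.Module, trainable_keywords) -> None:
    """Hypothetical helper: keep only parameters whose name contains one
    of the keywords trainable, and freeze the rest."""
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
```

With a pretrained backbone, one would wrap its patch-embedding conv in `ConcatChannelPrompt` and then call something like `freeze_except(model, ["prompt", "patch_embed", "head"])`, so that the prompt, the widened embedding layer and the classification head are the only trainable parts. The exact parameter names depend on how the modules are registered in your model.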

KMnP commented 2 years ago

@kouyk I will close this issue since there have been no further comments. Feel free to reopen it if you have more questions!