Closed — kouyk closed this issue 2 years ago
Hi, thank you for your interest in our work.
To add an additional channel, you can modify the function here: https://github.com/KMnP/vpt/blob/main/src/models/vit_prompt/vit.py#L75
More specifically, you can concatenate the prompt to `x` before `x = self.embeddings(x)`.
Also remember to adjust the embedding layer's input size to account for the added channel. You can use https://github.com/KMnP/vpt/blob/main/src/models/vit_models.py#L98 to leave the embedding layer unfrozen.
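A minimal sketch of what this could look like, assuming the standard ViT patch embedding (a `Conv2d` projection) — the class name `ChannelPromptedEmbed` and all hyperparameters below are illustrative, not the repo's actual code:

```python
# Hypothetical sketch: concatenate a learnable prompt channel to the input
# image, and widen the patch-embedding conv from 3 to 4 input channels.
import torch
import torch.nn as nn

class ChannelPromptedEmbed(nn.Module):
    """Prepends a learnable per-pixel prompt channel before patch embedding."""
    def __init__(self, img_size=224, patch_size=16, embed_dim=768):
        super().__init__()
        # one extra learnable channel, broadcast over the batch at forward time
        self.prompt_channel = nn.Parameter(torch.zeros(1, 1, img_size, img_size))
        # 3 RGB channels + 1 prompt channel -> 4 input channels
        self.proj = nn.Conv2d(4, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        prompt = self.prompt_channel.expand(x.shape[0], -1, -1, -1)
        x = torch.cat([x, prompt], dim=1)      # (B, 4, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, D)

embed = ChannelPromptedEmbed()
# keep this embedding layer trainable while the rest of the backbone is frozen
for p in embed.parameters():
    p.requires_grad = True

out = embed(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 196, 768])
```

In practice you would also want to initialize the first 3 input channels of `proj` from the pretrained patch-embedding weights so only the new prompt channel starts from scratch.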
@kouyk Will close this issue due to no further comments. Feel free to reopen it if you have more questions!
Thank you for your good work, I like how the paper validated many ideas particularly for transformers.
I was also intrigued by some of the ablation studies, especially the ones on the various locations for the prompt. I was trying to replicate all of them, but it wasn't obvious to me how to train the variant with an additional channel concatenated to the input image while keeping the embedding layer unfrozen. Could the team provide some advice on this?