JH-LEE-KR opened 1 year ago
How was the result for CORe50 with frozen parts? (Question 1)
All experiments (CIFAR-100, ImageNet-R, 5-datasets) used Adam as the optimizer, but the CORe50 config uses SGD, so I conducted experiments with different frozen parts and optimizers using the official JAX code.
Here are the results of the experiments in my environment:

| Freeze | Optimizer | Acc@1 |
|---|---|---|
| Yes | Adam | 75.06 |
| Yes | SGD | 63.75 |
| No | Adam | 18.07 |
| No | SGD | 77.90 |
Freeze "Yes" means freeze same as CIFAR100 setting, config.freeze_part = ["encoder", "embedding", "cls"]
,
"No"means config.freeze_part=[]
CORe50 config.
And No & SGD (last row) setting is the same as the original CORe50 config.
If you have any additional comments, please feel free to let me know.
Best, Jaeho Lee.
Hi @JH-LEE-KR, did you find an answer to the positional encoding question? I'm implementing L2P on another pre-trained backbone and am wondering whether positional encoding should be applied, where it should be applied (before or after concatenating the prompts), and whether it should be applied only to the image tokens or to the prompts as well.
Dear author,
Thank you for your great work.
Some questions came up while reproducing the official code.
From what I understand, the key idea of L2P is to freeze a well-pretrained backbone (ViT) and train only a small set of prompts, which achieves impressive performance.
However, if you look at the config for the domain-incremental setting on CORe50, the freeze part is an empty list. When I reproduced it in my environment without modifying the config, I got a result (77.91%) similar to the paper's. This suggests that the paper's CORe50 result was obtained by fully tuning the backbone without freezing it.
**1. Why didn't you freeze the backbone in the domain-incremental setting?**
And about the positional embedding: before the release of the code version integrated with DualPrompt, the positional embedding was also added to the prompts in L2P. However, in the DualPrompt-integrated version, the positional embedding is no longer added to the prompts (it is only added to the image tokens). I think the positional embedding can have a large impact on performance. **4. Which one is correct?**
Additionally, when using L2P in the code integrated with DualPrompt, the encoder input is [Prompts, CLS, Image tokens], whereas in the code before the integration with DualPrompt it is [CLS, Prompts, Image tokens]. **5. Which one is correct?**
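To make the difference concrete, here is a minimal sketch of the two variants as I understand them from the descriptions above, written with plain `jax.numpy`; the tensor names (`cls_token`, `patch_tokens`, `prompts`, `pos_embed`) are placeholders, not the repository's identifiers.

```python
import jax.numpy as jnp


def forward_tokens_integrated(cls_token, patch_tokens, prompts, pos_embed):
    # DualPrompt-integrated version as described above: the positional embedding
    # is added to [CLS, image tokens] only, then the prompts are prepended,
    # giving the input order [Prompts, CLS, Image tokens].
    x = jnp.concatenate([cls_token, patch_tokens], axis=1)  # [B, 1+N, D]
    x = x + pos_embed                                        # pos_embed: [1, 1+N, D]
    return jnp.concatenate([prompts, x], axis=1)             # [B, P+1+N, D]


def forward_tokens_pre_integration(cls_token, patch_tokens, prompts, pos_embed):
    # Pre-integration version as described above: input order
    # [CLS, Prompts, Image tokens], with the positional embedding added after
    # concatenation, so the prompts also receive positional information.
    x = jnp.concatenate([cls_token, prompts, patch_tokens], axis=1)  # [B, 1+P+N, D]
    return x + pos_embed                                              # pos_embed: [1, 1+P+N, D]
```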
Please let me know if there is anything I missed.
Best, Jaeho Lee.