google-research / l2p

Learning to Prompt (L2P) for Continual Learning @ CVPR22 and DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning @ ECCV22
https://arxiv.org/pdf/2112.08654.pdf
Apache License 2.0

Questions about domain-incremental setting, positional embedding and location of prompt #33

Open JH-LEE-KR opened 1 year ago

JH-LEE-KR commented 1 year ago

Dear author,

Thank you for your great work.

There are some questions while reproducing the official code.

From what I understand, the key idea of L2P is to freeze a well-pretrained backbone (ViT) and train only a small set of prompts, achieving impressive performance.
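For reference, this is the behavior I would expect — a minimal sketch of "freeze the backbone, train only the prompts" using optax. The parameter names and shapes ("encoder", "prompt_pool", ...) are just illustrative, not the repo's actual parameter tree:

```python
import jax
import jax.numpy as jnp
import optax

params = {
    "encoder": {"w": jnp.ones((768, 768))},  # pretrained ViT weights (to be frozen)
    "prompt_pool": jnp.zeros((10, 5, 768)),  # pool of 10 prompts, 5 tokens each (trainable)
}

# Label each parameter subtree: prompts are trained, everything else is frozen.
labels = jax.tree_util.tree_map_with_path(
    lambda path, _: "train" if path[0].key == "prompt_pool" else "freeze", params
)
tx = optax.multi_transform(
    {"train": optax.adam(1e-3), "freeze": optax.set_to_zero()}, labels
)
opt_state = tx.init(params)

def loss_fn(p):
    # Stand-in loss that touches both subtrees.
    return jnp.sum(p["prompt_pool"] ** 2) + jnp.sum(p["encoder"]["w"] ** 2)

grads = jax.grad(loss_fn)(params)
updates, opt_state = tx.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)  # encoder weights remain unchanged
```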

However, in the config for the domain-incremental setting on CORe50, the freeze part is an empty list. When I reproduced the experiment in my environment without any config modification, I got a result (77.91%) similar to the paper. This suggests that the reported CORe50 result comes from fully fine-tuning the backbone rather than freezing it.

**1. Why didn't you freeze the backbone in the domain-incremental setting?**

**2. Was this mentioned in the paper? I also read the supplementary material and didn't see anything about it.**

A trivial question: only 99% of the samples of the entire CORe50 dataset are used, because the `subsample_rate` is -1 in this part (test, train). (A guess at the mechanism is sketched right after these questions.)

**3. Is this the intended implementation?**
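My guess at why -1 yields 99%: if the TFDS split string is built by plain string formatting, a negative rate becomes a negative percent slice. This is only an illustration of TFDS slicing semantics, not the repo's actual code, and `subsample_rate` / the split template are assumptions:

```python
import tensorflow_datasets as tfds

subsample_rate = -1
split = f"train[:{subsample_rate}%]"  # -> "train[:-1%]"
# TFDS percent slicing reads "train[:-1%]" as "everything except the last 1%",
# i.e. 99% of the samples. Demonstrated on MNIST here, since CORe50 is not a
# stock TFDS dataset; the slicing semantics are the same.
ds = tfds.load("mnist", split=split)
print(ds.cardinality())  # ~59,400 of the 60,000 training examples
```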

And about the positional embedding: before the release of the code version integrated with DualPrompt, the positional embedding was also added to the prompts in L2P. However, in the version integrated with DualPrompt, the positional embedding is no longer added to the prompts (it is only added to the image tokens). I think the positional embedding can have a large impact on performance. 4. Which is correct?

Additionally, when using L2P in the code integrated with DualPrompt, the encoder input is [Prompts, CLS, Image tokens], whereas the code before the integration used [CLS, Prompts, Image tokens]. 5. Which one is correct?
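To make questions 4 and 5 concrete, here is how I understand the two input constructions — a sketch only, with illustrative shapes and names rather than the repo's exact code:

```python
import jax.numpy as jnp

B, N, P, D = 8, 196, 5, 768     # batch, image tokens, prompt tokens, dim
x = jnp.zeros((B, N, D))        # patch embeddings
cls = jnp.zeros((B, 1, D))      # class token
prompts = jnp.zeros((B, P, D))  # selected prompt tokens
pos = jnp.zeros((1, 1 + N, D))  # pretrained pos. embedding for [CLS, patches]

# Pre-integration L2P: order [CLS, Prompts, Image tokens], with a positional
# embedding (extended to also cover the prompt positions) added to the whole sequence.
pos_ext = jnp.zeros((1, 1 + P + N, D))
tokens_old = jnp.concatenate([cls, prompts, x], axis=1) + pos_ext

# DualPrompt-integrated L2P: positional embedding is added to [CLS, Image tokens]
# only; prompts are then prepended without any positional embedding, giving the
# order [Prompts, CLS, Image tokens].
tokens_new = jnp.concatenate([cls, x], axis=1) + pos
tokens_new = jnp.concatenate([prompts, tokens_new], axis=1)
```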

Please let me know if there is anything I missed.

Best, Jaeho Lee.

jcy132 commented 1 year ago

How was the result for the CORe50 with frozen parts? (Question 1)

JH-LEE-KR commented 1 year ago

All experiments (CIFAR-100, ImageNet-R, 5-datasets) used Adam as the optimizer, but CORe50 used SGD, so I conducted experiments with different freeze parts and optimizers on the official JAX code.

Here are the results of the experiments in my environment:

| Freeze | Optimizer | Acc@1 |
|--------|-----------|-------|
| Yes    | Adam      | 75.06 |
| Yes    | SGD       | 63.75 |
| No     | Adam      | 18.07 |
| No     | SGD       | 77.90 |

Freeze "Yes" means freeze same as CIFAR100 setting, config.freeze_part = ["encoder", "embedding", "cls"], "No"means config.freeze_part=[] CORe50 config. And No & SGD (last row) setting is the same as the original CORe50 config.

If you have any additional comments, please feel free to let me know.

Best, Jaeho Lee.

prachigarg23 commented 1 year ago

Hi @JH-LEE-KR, did you find an answer to the positional encoding question? I'm implementing L2P on another pre-trained backbone and am wondering whether positional encoding should be applied, where it should be applied (before or after concatenating the prompts), and whether it should be applied only to the image tokens or to the prompts as well.