Closed BinahHu closed 2 years ago
Dear author,
Thank you for your excellent work!
I am a little curious about the pretrained model: it is trained on the entire ImageNet-21k dataset and is kept fixed during training. Could this lead to information leakage?
Take the class-incremental setting as an example: I believe all 100 classes of CIFAR100 can be found in ImageNet-21k, so it is possible that the model has already learned all the features necessary for CIFAR100. But in practice, the model is expected to learn new features. We cannot assume the classes in new tasks have already been observed by the backbone, right?
Have you tried removing the CIFAR100 classes from ImageNet and pretraining a model, or evaluating the model on datasets disjoint from ImageNet?
Thank you very much!
I am also very curious about the pretraining part, as previous incremental baselines train from scratch by default.
Great insight!
Actually, we have not tried your suggested experiments, but they are definitely worth trying. Regarding the "information leakage": I think we do make the assumption that we have a "well-pretrained" model, and we use the same pretrained model for all competitors, so the comparison is fair. Another thing I would like to highlight is that the idea of prompting is to leverage knowledge already learned by the model, and to "instruct" the model to selectively use that learned knowledge for incoming tasks. Since large-scale pretrained models are prevalent these days, leveraging them is quite natural.
On the other hand, in the extreme case where the pretrained model is totally off (e.g., trained on a completely different dataset, though we would not do that in practice), L2P will probably fail if the backbone is frozen. Thus, it will be interesting to see how and when to adapt the model backbone as a future direction.
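To make the "selectively use learned knowledge" idea above concrete, here is a rough NumPy sketch of L2P-style prompt selection: a query feature from the frozen backbone is matched against learnable prompt keys, and the best-matching prompts are prepended to the input tokens. All shapes, names, and the query construction are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8        # embedding dimension (illustrative)
POOL = 5     # size of the (key, prompt) pool
TOP_K = 2    # prompts selected per input
SEQ = 4      # number of input tokens

# In L2P these are the learnable parameters; the backbone stays frozen.
prompt_keys = rng.normal(size=(POOL, D))
prompt_values = rng.normal(size=(POOL, D))

def select_prompts(query):
    """Pick the TOP_K prompts whose keys are most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    k = prompt_keys / np.linalg.norm(prompt_keys, axis=1, keepdims=True)
    sims = k @ q                     # cosine similarity per pool entry
    idx = np.argsort(-sims)[:TOP_K]  # indices of the best-matching keys
    return prompt_values[idx]

tokens = rng.normal(size=(SEQ, D))   # embedded input tokens
query = tokens.mean(axis=0)          # stand-in for a frozen-encoder query feature
prompted = np.vstack([select_prompts(query), tokens])  # prepend selected prompts
print(prompted.shape)                # (TOP_K + SEQ, D)
```

Only the keys and prompt values would receive gradients; the frozen backbone then processes the prompted sequence, which is how the prompts "instruct" it without updating its weights.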
Thanks again for your question and suggestion!
Best, Zifeng