Mark-Sky / KCL

Implementation of 'The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning'

Questions about the Assumption of Test Sample Visibility #3

Closed · zhaihaotian closed 1 week ago

zhaihaotian commented 1 week ago

Hello, thank you for sharing this interesting work and for open-sourcing the code! While reading the paper, I have some questions regarding the assumption of test sample visibility in the KCL method:

In the paper, the KCL method assumes that all test samples can be accessed at once, allowing the model to iteratively select high-confidence samples to supplement the few-shot knowledge. However, in practical scenarios, most mainstream few-shot learning and test-time fine-tuning methods restrict the model to processing test samples sequentially, without granting access to the entire test set at once.
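
To make my reading of the method concrete, here is a minimal sketch of the iterative high-confidence selection as I understand it (a simplified nearest-centroid variant in NumPy; the function name and the `top_k` / `n_iters` hyperparameters are my own illustration, not the paper's actual implementation):

```python
import numpy as np

def iterative_knowledge_completion(support, support_labels, queries,
                                   n_classes, top_k=1, n_iters=5):
    """Sketch: grow the support set with confident test predictions.

    At each iteration, classify every remaining query against the class
    centroids, then promote the top_k most confident predictions per
    class into the support set and recompute the centroids.
    """
    support, support_labels = list(support), list(support_labels)
    pool = dict(enumerate(queries))  # unlabeled test pool, keyed by index

    for _ in range(n_iters):
        if not pool:
            break
        # Normalized class centroids from the (growing) support set.
        centroids = np.stack([
            np.mean([s for s, y in zip(support, support_labels) if y == c],
                    axis=0)
            for c in range(n_classes)
        ])
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

        # Cosine similarity of every pooled query to every centroid.
        ids = list(pool)
        feats = np.stack([pool[i] for i in ids])
        feats /= np.linalg.norm(feats, axis=1, keepdims=True)
        sims = feats @ centroids.T            # (n_pool, n_classes)
        preds, confs = sims.argmax(1), sims.max(1)

        # Promote the most confident queries per class to the support set.
        for c in range(n_classes):
            ranked = sorted((j for j in range(len(ids)) if preds[j] == c),
                            key=lambda j: -confs[j])
            for j in ranked[:top_k]:
                support.append(pool.pop(ids[j]))
                support_labels.append(c)
    return support, support_labels
```

The crux of my question is the `pool` of queries: every iteration ranks the entire remaining test set at once, which presumes the full test set is available up front.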

This assumption seems to diverge from the common streaming scenarios in few-shot learning or test-time fine-tuning. Having full access to all test samples undoubtedly provides the advantage of leveraging global information, which may significantly boost classification accuracy. Based on this, I have the following specific questions:

  1. Reasonableness of the Testing Scenario: Should models be allowed to access all test samples at once under practical few-shot testing protocols?
  2. Source of Performance Gains: If the model can access all test samples at once, does the improved accuracy primarily come from this assumption? Have you considered adapting the KCL method to streaming test scenarios?

Looking forward to your response! Thank you again for your contributions to this field and for open-sourcing the code!

Mark-Sky commented 1 week ago

Hello, thank you for your interest in our work and for taking the time to read the paper! Let me provide some clarification.

  1. Should models be allowed to access all test samples at once under practical few-shot testing protocols? This is common practice in transductive learning, a setting long established for few-shot learning. For example, in [1], the entire query set is used to construct a graph, enabling label propagation (see the sketch below).
  2. If the model can access all test samples at once, does the improved accuracy primarily come from this assumption? The performance improvement of KCL does indeed benefit from the nature of transductive learning, which allows access to the distribution of the test set. In the updated version of our paper, we have included an ablation study comparing KCL with other transductive learning methods[1,2,3,4], all of which also have access to the test set distribution.
  3. Have you considered adapting the KCL method to streaming test scenarios? We have not yet explored applying the KCL method to streaming test scenarios, but it could be an interesting direction for future work. Adapting KCL to such settings might require developing additional mechanisms.

[1] Learning to propagate labels: Transductive propagation network for few-shot learning
[2] Transductive information maximization for few-shot learning
[3] Prototype rectification for few-shot learning
[4] Boosting vision-language models with transduction
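
For reference, here is a minimal sketch of the graph-based label propagation used in [1] (standard closed-form propagation; `sigma` and `alpha` are illustrative hyperparameters, not the paper's values):

```python
import numpy as np

def label_propagation(feats, labels, n_classes, sigma=1.0, alpha=0.99):
    """Transductive label propagation over support + query features.

    feats:  (n, d) array, support rows first, then query rows.
    labels: length-n array with the class index for support rows,
            -1 for unlabeled query rows.
    Returns an (n, n_classes) array of soft label scores.
    """
    # Gaussian affinity graph over ALL samples (support and query together).
    d2 = ((feats[:, None] - feats[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))

    # One-hot seeds for the labeled support rows, zeros for queries.
    Y = np.zeros((len(feats), n_classes))
    for i, y in enumerate(labels):
        if y >= 0:
            Y[i, int(y)] = 1.0

    # Closed-form solution: F* = (I - alpha * S)^{-1} Y.
    return np.linalg.solve(np.eye(len(feats)) - alpha * S, Y)
```

Like KCL, this classifies the whole query set jointly, which is exactly the transductive assumption discussed above.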
zhaihaotian commented 1 week ago

Thank you for your detailed response and clarification!

It seems that your experimental setup is based on transductive learning, which makes perfect sense in this context. My earlier question likely stemmed from my limited familiarity with this specific domain. The few-shot fine-tuning or test-time adaptation methods I’ve been reviewing recently primarily rely on historical information from test samples, such as [1] and [2].
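
To illustrate the contrast with the transductive setting, here is a hypothetical streaming variant under that constraint (an EMA-prototype sketch of my own, not the mechanism from [1] or [2]; `momentum` and `tau` are illustrative):

```python
import numpy as np

def streaming_adaptation(stream, class_prototypes, momentum=0.9, tau=0.5):
    """Classify test samples one at a time, using only past information.

    Each arriving sample is scored against the current prototypes; if the
    prediction is confident enough (> tau), the matching prototype is
    updated with an exponential moving average. No future sample is ever
    visible when a prediction is made.
    """
    protos = np.array(class_prototypes, dtype=float)
    preds = []
    for x in stream:                         # samples arrive sequentially
        x = x / np.linalg.norm(x)
        p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        sims = p @ x                         # cosine similarity per class
        c, conf = int(sims.argmax()), float(sims.max())
        preds.append(c)
        if conf > tau:                       # absorb confident history only
            protos[c] = momentum * protos[c] + (1 - momentum) * x
    return preds
```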

Including a comparison with other transductive learning methods in the updated version of your paper would indeed make the evaluation even more comprehensive and compelling. Such a comparison would not only highlight the unique strengths of KCL but also position it more clearly within the broader landscape of transductive methods.

Thank you again for addressing my questions so thoroughly. I look forward to seeing more exciting developments from your future work!

[1] Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
[2] Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models