Vision-Kek / ABCDFSS

Question about training the adapter #1

canglangzhige closed 8 months ago

canglangzhige commented 8 months ago

Thanks for your work! The core of your method is training the adapter on target-domain data. Which target-domain data is used for this training? Is it correct that the 1-shot setting uses one support image, one support mask, and one query image, and that the 5-shot setting uses five sets of data? Is such data picked for every category of the target domain? If so, how does this differ from conventional supervised semantic segmentation? Doesn't training the adapter on target-domain data mean that the categories to be tested have already been used in training? FSS presupposes that the test categories do not appear during training.

Vision-Kek commented 8 months ago

Hi, only the data from the current test episode is used: for 1-shot, 1 query (image only) + 1 support (image + mask); for 5-shot, 1 query (image only) + 5 supports (image + mask). No other class or image is seen. You can also view it as a kind of transductive inference, as in RePRI, or as few-shot fine-tuning.
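
To make the data split in the reply above concrete, here is a minimal sketch of what one test episode exposes to the adapter. It is not code from the ABCDFSS repository: `Episode`, `AdaptationData`, and `adaptation_data` are hypothetical names, and file-name strings stand in for real image and mask tensors. The only thing it is meant to show is that the support pairs and the query image are visible, while the query mask is not.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Placeholder types (assumption): real code would use tensors, strings stand in here.
Image = str
Mask = str


@dataclass(frozen=True)
class Episode:
    """One test episode for a single target-domain class (hypothetical container)."""
    support: List[Tuple[Image, Mask]]  # k labelled support pairs (k = 1 or 5)
    query_image: Image                 # unlabelled query image
    query_mask: Mask                   # ground truth, used only for scoring


@dataclass(frozen=True)
class AdaptationData:
    """Everything the adapter may see while fitting on one episode."""
    support_images: List[Image]
    support_masks: List[Mask]
    query_image: Image                 # image only; there is no query-mask field


def adaptation_data(episode: Episode) -> AdaptationData:
    """Collect the per-episode inputs, leaving the query mask out."""
    images = [img for img, _ in episode.support]
    masks = [msk for _, msk in episode.support]
    return AdaptationData(images, masks, episode.query_image)


# 1-shot: 1 query (image only) + 1 support (image + mask)
one_shot = Episode([("s1.png", "s1_mask.png")], "q.png", "q_mask.png")
# 5-shot: 1 query (image only) + 5 supports (image + mask)
five_shot = Episode(
    [(f"s{i}.png", f"s{i}_mask.png") for i in range(1, 6)], "q.png", "q_mask.png"
)

data_1 = adaptation_data(one_shot)
data_5 = adaptation_data(five_shot)
print(len(data_1.support_images), len(data_5.support_images))
```

In this sketch each episode produces its own `AdaptationData`, so nothing from one episode (class or image) carries over to the next, which matches the "No other class or image is seen" statement.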