LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)
473 stars · 35 forks

A small bug in agent_pvt.py #7

Closed lyd126 closed 8 months ago

lyd126 commented 8 months ago

Hello, and thank you very much for sharing such interesting work. While reproducing it I found a small issue in agent_pvt.py: if self.sr_ratio > 1, then after line 134 the downscaled k and v should have 1/self.sr_ratio² as many tokens as q, yet in lines 144-146 q, k, and v are all reshaped with the same dimensions, so the reshape of k and v may be incorrect. I also noticed that you use sr_ratio=sr_ratios[i] if attn_type[i] == 'B' else int(agent_sr_ratios[i]) to force sr_ratio=1 for agent attention. How should the case sr_ratio != 1 be handled, or does agent attention simply not support downscaling k and v? Thank you for your reply.

tian-qing001 commented 8 months ago

Hi @lyd126, thank you for bringing this to our attention. We've addressed the bug related to cases where sr_ratio>1. While the bug has been fixed, it's important to note that setting sr_ratio to values greater than 1 may lead to a significant decrease in model performance. Therefore, we strongly recommend setting sr_ratio=1 for optimal results.
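To illustrate the shape issue being discussed, here is a minimal, hypothetical sketch of spatial-reduction attention (not the repository's actual code): with sr_ratio > 1, k and v cover N / sr_ratio² tokens while q still covers N, so they cannot share one reshape. All names (B, N, C, num_heads, sr_ratio) are illustrative.

```python
import torch

# Illustrative sizes; with sr_ratio = 2 the key/value token count is N / 4.
B, H, W, C, num_heads, sr_ratio = 2, 8, 8, 64, 4, 2
N = H * W                      # query token count
n = N // sr_ratio ** 2         # reduced key/value token count

q = torch.randn(B, N, C)
kv = torch.randn(B, n, 2 * C)  # after the spatial-reduction step

# q keeps N tokens ...
q = q.reshape(B, N, num_heads, C // num_heads).permute(0, 2, 1, 3)
# ... while k and v must be reshaped with the reduced length n, not N.
k, v = kv.reshape(B, n, 2, num_heads, C // num_heads).permute(2, 0, 3, 1, 4)

attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)   # (B, heads, N, n)
out = (attn @ v).transpose(1, 2).reshape(B, N, C)  # back to (B, N, C)
```

Reshaping kv with N instead of n here would raise a size-mismatch error, which is the symptom described in the issue.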

lyd126 commented 8 months ago

Thank you very much for your reply; that solved the problem perfectly. I have one more question: the initialization of agent attention involves a num_patches parameter, which is provided by patch_embed at line 253. How should the case be handled where the number of patches differs between training and testing? For example, if augmented training images are larger than the unaugmented test images, the test num_patches no longer matches the training num_patches and the program raises an error. Since this parameter is passed in at initialization, it also cannot be adjusted at test time. Besides keeping num_patches identical between training and testing, is there another way to handle this that would make the work more general? Thank you very much for your reply.

tian-qing001 commented 8 months ago

Hi @lyd126. In classification tasks, it is standard to employ interpolation or cropping so that test and training images have consistent sizes. If you wish to support variable-sized input, consider adjusting the relevant code to dynamically interpolate model parameters such as pos_embed to the current image size.
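The dynamic-interpolation workaround suggested above can be sketched as follows. This is a generic, hypothetical helper (the function name and shapes are assumptions, not the repository's API): a learned positional embedding trained on one patch grid is bilinearly resized to the grid implied by the test-time number of patches.

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, new_hw: tuple) -> torch.Tensor:
    """Resize a (1, N_train, C) positional embedding to a new patch grid.

    Assumes the training grid is square (N_train = hw * hw).
    new_hw is the (H_new, W_new) patch grid at test time.
    """
    _, n, c = pos_embed.shape
    hw = int(n ** 0.5)                       # side of the square train grid
    pe = pos_embed.reshape(1, hw, hw, c).permute(0, 3, 1, 2)  # (1, C, hw, hw)
    pe = F.interpolate(pe, size=new_hw, mode='bilinear', align_corners=False)
    return pe.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], c)

pos_embed = torch.randn(1, 14 * 14, 256)         # trained at a 14x14 grid
resized = resize_pos_embed(pos_embed, (16, 16))  # test-time 16x16 grid
```

The same idea would have to be applied to any other parameter whose size is fixed by num_patches at initialization, which is why supporting variable input sizes requires touching the model code rather than just the data pipeline.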

lyd126 commented 8 months ago

Thank you very much for your reply.