H1NATA111 opened 5 months ago
Hi @H1NATA111, since `I-PoolingAttention` and the L2-norm are not efficient enough for TensorRT deployment, we have removed them in the new version (YOLO-World-V2). All pre-trained checkpoints have been released, and we suggest you move to the latest version of YOLO-World.
Q1. How much does I-Pooling contribute to the final performance? Wouldn't performance degrade if it is dropped?
Q2. Am I right that, without I-Pooling, the text features are untouched in YOLO-World, i.e. the same as the CLIP text embeddings?
Based on the performance reported in the README, I-Pooling doesn't seem to help. Please correct me if I am missing anything.
(screenshots of the v2 and v1 results tables from the README)
Hi @ljj7975, adding `I-PoolingAttention` is effective for pre-training with large-scale region-text pairs; it brings 0.5~1.5 AP improvements on the LVIS minival evaluation.
The motivation for removing `I-PoolingAttention` is that we found it hard to use in some deployment cases, especially for edge applications, even though its latency is small.
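For readers unfamiliar with the module being discussed: below is a minimal NumPy sketch of image-pooling attention as described in the YOLO-World paper, where each multi-scale feature map is max-pooled into a 3×3 grid (27 patch tokens for 3 scales) and the text embeddings are updated by cross-attention over those tokens. The single-head attention, shapes, and function names here are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pool_to_3x3(feat):
    """Max-pool an (H, W, C) feature map into a 3x3 grid -> (9, C) tokens."""
    H, W, C = feat.shape
    hs = np.array_split(np.arange(H), 3)
    ws = np.array_split(np.arange(W), 3)
    return np.stack([feat[np.ix_(h, w)].reshape(-1, C).max(axis=0)
                     for h in hs for w in ws])

def image_pooling_attention(text, feats):
    """text: (T, C) text embeddings; feats: list of (H, W, C) feature maps.
    Returns text embeddings updated by cross-attention over the pooled tokens."""
    tokens = np.concatenate([pool_to_3x3(f) for f in feats])  # (27, C) for 3 scales
    C = text.shape[1]
    attn = softmax(text @ tokens.T / np.sqrt(C))              # (T, 27) attention weights
    return text + attn @ tokens                               # residual update of text

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))                             # 4 class prompts, dim 8
feats = [rng.normal(size=(s, s, 8)) for s in (20, 10, 5)]  # 3 pyramid scales
updated = image_pooling_attention(text, feats)
print(updated.shape)  # (4, 8)
```

The residual form means that dropping the module simply leaves the text embeddings at their original (CLIP-encoded) values, which is what the V2 deployment path does.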
V2 and V1 differ in several ways: the `I-PoolingAttention`, the `BatchNorm` in the contrastive head, and the training strategies.
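To make the contrastive-head difference concrete, here is a hedged NumPy sketch of the two variants: a V1-style head that L2-normalizes region and text features before taking similarities, and a V2-style head that replaces the L2 normalization of region features with BatchNorm. Names and shapes are illustrative assumptions, not the repository's code.

```python
import numpy as np

def l2norm_head(region, text, scale=1.0):
    """V1-style head: cosine similarity via per-sample L2 normalization."""
    r = region / np.linalg.norm(region, axis=-1, keepdims=True)
    t = text / np.linalg.norm(text, axis=-1, keepdims=True)
    return scale * r @ t.T

def bn_head(region, text, eps=1e-5):
    """V2-style head: BatchNorm over region features, then a plain dot product.
    (Inference would use fixed running statistics; batch statistics shown here.)"""
    mu = region.mean(axis=0, keepdims=True)
    var = region.var(axis=0, keepdims=True)
    r = (region - mu) / np.sqrt(var + eps)
    return r @ text.T

rng = np.random.default_rng(0)
region = rng.normal(size=(5, 8))  # 5 region (object) embeddings
text = rng.normal(size=(3, 8))    # 3 class text embeddings
print(l2norm_head(region, text).shape, bn_head(region, text).shape)  # (5, 3) (5, 3)
```

A plausible reason this helps deployment: with fixed running statistics, BatchNorm folds into a single affine transform at inference time, whereas the per-sample L2 normalization is a data-dependent nonlinearity that some TensorRT paths handle less efficiently.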
The V2 version aims for practical applications and has been evaluated in different deployment scenarios.
Thank you for your excellent work! Using the config file configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py, I tried to step through your model with breakpoints, but during training the model never invokes the `ImagePoolingAttentionModule`, i.e. the "Image-Pooling Attention" module described in the paper. Under what circumstances does the model call this module to update the text features?