Closed YizhuoQ closed 1 month ago
Hello.
Testing is done directly via prompting. The script is open-sourced; please refer to main_vg.py.
Hello, and sorry for the very late reply. I still haven't fully figured out the accuracy evaluation for the VG task discussed above. I wrote a verification script based on main_vg.py and the raw prediction result dior_rsvg_eval_save_file.json you provided in issue #18, but the output is only about 85.4, which seems to differ from the result in Table 4. I don't quite understand why; is it due to randomness in the model's outputs at test time? I'd appreciate your clarification. My verification script and its output are below:
import re
import json
import logging


def calculate_iou(box1, box2):
    """
    Calculate IoU between two horizontal bounding boxes (HBB).
    """
    x1, y1, x2, y2 = box1
    x3, y3, x4, y4 = box2
    intersection_x1 = max(x1, x3)
    intersection_y1 = max(y1, y3)
    intersection_x2 = min(x2, x4)
    intersection_y2 = min(y2, y4)
    intersection_area = max(0, intersection_x2 - intersection_x1 + 1) * max(
        0, intersection_y2 - intersection_y1 + 1
    )
    box1_area = (x2 - x1 + 1) * (y2 - y1 + 1)
    box2_area = (x4 - x3 + 1) * (y4 - y3 + 1)
    union_area = box1_area + box2_area - intersection_area
    iou = intersection_area / union_area
    return iou


if __name__ == "__main__":
    answers_file = 'E:/Code/dior_rsvg_eval_save_file.json'
    # Match bracketed coordinate lists such as "[x1, y1, x2, y2]" in the raw strings
    pattern = r"\[([0-9., ]+)\]"

    with open(answers_file) as f:
        predictions = json.load(f)

    # Parse predicted and target boxes from the raw prediction strings
    parse_result = []
    fail_instance = 0
    for item in predictions:
        pred_match = re.findall(pattern, item["pred"])
        if len(pred_match) == 0:
            fail_instance += 1

        try:
            pred_result = [list(map(float, match.split(","))) for match in pred_match]
        except:
            fail_instance += 1
            continue

        target_match = re.findall(pattern, item["target"])
        target_result = [list(map(float, match.split(","))) for match in target_match]

        # Keep only 4-coordinate boxes; truncate longer ones, count shorter ones as failures
        new_pred_result = []
        new_target_result = []
        for pred, target in zip(pred_result, target_result):
            if len(pred) == 4:
                new_pred_result.append(pred)
                new_target_result.append(target)
            elif len(pred) > 4:
                while len(pred) != 4:
                    pred.pop()
                new_pred_result.append(pred)
                new_target_result.append(target)
            else:
                fail_instance += 1

        if len(new_pred_result) > 0:
            parse_result.append(
                dict(
                    filename=item["filename"],
                    pred=new_pred_result,
                    target=new_target_result,
                )
            )

    # Acc@0.5: a prediction counts as correct if its IoU with the target exceeds 0.5
    count = 0
    total = 0
    for item in parse_result:
        preds = item["pred"]
        targets = item["target"]
        for pred, target in zip(preds, targets):
            iou_score = calculate_iou(pred, target)
            if iou_score > 0.5:
                count += 1
            total += 1

    print(f"Accuracy: {count / total * 100:.2f}%")
    print(f"Fail Sample: {fail_instance}")
    print(f"Accuracy With Fail Sample: {count / (total + fail_instance) * 100:.2f}%")
Output:
Accuracy: 85.38579458354624
Fail Sample: 0
Accuracy With Fail Sample: 85.38579458354624
I have successfully reproduced the results reported in Table 4 of the paper using the Stage 3 checkpoint and the LHRS RSVG Test Data. Thanks for open-sourcing this work! The full output is below:
[2024-08-26 15:00:10,020] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
Not using distributed mode.
accelerator: gpu
adjust_norm: false
alignment_dim: 768
batch_size: 1
bf16: true
bits: 16
config: null
data_path: /datasets/DIOR-RSVG/Images
data_target: /workspace/mllm-code/eval-vg/data/LHRS_Data/RSVG/Test/RSVG_DIOR_test.json
double_quant: true
dtype: float16
enable_amp: true
entity: pumpkinn
epochs: 2
eval:
dataset: AID
fp16: false
generate: false
gpus: 0
inf_sampler: false
is_distribute: false
local_rank: 0
lora:
enable: false
lora_alpha: 256
lora_bias: none
lora_dropout: 0.05
lora_r: 128
lr: 0.0002
max_grad_norm: 0.3
model_path: checkpoint/stage3/FINAL.pt
optimizer: adanp
opts: null
output: output
project: MaskIndexNet
prompt_template: llava_llama_2
quant_type: nf4
rank: 0
rgb_vision:
arch: vit_large
attn_pooler:
num_attn_heads: 16
num_layers: 6
num_query: 144
input_patchnorm: false
input_size:
- 224
- 224
patch_dropout: 0.0
tune_pooler: true
vit_name: openai/clip-vit-large-patch14
sar_vision:
activate: sigmoid
alpha: 0.2
arch: base
branch_temp: 0.07
decoder:
heads: 12
hidden_size: 768
layers: 12
mask_color: mean
mask_ratio: 0.6
focal_gamma: 1.0
in_chans: 2
input_size:
- 192
- 192
loss_weight: 1.0
n_queries: 256
online_temp: 0.1
reduction: none
residual: false
unmask_weight: 0.0
warmup_branch_temp: 0.04
warmup_branch_temp_epochs: 2
schedule:
decay_epochs: 30
decay_rate: 0.1
gamma: 0.1
min_lr: 2.0e-05
multisteps: []
name: cosine
warmup_epochs: 100
warmup_factor: 0.01
warmup_method: linear
seed: 322
stage: 0
text:
bos_token_id: 1
eos_token_id: 2
hidden_act: silu
hidden_size: 4096
initializer_range: 0.02
intermediate_size: 11008
max_position_embeddings: 2048
num_attention_heads: 32
num_hidden_layers: 32
pad_token_id: 0
path: /huggingface/models/Llama-2-7b-chat-hf
rms_norm_eps: 1e-5
tie_word_embeddings: false
use_cache: true
vocab_size: 32000
transform:
input_size:
- 224
- 224
rand_aug: rand-m5-n2-mstd0.5-inc1
tune_im_patch: false
tune_im_start: false
tune_rgb_bk: false
tune_rgb_pooler: false
use_checkpoint: false
wandb: false
wd: 0.0
workers: 2
world_size: 1
[08/26 15:00:13 train]: Full config saved to output/config.json
[08/26 15:00:13 train]: Creating model
/opt/conda/envs/lhrs/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.03s/it]
3372
[08/26 15:00:20 train]: Data Length: 3372
[08/26 15:00:20 train]: Loading pretrained checkpoint from checkpoint/stage3/FINAL.pt
[08/26 15:00:20 train]: Loading RGB encoder.
[08/26 15:00:20 train]: After loading RGB encoder: Missing: []. Unexpected: []
[08/26 15:00:20 train]: Loadding LoRA parameters.
Evaluating: 100%|████████████████████████████████████████████████████████████████| 3.37k/3.37k [1:59:53<00:00, 2.13s/it]
[08/26 17:00:19 train]: result file saved to output/eval_save_file.json
[08/26 17:00:19 train]: Count: 6211
[08/26 17:00:19 train]: Total: 6973
[08/26 17:00:19 train]: Accuracy: 89.07213537932024
[08/26 17:00:19 train]: Fail Sample: 8
[08/26 17:00:19 train]: Accuracy With Fail Sample: 88.97006159575992
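For reference, the two accuracy figures at the end of the log follow directly from the counts printed just above them; a quick check, assuming the same formulas as in the verification script earlier in this thread:

count, total, fail_instance = 6211, 6973, 8   # values taken from the log above
print(count / total * 100)                    # 89.0721... -> "Accuracy"
print(count / (total + fail_instance) * 100)  # 88.9700... -> "Accuracy With Fail Sample"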
When computing the IoU, why does calculate_iou add +1 in the following lines:

intersection_area = max(0, intersection_x2 - intersection_x1 + 1) * max(
    0, intersection_y2 - intersection_y1 + 1
)
box1_area = (x2 - x1 + 1) * (y2 - y1 + 1)
box2_area = (x4 - x3 + 1) * (y4 - y3 + 1)

Aren't the boxes passed in already normalized? Doesn't the +1 then make the computed result wrong?
Hi, the calculate_iou function in the code follows the one in main_vg.py.
Hello, thanks for pointing this out; please refer to #27.
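For anyone reading later: the +1 comes from the integer pixel-coordinate convention (a box spanning pixels x1 to x2 covers x2 - x1 + 1 pixels); if the coordinates are normalized instead, the +1 dwarfs the box dimensions and distorts the IoU. A minimal sketch of the same IoU without the offset, just as an illustration (the actual fix is tracked in #27):

def calculate_iou_normalized(box1, box2):
    """IoU for (x1, y1, x2, y2) boxes in normalized coordinates, without the +1 offset."""
    x1, y1, x2, y2 = box1
    x3, y3, x4, y4 = box2
    inter_w = max(0.0, min(x2, x4) - max(x1, x3))
    inter_h = max(0.0, min(y2, y4) - max(y1, y3))
    intersection = inter_w * inter_h
    union = (x2 - x1) * (y2 - y1) + (x4 - x3) * (y4 - y3) - intersection
    return intersection / union if union > 0 else 0.0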
Thank you for pointing out this error, and thanks to the author for the reply.
For Table 4 in the paper, when evaluating the grounding accuracy of Qwen-VL-Chat and MiniGPTv2, did you use the officially released pretrained models, or were the models fine-tuned separately on the two datasets? Also, is the script used to compute the Visual Grounding accuracy in Table 4 open-sourced?