NJU-LHRS / LHRS-Bot

VGI-Enhanced multimodal large language model for remote sensing images.
Apache License 2.0

A question about VG evaluation #18

Closed xuliu-cyber closed 2 weeks ago

xuliu-cyber commented 4 weeks ago

I used the stage 3 checkpoint for evaluation, but the visual grounding results I obtained differ from those reported in the paper. Here is the log output I got, showing an accuracy of 28.75:

```
[06/27 10:52:52 train]: Full config saved to eval/vg/DIOR-RSVG/config.json
[06/27 10:52:52 train]: accelerator: gpu adjust_norm: false alignment_dim: 768 batch_size: 1 bf16: true bits: 16 config: null
data_path: /data/liux/DIOR-RSVG/JPEGImages data_target: /data/liux/DIOR-RSVG/test.json
double_quant: true dtype: float16 enable_amp: true entity: pumpkinn epochs: 2 eval: dataset: AID fp16: false generate: false gpus: 0 inf_sampler: false is_distribute: false local_rank: 0
lora: enable: false lora_alpha: 256 lora_bias: none lora_dropout: 0.05 lora_r: 128
lr: 0.0002 max_grad_norm: 0.3 model_path: /data/liux/LHRS/Stage3/FINAL.pt optimizer: adanp opts: null output: eval/vg/DIOR-RSVG project: MaskIndexNet prompt_template: llava_llama_2 quant_type: nf4 rank: 0
rgb_vision: arch: vit_large attn_pooler: num_attn_heads: 16 num_layers: 6 num_query: 144 input_patchnorm: false input_size:
[06/27 10:52:52 train]: Creating model
[06/27 10:53:56 train]: Data Length: 7500
[06/27 10:53:56 train]: Loading pretrained checkpoint from /data/liux/LHRS/Stage3/FINAL.pt
[06/27 10:53:57 train]: Loading RGB encoder.
[06/27 10:53:57 train]: After loading RGB encoder: Missing: []. Unexpected: []
[06/27 10:53:57 train]: Loadding LoRA parameters.
[06/27 12:06:29 train]: result file saved to eval/vg/DIOR-RSVG/eval_save_file.json
[06/27 12:06:29 train]: Accuracy: 28.75025651549354
[06/27 12:06:29 train]: Fail Sample: 1229
[06/27 12:06:29 train]: Accuracy With Fail Sample: 22.959685349065882
```
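For context, visual grounding on RSVG and DIOR-RSVG is usually reported as Acc@0.5, i.e. a predicted box counts as correct when its IoU with the ground-truth box is at least 0.5, and "Fail Sample" appears to count outputs that could not be parsed into a box. A generic sketch of such a metric (my own illustration, not the repository's evaluation code):

```python
# Generic Acc@0.5 sketch for box-based visual grounding (illustration only; the
# repository's evaluation script may handle failed/unparseable predictions differently).
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def acc_at_05(preds, gts, num_fail=0):
    """Percent of predictions with IoU >= 0.5, plus a variant counting failed samples as wrong."""
    hits = sum(box_iou(p, g) >= 0.5 for p, g in zip(preds, gts))
    acc = 100.0 * hits / max(len(preds), 1)
    acc_with_fail = 100.0 * hits / max(len(preds) + num_fail, 1)
    return acc, acc_with_fail
```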

pUmpKin-Co commented 4 weeks ago

Hi~ Thanks for your interest.

We simply evaluate on the test set after converting it into instruction format. The test set can be found here.
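For illustration, converting one grounding annotation into an instruction-style record might look roughly like the sketch below; the prompt wording and JSON keys here are my own assumptions, not necessarily the exact format used by LHRS-Bot:

```python
# Hypothetical sketch of turning a grounding annotation into an instruction-format
# record; the prompt wording and field names are illustrative assumptions.
import json


def to_instruction(sample: dict) -> dict:
    # `sample` is assumed to carry an image file name, a referring expression,
    # and a ground-truth box in (x1, y1, x2, y2) pixel coordinates.
    x1, y1, x2, y2 = sample["bbox"]
    return {
        "image": sample["image"],
        "conversations": [
            {"from": "human", "value": f"[VG] Where is the {sample['expression']}?"},
            {"from": "gpt", "value": f"[{x1},{y1},{x2},{y2}]"},
        ],
    }


if __name__ == "__main__":
    demo = {"image": "00001.jpg",
            "expression": "baseball field on the right",
            "bbox": [120, 48, 310, 200]}
    print(json.dumps(to_instruction(demo), indent=2))
```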

Below are the results obtained from one evaluation run.

RSVG

```
[01/18 16:38:23 train]: Full config saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg/config.json
[01/18 16:38:23 train]: accelerator: gpu adjust_norm: false alignment_dim: 768 batch_size: 1 bf16: true bits: 16 config: null
data_path: /home/aiscuser/pumpkin_dataset/InstructDataset/RSVG_Image data_target: /home/aiscuser/pumpkin_dataset/Eval/VGEvalDataset/RSVG_test.json
double_quant: true dtype: float16 enable_amp: true entity: pumpkinn epochs: 2 eval: dataset: AID fp16: false generate: false gpus: 0 inf_sampler: false is_distribute: false local_rank: 0
lora: enable: false lora_alpha: 16 lora_bias: none lora_dropout: 0.05 lora_r: 8
lr: 0.0003 max_grad_norm: 1.0 model_path: ../../Output/LHRS/stage3/checkpoints/FINAL.pt optimizer: adanp opts: null output: ../../Output/LHRS/stage3/zero_shot_vg/rsvg project: MaskIndexNet prompt_template: llava_llama_2 quant_type: nf4 rank: 0
rgb_vision: arch: vit_large attn_pooler: num_attn_heads: 16 num_layers: 6 num_query: 144 input_patchnorm: false input_size: - 224 - 224 patch_dropout: 0.0 tune_pooler: true vit_name: openai/clip-vit-large-patch14
sar_vision: activate: sigmoid alpha: 0.2 arch: base branch_temp: 0.07 decoder: heads: 12 hidden_size: 768 layers: 12 mask_color: mean mask_ratio: 0.6 focal_gamma: 1.0 in_chans: 2 input_size: - 192 - 192 loss_weight: 1.0 n_queries: 256 online_temp: 0.1 reduction: none residual: false unmask_weight: 0.0 warmup_branch_temp: 0.04 warmup_branch_temp_epochs: 2
schedule: decay_epochs: 30 decay_rate: 0.1 gamma: 0.1 min_lr: 2.0e-05 multisteps: [] name: cosine warmup_epochs: 100 warmup_factor: 0.01 warmup_method: linear
seed: 322 stage: 2
text: bos_token_id: 1 eos_token_id: 2 hidden_act: silu hidden_size: 4096 initializer_range: 0.02 intermediate_size: 11008 max_position_embeddings: 2048 num_attention_heads: 32 num_hidden_layers: 32 pad_token_id: 0 path: /home/aiscuser/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1b0db933684edbfe29a06fa47eb19cc48025e93 rms_norm_eps: 1e-5 tie_word_embeddings: false use_cache: true vocab_size: 32000
transform: input_size: - 224 - 224 rand_aug: rand-m5-n2-mstd0.5-inc1
tune_im_patch: false tune_im_start: false tune_rgb_bk: false tune_rgb_pooler: false use_checkpoint: false wandb: false wd: 0.02 workers: 4 world_size: 1
[01/18 16:38:23 train]: Creating model
[01/18 16:38:33 train]: Data Length: 1227
[01/18 16:38:33 train]: Loading pretrained checkpoint from ../../Output/LHRS/stage3/checkpoints/FINAL.pt
[01/18 16:38:34 train]: Loading RGB encoder.
[01/18 16:38:35 train]: After loading RGB encoder: Missing: []. Unexpected: []
[01/18 16:38:35 train]: Loadding LoRA parameters.
[01/18 17:00:58 train]: result file saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg/eval_save_file.json
[01/18 17:00:58 train]: Accuracy: 71.94851330203444
[01/18 17:00:58 train]: Fail Sample: 0
[01/18 17:00:58 train]: Accuracy With Fail Sample: 71.94851330203444
```

DIOR-RSVG

```
[01/18 17:01:19 train]: Full config saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg_dior/config.json
[01/18 17:01:19 train]: accelerator: gpu adjust_norm: false alignment_dim: 768 batch_size: 1 bf16: true bits: 16 config: null
data_path: /home/aiscuser/pumpkin_dataset/InstructDataset/RSVG_DIOR_Image data_target: /home/aiscuser/pumpkin_dataset/Eval/VGEvalDataset/RSVG_DIOR_test.json
double_quant: true dtype: float16 enable_amp: true entity: pumpkinn epochs: 2 eval: dataset: AID fp16: false generate: false gpus: 0 inf_sampler: false is_distribute: false local_rank: 0
lora: enable: false lora_alpha: 16 lora_bias: none lora_dropout: 0.05 lora_r: 8
lr: 0.0003 max_grad_norm: 1.0 model_path: ../../Output/LHRS/stage3/checkpoints/FINAL.pt optimizer: adanp opts: null output: ../../Output/LHRS/stage3/zero_shot_vg/rsvg_dior project: MaskIndexNet prompt_template: llava_llama_2 quant_type: nf4 rank: 0
rgb_vision: arch: vit_large attn_pooler: num_attn_heads: 16 num_layers: 6 num_query: 144 input_patchnorm: false input_size: - 224 - 224 patch_dropout: 0.0 tune_pooler: true vit_name: openai/clip-vit-large-patch14
sar_vision: activate: sigmoid alpha: 0.2 arch: base branch_temp: 0.07 decoder: heads: 12 hidden_size: 768 layers: 12 mask_color: mean mask_ratio: 0.6 focal_gamma: 1.0 in_chans: 2 input_size: - 192 - 192 loss_weight: 1.0 n_queries: 256 online_temp: 0.1 reduction: none residual: false unmask_weight: 0.0 warmup_branch_temp: 0.04 warmup_branch_temp_epochs: 2
schedule: decay_epochs: 30 decay_rate: 0.1 gamma: 0.1 min_lr: 2.0e-05 multisteps: [] name: cosine warmup_epochs: 100 warmup_factor: 0.01 warmup_method: linear
seed: 322 stage: 2
text: bos_token_id: 1 eos_token_id: 2 hidden_act: silu hidden_size: 4096 initializer_range: 0.02 intermediate_size: 11008 max_position_embeddings: 2048 num_attention_heads: 32 num_hidden_layers: 32 pad_token_id: 0 path: /home/aiscuser/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1b0db933684edbfe29a06fa47eb19cc48025e93 rms_norm_eps: 1e-5 tie_word_embeddings: false use_cache: true vocab_size: 32000
transform: input_size: - 224 - 224 rand_aug: rand-m5-n2-mstd0.5-inc1
tune_im_patch: false tune_im_start: false tune_rgb_bk: false tune_rgb_pooler: false use_checkpoint: false wandb: false wd: 0.02 workers: 4 world_size: 1
[01/18 17:01:19 train]: Creating model
[01/18 17:01:29 train]: Data Length: 1813
[01/18 17:01:29 train]: Loading pretrained checkpoint from ../../Output/LHRS/stage3/checkpoints/FINAL.pt
[01/18 17:01:30 train]: Loading RGB encoder.
[01/18 17:01:30 train]: After loading RGB encoder: Missing: []. Unexpected: []
[01/18 17:01:30 train]: Loadding LoRA parameters.
[01/18 18:00:30 train]: result file saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg_dior/eval_save_file.json
[01/18 18:00:30 train]: Accuracy: 87.09759836484416
[01/18 18:00:30 train]: Fail Sample: 0
[01/18 18:00:30 train]: Accuracy With Fail Sample: 87.09759836484416
```

Moreover, we are also glad to provide the raw prediction results for your reference: rsvg_eval_save_file.json, dior_rsvg_eval_save_file.json.

Finally, all of our data and training scripts will be released soon~

xuliu-cyber commented 4 weeks ago

Thank you! I will check it.

xuliu-cyber commented 4 weeks ago

Hi, I find that the DIOR-RSVG test JSON file at https://huggingface.co/datasets/PumpkinCat/LHRS_Data/tree/main does not contain all the items of the original DIOR-RSVG test dataset (https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_?usp=sharing).

pUmpKin-Co commented 4 weeks ago

Thanks for reaching out! I will double-check when I have some bandwidth.

xuliu-cyber commented 4 weeks ago

OK. I found that the reproduced classification and VQA evaluation results match the paper; only the VG results differ.

pUmpKin-Co commented 3 weeks ago

Hi,

Sorry for the late reply.

I have checked the data and found that the issue is related to a mismatch between test.txt and the corresponding .xml annotations.

You will notice that many entries in test.txt do not have corresponding .xml annotation files. For example, entry 0 in test.txt has no associated 0.xml (or any similarly named) file.

We reformatted the annotations into text format based on the correspondence between test.txt and the .xml files. As a result, we only have 3,372 test samples.
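If you want to verify this on your copy of the data, a quick check along these lines should reproduce the count (the paths and the `<id>.xml` naming are placeholders for illustration):

```python
# Count how many entries in test.txt have a matching .xml annotation file.
# The directory layout and "<id>.xml" naming are assumptions for illustration.
import os

ANNOTATION_DIR = "DIOR-RSVG/Annotations"  # folder with the .xml annotation files
TEST_SPLIT = "DIOR-RSVG/test.txt"         # one sample id per line

with open(TEST_SPLIT) as f:
    ids = [line.strip() for line in f if line.strip()]

matched = [i for i in ids if os.path.exists(os.path.join(ANNOTATION_DIR, f"{i}.xml"))]

print(f"entries in test.txt : {len(ids)}")
print(f"with .xml annotation: {len(matched)}")
print(f"without annotation  : {len(ids) - len(matched)}")
```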

I hope this clarifies your concern.

xuliu-cyber commented 2 weeks ago

Thanks, I see where I went wrong!