Closed xuliu-cyber closed 4 months ago
Hi~ Thanks for your interest.
We simply evaluate on the test set after converting it into instruction format. The test set can be found here.
Below are the results obtained from one evaluation run.
[01/18 16:38:23 train]: Full config saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg/config.json
[01/18 16:38:23 train]: accelerator: gpu adjust_norm: false alignment_dim: 768 batch_size: 1 bf16: true bits: 16 config: null data_path: /home/aiscuser/pumpkin_dataset/InstructDataset/RSVG_Image data_target: /home/aiscuser/pumpkin_dataset/Eval/VGEvalDataset/RSVG_test.json double_quant: true dtype: float16 enable_amp: true entity: pumpkinn epochs: 2 eval: dataset: AID fp16: false generate: false gpus: 0 inf_sampler: false is_distribute: false local_rank: 0 lora: enable: false lora_alpha: 16 lora_bias: none lora_dropout: 0.05 lora_r: 8 lr: 0.0003 max_grad_norm: 1.0 model_path: ../../Output/LHRS/stage3/checkpoints/FINAL.pt optimizer: adanp opts: null output: ../../Output/LHRS/stage3/zero_shot_vg/rsvg project: MaskIndexNet prompt_template: llava_llama_2 quant_type: nf4 rank: 0 rgb_vision: arch: vit_large attn_pooler: num_attn_heads: 16 num_layers: 6 num_query: 144 input_patchnorm: false input_size: - 224 - 224 patch_dropout: 0.0 tune_pooler: true vit_name: openai/clip-vit-large-patch14 sar_vision: activate: sigmoid alpha: 0.2 arch: base branch_temp: 0.07 decoder: heads: 12 hidden_size: 768 layers: 12 mask_color: mean mask_ratio: 0.6 focal_gamma: 1.0 in_chans: 2 input_size: - 192 - 192 loss_weight: 1.0 n_queries: 256 online_temp: 0.1 reduction: none residual: false unmask_weight: 0.0 warmup_branch_temp: 0.04 warmup_branch_temp_epochs: 2 schedule: decay_epochs: 30 decay_rate: 0.1 gamma: 0.1 min_lr: 2.0e-05 multisteps: [] name: cosine warmup_epochs: 100 warmup_factor: 0.01 warmup_method: linear seed: 322 stage: 2 text: bos_token_id: 1 eos_token_id: 2 hidden_act: silu hidden_size: 4096 initializer_range: 0.02 intermediate_size: 11008 max_position_embeddings: 2048 num_attention_heads: 32 num_hidden_layers: 32 pad_token_id: 0 path: /home/aiscuser/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1b0db933684edbfe29a06fa47eb19cc48025e93 rms_norm_eps: 1e-5 tie_word_embeddings: false use_cache: true vocab_size: 32000 transform: input_size: - 224 - 224 rand_aug: rand-m5-n2-mstd0.5-inc1 tune_im_patch: false tune_im_start: false tune_rgb_bk: false tune_rgb_pooler: false use_checkpoint: false wandb: false wd: 0.02 workers: 4 world_size: 1
[01/18 16:38:23 train]: Creating model
[01/18 16:38:33 train]: Data Length: 1227
[01/18 16:38:33 train]: Loading pretrained checkpoint from ../../Output/LHRS/stage3/checkpoints/FINAL.pt
[01/18 16:38:34 train]: Loading RGB encoder.
[01/18 16:38:35 train]: After loading RGB encoder: Missing: []. Unexpected: []
[01/18 16:38:35 train]: Loadding LoRA parameters.
[01/18 17:00:58 train]: result file saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg/eval_save_file.json
[01/18 17:00:58 train]: Accuracy: 71.94851330203444
[01/18 17:00:58 train]: Fail Sample: 0
[01/18 17:00:58 train]: Accuracy With Fail Sample: 71.94851330203444
[01/18 17:01:19 train]: Full config saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg_dior/config.json
[01/18 17:01:19 train]: accelerator: gpu adjust_norm: false alignment_dim: 768 batch_size: 1 bf16: true bits: 16 config: null data_path: /home/aiscuser/pumpkin_dataset/InstructDataset/RSVG_DIOR_Image data_target: /home/aiscuser/pumpkin_dataset/Eval/VGEvalDataset/RSVG_DIOR_test.json double_quant: true dtype: float16 enable_amp: true entity: pumpkinn epochs: 2 eval: dataset: AID fp16: false generate: false gpus: 0 inf_sampler: false is_distribute: false local_rank: 0 lora: enable: false lora_alpha: 16 lora_bias: none lora_dropout: 0.05 lora_r: 8 lr: 0.0003 max_grad_norm: 1.0 model_path: ../../Output/LHRS/stage3/checkpoints/FINAL.pt optimizer: adanp opts: null output: ../../Output/LHRS/stage3/zero_shot_vg/rsvg_dior project: MaskIndexNet prompt_template: llava_llama_2 quant_type: nf4 rank: 0 rgb_vision: arch: vit_large attn_pooler: num_attn_heads: 16 num_layers: 6 num_query: 144 input_patchnorm: false input_size: - 224 - 224 patch_dropout: 0.0 tune_pooler: true vit_name: openai/clip-vit-large-patch14 sar_vision: activate: sigmoid alpha: 0.2 arch: base branch_temp: 0.07 decoder: heads: 12 hidden_size: 768 layers: 12 mask_color: mean mask_ratio: 0.6 focal_gamma: 1.0 in_chans: 2 input_size: - 192 - 192 loss_weight: 1.0 n_queries: 256 online_temp: 0.1 reduction: none residual: false unmask_weight: 0.0 warmup_branch_temp: 0.04 warmup_branch_temp_epochs: 2 schedule: decay_epochs: 30 decay_rate: 0.1 gamma: 0.1 min_lr: 2.0e-05 multisteps: [] name: cosine warmup_epochs: 100 warmup_factor: 0.01 warmup_method: linear seed: 322 stage: 2 text: bos_token_id: 1 eos_token_id: 2 hidden_act: silu hidden_size: 4096 initializer_range: 0.02 intermediate_size: 11008 max_position_embeddings: 2048 num_attention_heads: 32 num_hidden_layers: 32 pad_token_id: 0 path: /home/aiscuser/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1b0db933684edbfe29a06fa47eb19cc48025e93 rms_norm_eps: 1e-5 tie_word_embeddings: false use_cache: true vocab_size: 32000 transform: input_size: - 224 - 224 rand_aug: rand-m5-n2-mstd0.5-inc1 tune_im_patch: false tune_im_start: false tune_rgb_bk: false tune_rgb_pooler: false use_checkpoint: false wandb: false wd: 0.02 workers: 4 world_size: 1
[01/18 17:01:19 train]: Creating model
[01/18 17:01:29 train]: Data Length: 1813
[01/18 17:01:29 train]: Loading pretrained checkpoint from ../../Output/LHRS/stage3/checkpoints/FINAL.pt
[01/18 17:01:30 train]: Loading RGB encoder.
[01/18 17:01:30 train]: After loading RGB encoder: Missing: []. Unexpected: []
[01/18 17:01:30 train]: Loadding LoRA parameters.
[01/18 18:00:30 train]: result file saved to ../../Output/LHRS/stage3/zero_shot_vg/rsvg_dior/eval_save_file.json
[01/18 18:00:30 train]: Accuracy: 87.09759836484416
[01/18 18:00:30 train]: Fail Sample: 0
[01/18 18:00:30 train]: Accuracy With Fail Sample: 87.09759836484416
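For readers comparing these numbers with their own runs: the thread does not state how Accuracy is computed. A common convention for visual grounding is to count a prediction as correct when its IoU with the ground-truth box is at least 0.5. The sketch below assumes that convention and an (x_min, y_min, x_max, y_max) box layout; the function names are hypothetical and are not taken from the LHRS code.

```python
from typing import Sequence

Box = Sequence[float]  # assumed layout: (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box], targets: Sequence[Box], threshold: float = 0.5
) -> float:
    """Percentage of predictions whose IoU with the target reaches the threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


# One exact match and one complete miss give 50% accuracy.
print(grounding_accuracy([(0, 0, 2, 2), (0, 0, 1, 1)],
                         [(0, 0, 2, 2), (5, 5, 6, 6)]))  # 50.0
```

If the reported numbers use a different threshold or box format, only `threshold` and the index layout would need to change.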
Moreover, we are also glad to provide the raw prediction results for your reference: rsvg_eval_save_file.json, dior_rsvg_eval_save_file.json
Finally, all of our data and training scripts will be released soon~
Thank you! I will check it.
Hi, I found that the DIOR-RSVG test JSON file at https://huggingface.co/datasets/PumpkinCat/LHRS_Data/tree/main doesn't contain all the items in the original DIOR-RSVG test dataset (https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_?usp=sharing).
Thanks a lot for reaching out! I will double-check as soon as I have the bandwidth.
OK. I found that the reproduced classification and VQA eval results match the paper, except for VG.
Hi,
Sorry for the late reply.
I have checked the data and found that the issue is related to a mismatch between test.txt and the corresponding .xml annotations.
You will notice that many entries in test.txt do not have corresponding .xml annotation files. For example, entry 0 in test.txt does not have an associated 0.xml (or any similarly named) file.
We reformatted the annotations into text format based on the correspondence between test.txt and the .xml files. As a result, we only have 3,372 test samples.
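The filtering described above can be sketched as follows. This is a minimal illustration, not the script that was actually used: it assumes one entry per line in test.txt and annotation files named exactly `<entry>.xml` in a single directory, and the function name and directory layout are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def split_by_annotation(split_file: Path, annotation_dir: Path) -> Tuple[List[str], List[str]]:
    """Return (kept, missing) entries of split_file.

    An entry is kept when annotation_dir/<entry>.xml exists. The exact-name
    match is an assumption; a real dataset may need zero-padding or other
    name normalisation.
    """
    entries = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
    kept: List[str] = []
    missing: List[str] = []
    for entry in entries:
        if (annotation_dir / f"{entry}.xml").is_file():
            kept.append(entry)
        else:
            missing.append(entry)
    return kept, missing
```

Calling `split_by_annotation(Path("test.txt"), Path("Annotations"))` (both paths are placeholders) and comparing `len(kept)` with the number of lines in test.txt shows how many entries lack an annotation file.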
I hope this clarifies your concern.
Thanks, I see where I went wrong!
Hi xuliu, did you successfully reproduce the accuracy of the visual grounding task from the paper?
I found what I did wrong: I hadn't unzipped TextLoRA.zip the first time.
I used the stage3 checkpoint for evaluation, but the visual grounding results I got differ from those reported in the paper. Here is the log output I got, with an accuracy of 28.75:
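For anyone hitting the same problem, here is a minimal sketch of extracting the archive with Python's standard zipfile module before running the evaluation. The helper name and the target directory are hypothetical; where the evaluation script expects the extracted files is not stated in this thread.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive: Path, target_dir: Path) -> List[str]:
    """Extract every member of archive into target_dir and return their names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

For example, `extract_archive(Path("TextLoRA.zip"), Path("Stage3"))` would unpack the archive into a placeholder directory named Stage3; printing the returned names is a quick way to confirm the files are in place.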
[06/27 10:52:52 train]: Full config saved to eval/vg/DIOR-RSVG/config.json
[06/27 10:52:52 train]: accelerator: gpu adjust_norm: false alignment_dim: 768 batch_size: 1 bf16: true bits: 16 config: null data_path: /data/liux/DIOR-RSVG/JPEGImages data_target: /data/liux/DIOR-RSVG/test.json double_quant: true dtype: float16 enable_amp: true entity: pumpkinn epochs: 2 eval: dataset: AID fp16: false generate: false gpus: 0 inf_sampler: false is_distribute: false local_rank: 0 lora: enable: false lora_alpha: 256 lora_bias: none lora_dropout: 0.05 lora_r: 128 lr: 0.0002 max_grad_norm: 0.3 model_path: /data/liux/LHRS/Stage3/FINAL.pt optimizer: adanp opts: null output: eval/vg/DIOR-RSVG project: MaskIndexNet prompt_template: llava_llama_2 quant_type: nf4 rank: 0 rgb_vision: arch: vit_large attn_pooler: num_attn_heads: 16 num_layers: 6 num_query: 144 input_patchnorm: false input_size:
[06/27 10:52:52 train]: Creating model
[06/27 10:53:56 train]: Data Length: 7500
[06/27 10:53:56 train]: Loading pretrained checkpoint from /data/liux/LHRS/Stage3/FINAL.pt
[06/27 10:53:57 train]: Loading RGB encoder.
[06/27 10:53:57 train]: After loading RGB encoder: Missing: []. Unexpected: []
[06/27 10:53:57 train]: Loadding LoRA parameters.
[06/27 12:06:29 train]: result file saved to eval/vg/DIOR-RSVG/eval_save_file.json
[06/27 12:06:29 train]: Accuracy: 28.75025651549354
[06/27 12:06:29 train]: Fail Sample: 1229
[06/27 12:06:29 train]: Accuracy With Fail Sample: 22.959685349065882