BillChan226 / HALC

[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"
https://billchan226.github.io/HALC

Hello, this is very good work. May I ask which specific file HALC is implemented in? Could the README make this clearer? #2

Open qppwdd0324 opened 3 months ago

BillChan226 commented 3 months ago

Hi,

Thanks for your interest in this project!!

Sorry if the README is not written clearly enough. Here is a snippet from the README that describes how to run caption generation for CHAIR and POPE.

:chair: Running CHAIR evaluation for LVLMs object hallucination

Following Evaluating Object Hallucination in Large Vision-Language Models, we use "Please describe this image in detail." as the prompt to query the LVLM for captions of the 500 images randomly sampled from the COCO 2014 Val dataset. Under the root directory, run

python run_scripts/caption_generation.py --model [LVLM Backbone] --data_path [COCO_DIR] -d [Decoding Strategy] --num_samples 500 --seed [SEED] --gpu-id [GPU_IDs] --output_dir ./generated_captions/ --debugging 1
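For example, a concrete invocation might look like the following (the argument values here are only illustrative; check the README for the exact supported backbone and decoder names):

python run_scripts/caption_generation.py --model llava-1.5 --data_path ./data/coco2014 -d halc --num_samples 500 --seed 42 --gpu-id 0 --output_dir ./generated_captions/ --debugging 1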

--debugging 1 will print the intermediate hallucination correction process of HALC.

:man_in_tuxedo: Running POPE evaluation for LVLMs object hallucination

Since OPOPE evaluates directly on the caption generated for each image, it follows the same caption-generation procedure as CHAIR and differs only in the subsequent metric calculation. To collect samples for the conventional POPE evaluation, run the following under the root directory:

python run_scripts/pope_eval.py --model [LVLM Backbone] --data_path [COCO_DIR] -d [Decoding Strategy] --pope_type [random/popular/adversarial] --num_images 100 --seed [SEED] --gpu_id [GPU_IDs] --output_dir ./generated_captions/
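To illustrate the difference between the two, the offline-POPE (OPOPE) idea can be sketched in a few lines of Python (a minimal sketch with a hypothetical function name, not the repo's actual implementation):

def opope_answer(caption: str, query_object: str) -> str:
    # Minimal sketch of the offline-POPE idea (hypothetical helper, not the
    # repo's actual code): instead of querying the LVLM with a yes/no question
    # as conventional POPE does, check whether the queried object appears in
    # the caption that was already generated for the image.
    return "yes" if query_object.lower() in caption.lower() else "no"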

You can also directly run the demo file (run_scripts/demo_inference.py) to test single-image captioning. To run this demo, put the path of the image you want to evaluate in the image list in the script, and then run

python run_scripts/demo_inference.py --model [LVLM Backbone] -d [Decoding Strategy] --seed [SEED]
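For reference, the image list inside run_scripts/demo_inference.py might look something like this (the variable name below is a hypothetical illustration; check the script for the actual one):

# Hypothetical illustration; the actual variable name in
# run_scripts/demo_inference.py may differ.
image_paths = [
    "path/to/your/image.jpg",
]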

We hope this helps you run HALC, and we will improve the README later to make it clearer. If you have further questions, please don't hesitate to ask :)

qppwdd0324 commented 3 months ago

Thank you for your prompt reply. May I ask what these functions in halc.py represent? Their comments are all the same: "The method uses a list of context windows rooted from the DINO detection one and applies the contrasting decoding method to each context window pair to get a list of contrasting logits. Then we use the..." (an attached screenshot, 捕获.JPG, failed to upload)

BillChan226 commented 3 months ago

Sorry, the image does not seem to have been uploaded successfully. These functions with the same comment are different contrasting methods for contrasting the various sampled FOV logits with one another. You can view this line to see how they are used. Ultimately, we used the context_layer_double_multi_contrastive_decoding function, as described in our paper.
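As a rough illustration of what those functions do, pairwise contrastive decoding over sampled FOV logits can be sketched as follows (a minimal sketch with a hypothetical function name and a standard contrastive-decoding combination; the repo's actual functions differ in their weighting schemes):

import itertools
import torch

def pairwise_contrastive_logits(fov_logits, alpha=1.0):
    # Minimal sketch (hypothetical function, not the repo's actual code):
    # given next-token logits computed from several sampled fields of view
    # (FOVs) rooted at the DINO-detected window, apply a standard
    # contrastive-decoding combination to each FOV pair, yielding a list of
    # contrasted logits for the decoder to choose among.
    contrasted = []
    for logits_a, logits_b in itertools.combinations(fov_logits, 2):
        # Amplify tokens the first FOV supports relative to the second.
        contrasted.append((1 + alpha) * logits_a - alpha * logits_b)
    return contrasted

# Example with three FOVs over a vocabulary of size 10:
fovs = [torch.randn(10) for _ in range(3)]
pairs = pairwise_contrastive_logits(fovs)  # 3 choose 2 = 3 contrasted logits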