Open Stevetich opened 1 month ago
I have read your paper; it is great work. However, I have a few questions about the details of the Fig. 1 experiment, which you conducted to verify the significance of the optimal visual context. What I don't understand is this: after the brute-force search for the optimal visual context, how is that visual context used to generate the correct answer? Is it fed to the VLM as an individual image, or is it used inside your proposed framework? Thank you for your explanation!
Hi, thanks for your interest in HALC! For the images found by brute-force search, we simply input each one into the VLM as an individual image. You can also manually plug the retrieved optimal visual context into our framework to automate the inference process. Please let me know if there are any further questions, thanks :)
Thanks for your reply! I have no more questions.
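For concreteness, here is a minimal sketch of what such a brute-force search over candidate visual contexts might look like. Everything here is a toy placeholder: the image is a 2D grid, the candidates are fixed-size crops, and the scoring function stands in for whatever VLM-based relevance measure the experiment actually used — this is not HALC's implementation.

```python
from itertools import product

def brute_force_visual_context(image, window, score):
    """Exhaustively score every window-sized crop of `image` and return
    the highest-scoring one, i.e. the 'optimal visual context'."""
    h, w = len(image), len(image[0])
    wh, ww = window
    best_crop, best_score = None, float("-inf")
    # Slide the window over every valid (top, left) position.
    for top, left in product(range(h - wh + 1), range(w - ww + 1)):
        crop = [row[left:left + ww] for row in image[top:top + wh]]
        s = score(crop)
        if s > best_score:
            best_crop, best_score = crop, s
    return best_crop, best_score

# Toy 4x4 "image"; the lambda is a placeholder for a VLM relevance score.
image = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
crop, s = brute_force_visual_context(image, (2, 2), lambda c: sum(map(sum, c)))
# Per the reply above, `crop` would then be passed to the VLM on its own
# as an individual image (or plugged into the framework manually).
```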