The third stage, the validation stage, takes the hallucination-prone statements separated out in the first two stages and uses a multimodal large model (e.g., LLaVA or OFA) to determine whether each one is a hallucination. I have the following questions:
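To make sure I am reading the paper correctly, here is a minimal sketch of how I picture stage 3 working; `verify_fact`, `validate`, and the yes/no prompt are my own assumptions for illustration, not code from this repository:

```python
def verify_fact(image_path: str, fact: str) -> bool:
    """Ask the multimodal verifier whether `fact` is supported by the image.

    Placeholder: a real implementation would prompt the verifier
    (LLaVA or OFA, per the paper) with a yes/no question like the one below.
    """
    prompt = (
        f"Statement: {fact}\n"
        "Is this statement correct based on the image? Answer yes or no."
    )
    # ... call the multimodal verifier model here ...
    return True  # placeholder result

def validate(image_path: str, atomic_facts: list[str]) -> dict[str, bool]:
    """Label each hallucination-prone statement from stages 1-2 as supported or not."""
    return {fact: verify_fact(image_path, fact) for fact in atomic_facts}
```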
If the input to faithscore's run.py is the answer.jsonl generated by the fine-tuned LLaVA, can LLaVA still be used as the model that checks for hallucinations in stage 3?
If stage 3 uses a multimodal large model and the input answer.jsonl was generated by LLaVA, doesn't that amount to LLaVA detecting hallucinations produced by LLaVA itself? Is that a valid approach for detection? (See the sketch below for the setup I mean.)
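For concreteness, this is the setup I am asking about; `load_answers` and the `verifier` variable are hypothetical illustrations, not the actual run.py interface:

```python
import json

def load_answers(path: str) -> list[dict]:
    """Read one JSON object per line from an answer.jsonl file."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# answer.jsonl here was generated by the fine-tuned LLaVA model.
answers = load_answers("answer.jsonl")

# If the stage-3 verifier is also LLaVA, the same model ends up judging
# its own outputs. Would choosing a different verifier (e.g., OFA, which
# the paper also mentions) be the intended usage?
verifier = "ofa"  # hypothetical knob; not an actual run.py option
```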
I apologize if I have not fully understood the paper. Please clarify, thanks.