The third stage, the validation stage, takes the hallucination-prone statements separated out in the first two stages and uses a multimodal large model (e.g., LLaVA or OFA) to determine whether each one is a hallucination. I have the following questions:
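To make sure I am reading the paper correctly, here is a minimal sketch of how I picture stage 3 working; `verify_fact`, `validate`, and the yes/no prompt are my own assumptions for illustration, not code from this repository:

```python
def verify_fact(image_path: str, fact: str) -> bool:
    """Ask the multimodal verifier whether `fact` is supported by the image.

    Placeholder: a real implementation would prompt the verifier
    (LLaVA or OFA, per the paper) with a yes/no question like the one below.
    """
    prompt = (
        f"Statement: {fact}\n"
        "Is this statement correct based on the image? Answer yes or no."
    )
    # ... call the multimodal verifier model here ...
    return True  # placeholder result

def validate(image_path: str, atomic_facts: list[str]) -> dict[str, bool]:
    """Label each hallucination-prone statement from stages 1-2 as supported or not."""
    return {fact: verify_fact(image_path, fact) for fact in atomic_facts}
```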
If the input to faithscore's run.py is the answer.jsonl generated by the fine-tuned LLaVA, can LLaVA still be used as the model that checks for hallucinations in stage 3?
If stage 3 uses a multimodal large model and the input answer.jsonl was generated by LLaVA, doesn't that amount to LLaVA detecting hallucinations produced by LLaVA itself? Is that a valid approach for detection? (See the sketch below for the setup I mean.)
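For concreteness, this is the setup I am asking about; `load_answers` and the `verifier` variable are hypothetical illustrations, not the actual run.py interface:

```python
import json

def load_answers(path: str) -> list[dict]:
    """Read one JSON object per line from an answer.jsonl file."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# answer.jsonl here was generated by the fine-tuned LLaVA model.
answers = load_answers("answer.jsonl")

# If the stage-3 verifier is also LLaVA, the same model ends up judging
# its own outputs. Would choosing a different verifier (e.g., OFA, which
# the paper also mentions) be the intended usage?
verifier = "ofa"  # hypothetical knob; not an actual run.py option
```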
I apologize if I have not fully understood the paper. Please clarify, thanks.