BradyFU / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

a small bug in file "models/detector.py" #12

Open GallonDeng opened 4 months ago

GallonDeng commented 4 months ago

Nice work! I ran some tests and found that some code in "models/detector.py" can cause a problem:

```python
global_entity_list = [] # save all the entity type names for each sentence.
for entity_str in extracted_entities:
    # border case: nothing to extract
    if 'none' in entity_str.lower():
        continue
    entity_list = entity_str.split('.')
    for ent in entity_list:
        global_entity_dict.setdefault(ent, {}).setdefault('total_count', 0)
        global_entity_dict.setdefault(ent, {}).setdefault('crop_path', [])
        global_entity_dict.setdefault(ent, {}).setdefault('bbox', [])
    global_entity_list.append(entity_list)
```

When an entity string is 'none', the `continue` skips appending to 'global_entity_list', so the list ends up shorter than 'sample['split_sents']'. The two are then zipped positionally in "models/questioner.py", so every sentence after the 'none' one gets paired with the wrong entity list:

```python
def generate_questions(self, sample: Dict):
    sentences = sample['split_sents']
    global_entity_dict = sample['entity_info']
    global_entity_list = sample['entity_list']
    qs_list = []
    num_calls = len(sentences)
    print(f'generate ques will call llm {num_calls} times')
    for ent_list, sent in zip(global_entity_list, sentences):
        exist_entity = [ent for ent in ent_list if ent in global_entity_dict and global_entity_dict[ent]['total_count'] > 0]
        # border case: no detection result for any entity. no question asked.
        if len(exist_entity) == 0:
            qs_list.append([])
            continue
        questions = get_res(self.nlp, '.'.join(exist_entity), sent)
        qs_list.append(questions)
```
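Here is a minimal, self-contained sketch of the misalignment and one possible fix (appending an empty placeholder list for 'none' sentences so positions stay aligned with the sentences). The sample data and variable names are simplified illustrations, not the repo's actual values:

```python
# One extracted-entity string per sentence, as detector.py produces.
extracted_entities = ["dog.cat", "none", "tree"]
sentences = ["sent_0", "sent_1", "sent_2"]

# Current behavior: skipping 'none' drops that sentence's slot entirely.
buggy = []
for entity_str in extracted_entities:
    if 'none' in entity_str.lower():
        continue  # the slot for this sentence is lost
    buggy.append(entity_str.split('.'))
# zip now pairs the 'tree' entities with 'sent_1' instead of 'sent_2':
# [(['dog', 'cat'], 'sent_0'), (['tree'], 'sent_1')]

# Possible fix: keep an empty placeholder so indices match the sentences.
fixed = []
for entity_str in extracted_entities:
    if 'none' in entity_str.lower():
        fixed.append([])  # keep the slot, just with no entities
        continue
    fixed.append(entity_str.split('.'))
# [(['dog', 'cat'], 'sent_0'), ([], 'sent_1'), (['tree'], 'sent_2')]
```

The empty placeholder also plays well with `generate_questions`, which already handles an empty entity list via its `len(exist_entity) == 0` border case and appends `[]` to `qs_list` without calling the LLM.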

By the way, how does the performance of the VQA model affect Woodpecker's performance? I swapped GPT-3.5 for Llama 3, and I understand the LLM plays an important role. But for the VQA model, did you try other models? @xjtupanda @BradyFU

xjtupanda commented 1 month ago

Sorry for the late reply; we've been very busy lately.

  1. For the bug, could you please open a PR so we can fix it?
  2. As reported in the paper, the VQA model mainly impacts attribute recognition (e.g., color), while it does not perform well on object recognition, which is why we introduced a detection model. We haven't tried other VQA models, since BLIP-2 was already SOTA at that time.