Nicous20 / FunQA

FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.
https://funqa-benchmark.github.io/
MIT License

Bug on answer file validation check in `classic_eval.py` #11

Open SCZwangxiao opened 1 year ago

SCZwangxiao commented 1 year ago
    chk_answer = []
    for data in answer:
        chk_answer.append({
            'task': data['task'],
            'output': data['output'],
            'instruction': data['instruction'],
            'ID': data['ID']
        })

    diff = False
    for data in submission:
        if {
                'task': data['task'],
                'output': data['output'],
                'instruction': data['instruction'],
                'ID': data['ID']
        } not in chk_answer:
            diff = True
            break

    assert diff == False, 'Submission file is not valid'
    print('File is valid! Loading File...')

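The check compares the `output` field along with `task`, `instruction`, and `ID`. But the `output` value in a submission will almost always differ from the one in the ground-truth answer file, so the dict is not found in `chk_answer` and the assertion fails with an AssertionError for practically any real submission.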
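A possible fix, as a minimal sketch rather than the maintainers' intended behavior: validate only the fields that should be identical in both files and leave `output` out of the comparison. The helper name `is_valid_submission` and the `KEY_FIELDS` tuple are hypothetical, and the sketch assumes the `task`, `instruction`, and `ID` values are hashable (e.g. strings or ints).

```python
# Hypothetical helper, not part of classic_eval.py.
# Fields expected to match between submission and ground truth;
# 'output' is deliberately excluded because it differs between the two.
KEY_FIELDS = ('task', 'instruction', 'ID')


def is_valid_submission(answer, submission):
    """Return True if every submission entry matches some ground-truth
    entry on KEY_FIELDS (assumes those values are hashable)."""
    # A set of tuples makes each membership test O(1) instead of
    # scanning a list of dicts.
    answer_keys = {tuple(data[k] for k in KEY_FIELDS) for data in answer}
    for data in submission:
        if tuple(data[k] for k in KEY_FIELDS) not in answer_keys:
            return False
    return True
```

The existing assertion could then become `assert is_valid_submission(answer, submission), 'Submission file is not valid'`. Like the original check, this only verifies that each submission entry exists in the answer file, not that every answer entry is covered.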