Sreyan88 / GAMA

Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
https://sreyan88.github.io/gamaaudio/
Apache License 2.0

question about evaluation #13

Open kayleeliyx opened 1 week ago

kayleeliyx commented 1 week ago

Hi! Thanks for this amazing project! Is it possible to open-source the evaluation code? I understand the code depends on ltu.

I generated a JSON results file with gama_inf.py, like:

{
    "audio_id": "/mnt/NVME-VM/projects/LLaVa_Mic/GAMA/data/test/acl_sk_24/filtered_audios/Y0SSy52rc1BM.wav",
    "instruction": "Deduce the possible role of the man speaking softly in the midst of music and choir. Associate the auditory analysis with the provided visuals to create a comprehensive understanding of the scene.",
    "prediction": "The man's speech, amidst music and singing, could be an announcement or commentary, possibly guiding the audience through the event or providing information about the performance or venue.",
    "timestamp_events": "['(Choir-0.0-1.932)', '(Music-0.0-10.0)', '(Hubbub, speech noise, speech babble-0.0-10.0)', '(Choir-3.092-10.0)']",
    "ref": "The man's soft speech could be a personal conversation or commentary amidst the event. In the context of the visuals, he might be an attendee discussing or commenting on the ongoing performance."
},

I am wondering how to evaluate these results, as I am still confused by all those metrics. I am a beginner in this field, and I don't quite understand the structure of ltu or what each script evaluates. Even a few more details would be really helpful. Thank you so much!

Sreyan88 commented 1 day ago

Hi @kayleeliyx ,

It looks like you are trying to evaluate responses on GAMA-IT. @sonalkum should be able to point you to the evaluation script, which uses GPT for evaluation.
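
In the meantime, a GPT-as-judge evaluation along these lines can be sketched as follows. This is a rough sketch, not the actual GAMA/LTU evaluation script: the prompt wording, the `gama_results.json` file name, the 1-10 scale, and the model name are all assumptions.

```python
import json
import re
from typing import Dict, List


def load_results(path: str) -> List[Dict]:
    """Load the list of records written by gama_inf.py (each record has
    "audio_id", "instruction", "prediction", "timestamp_events", "ref")."""
    with open(path) as f:
        return json.load(f)


def build_eval_prompt(instruction: str, prediction: str, reference: str) -> str:
    """Compose a judging prompt (hypothetical wording, not the official script's)."""
    return (
        "Rate how well the candidate answer matches the reference answer for the "
        "given instruction, on a scale of 1 to 10. Reply with a single number.\n"
        f"Instruction: {instruction}\n"
        f"Reference: {reference}\n"
        f"Candidate: {prediction}\n"
    )


def parse_score(reply: str) -> float:
    """Extract the first number from the judge's free-form reply."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no score found in reply: {reply!r}")
    return float(match.group())


# The judging call itself needs an LLM API key; e.g. with the openai package
# (model choice and file name below are assumptions):
#
#   from openai import OpenAI
#   client = OpenAI()
#   scores = []
#   for item in load_results("gama_results.json"):
#       resp = client.chat.completions.create(
#           model="gpt-4o",
#           messages=[{"role": "user", "content": build_eval_prompt(
#               item["instruction"], item["prediction"], item["ref"])}],
#       )
#       scores.append(parse_score(resp.choices[0].message.content))
#   print(sum(scores) / len(scores))  # mean judge score over the test set
```

Averaging the per-example scores gives one summary number; the official script may aggregate differently, so treat this only as a starting point until the actual evaluation code is released.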