Sreyan88 / GAMA

Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
https://sreyan88.github.io/gamaaudio/
Apache License 2.0

question about evaluation #13

Open kayleeliyx opened 1 week ago

kayleeliyx commented 1 week ago

Hi! Thanks for this amazing project! Is it possible to open-source the evaluation code? I understand the code depends on ltu.

I generated a JSON results file with gama_inf.py, like:

{
    "audio_id": "/mnt/NVME-VM/projects/LLaVa_Mic/GAMA/data/test/acl_sk_24/filtered_audios/Y0SSy52rc1BM.wav",
    "instruction": "Deduce the possible role of the man speaking softly in the midst of music and choir. Associate the auditory analysis with the provided visuals to create a comprehensive understanding of the scene.",
    "prediction": "The man's speech, amidst music and singing, could be an announcement or commentary, possibly guiding the audience through the event or providing information about the performance or venue.",
    "timestamp_events": "['(Choir-0.0-1.932)', '(Music-0.0-10.0)', '(Hubbub, speech noise, speech babble-0.0-10.0)', '(Choir-3.092-10.0)']",
    "ref": "The man's soft speech could be a personal conversation or commentary amidst the event. In the context of the visuals, he might be an attendee discussing or commenting on the ongoing performance."
},

I am wondering how to evaluate these results, as I am still confused by all those metrics. I am a beginner in this field, and I don't quite understand the structure of ltu or what each script evaluates. Even a few more details would be really helpful. Thank you so much!

Sreyan88 commented 1 day ago

Hi @kayleeliyx ,

It looks like you are trying to evaluate responses on GAMA-IT. @sonalkum should be able to point you to the evaluation script, which uses GPT for evaluation.
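
In the meantime, a GPT-as-judge evaluation along these lines can be sketched as follows. This is a rough sketch, not the actual GAMA/LTU evaluation script: the prompt wording, the `gama_results.json` file name, the 1-10 scale, and the model name are all assumptions.

```python
import json
import re
from typing import Dict, List


def load_results(path: str) -> List[Dict]:
    """Load the list of records written by gama_inf.py (each record has
    "audio_id", "instruction", "prediction", "timestamp_events", "ref")."""
    with open(path) as f:
        return json.load(f)


def build_eval_prompt(instruction: str, prediction: str, reference: str) -> str:
    """Compose a judging prompt (hypothetical wording, not the official script's)."""
    return (
        "Rate how well the candidate answer matches the reference answer for the "
        "given instruction, on a scale of 1 to 10. Reply with a single number.\n"
        f"Instruction: {instruction}\n"
        f"Reference: {reference}\n"
        f"Candidate: {prediction}\n"
    )


def parse_score(reply: str) -> float:
    """Extract the first number from the judge's free-form reply."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no score found in reply: {reply!r}")
    return float(match.group())


# The judging call itself needs an LLM API key; e.g. with the openai package
# (model choice and file name below are assumptions):
#
#   from openai import OpenAI
#   client = OpenAI()
#   scores = []
#   for item in load_results("gama_results.json"):
#       resp = client.chat.completions.create(
#           model="gpt-4o",
#           messages=[{"role": "user", "content": build_eval_prompt(
#               item["instruction"], item["prediction"], item["ref"])}],
#       )
#       scores.append(parse_score(resp.choices[0].message.content))
#   print(sum(scores) / len(scores))  # mean judge score over the test set
```

Averaging the per-example scores gives one summary number; the official script may aggregate differently, so treat this only as a starting point until the actual evaluation code is released.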