Open maxwZJU opened 2 weeks ago
Hi @maxwZJU, that is an interesting question. There could be several reasons, but the behavior is likely attributable to the kinds of tasks GAMA was trained on and the level of reasoning ability it has. I would also suggest trying GAMA-IT, as it has better reasoning abilities. I tried it, and GAMA-IT is able to answer such questions.
Hi! Thank you for your excellent work! When running inference with GAMA, I encountered the same problem I had with LTU. I loaded an audio clip from the AudioSet eval set in which a man is speaking. When I ask "Describe the audio.", GAMA returns the precise answer "Audio caption: A man is speaking and beeping his car keys as he gets out of his car and walks away to open something.". However, when I ask "Determine the gender of the speaker." and "Who's speaking? A man or a woman?", GAMA returns "The gender of the speaker is not specified." and "It is not specified in the audio clip who is speaking. It could be either a man or a woman.", respectively. May I ask what causes this strange behavior?