NickyFot / EmoCLIP


How did you generate the sample-level description (the ones in Figure 1)? #1

Closed · JethroJames closed this issue 10 months ago

JethroJames commented 10 months ago

Hello,

I hope you're doing well. I recently came across your work and find it truly intriguing. I was wondering how it differs from the research presented at BMVC 2023, which can be found here.

Specifically, I'm curious about the sample-level description generation technique you've introduced. The paper doesn't seem to provide an intuitive explanation of this. Is it possible for GPT to generate a coherent description directly from a video segment? Could you shed some light on how the sample-level descriptions were generated? I'd greatly appreciate any insights you can share.

Thank you in advance for your time and clarification!

Best regards

NickyFot commented 10 months ago

Hi,

Thanks for taking the time to read the paper! The sample-level descriptions in our work are not generated; we directly use the ones provided by the MAFW dataset, which I believe were obtained from human annotators: https://github.com/MAFW-database/MAFW
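For reference, once you have the MAFW annotation files, reading the captions is straightforward. Here is a minimal sketch, assuming the labels have been exported to a CSV with `video_name` and `caption` columns (the actual files in the MAFW release may use a different format and headers, so adjust accordingly):

```python
# Minimal sketch: loading MAFW sample-level descriptions.
# Assumes a CSV export with "video_name" and "caption" columns;
# the real MAFW label files may be laid out differently.
import pandas as pd


def load_mafw_captions(csv_path: str) -> dict[str, str]:
    """Map each clip name to its human-annotated description."""
    df = pd.read_csv(csv_path)
    return dict(zip(df["video_name"], df["caption"]))


captions = load_mafw_captions("mafw_labels.csv")  # hypothetical path
print(captions.get("00001.mp4", "no caption found"))
```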

Thanks

Niki