chuyq / MESC


About the raw dataset #2

Open sjghh opened 2 hours ago

sjghh commented 2 hours ago

Would you consider open-sourcing the raw dataset? I believe this would greatly benefit the research community and further increase the impact of your work.

chuyq commented 2 hours ago

For copyright reasons, we publicly share only the annotated labels and the corresponding video timestamps.

sjghh commented 40 minutes ago

I have a question about this statement in the paper: 'To enhance labor efficiency and reduce costs, we employ a large model like GPT-3.5 for coarse-grained annotation, followed by manual fine-grained calibration. The overall annotation accuracy of GPT-3.5 is about 25%, which is low.' However, GPT-3.5 cannot process multimodal data, so what kind of data was it given in this case? Did I misunderstand something, or am I missing some details?