Jack-ZC8 / M3AV-dataset

A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024)
https://jack-zc8.github.io/M3AV-dataset-page

🎓M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024)

[**📄Paper**](https://arxiv.org/abs/2403.14168) | [**🏠️Homepage**](https://jack-zc8.github.io/M3AV-dataset-page/) | [**📥Download**](./download/) | [**💎Demo**](./demo/) | [**🤖Benchmarks**](./benchmarks/)

Overview

The overview of our 🎓M3AV dataset:

  1. The first component is slides annotated with simple and complex blocks, which are merged according to a set of rules.
  2. The second component is speech containing special vocabulary, spoken and written forms, and word-level timestamps.
  3. The third component is the paper corresponding to the video. The asterisk (*) denotes that only computer science videos have corresponding papers.

| Video Field | Video Name | Total Hours | Total Counts |
| --- | --- | --- | --- |
| Human-computer Interaction | CHI 2021 Paper Presentations (CHI) | 55.00 | 660 |
| Human-computer Interaction | UbiComp 2020 Presentations (Ubi) | 10.65 | 107 |
| Biomedical Sciences | NIH Director's Wednesday Afternoon Lectures (NIH) | 237.71 | 228 |
| Biomedical Sciences | Introduction to the Principles and Practice of Clinical Research (IPP) | 42.27 | 67 |
| Mathematics | Oxford Mathematics (MLS) | 27.23 | 51 |

We downloaded academic lectures from three fields, Human-computer Interaction, Biomedical Sciences, and Mathematics, as shown in the table above.
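To make the table easier to compare across fields, the following is a minimal sketch that aggregates hours and video counts per field and estimates the average video length per source. The rows are copied from the table above; the names `ROWS`, `summarize`, and `mean_minutes` are hypothetical helpers for illustration and are not part of the dataset tooling.

```python
from collections import defaultdict

# Rows copied from the table above: (field, abbreviation, total hours, video count).
ROWS = [
    ("Human-computer Interaction", "CHI", 55.00, 660),
    ("Human-computer Interaction", "Ubi", 10.65, 107),
    ("Biomedical Sciences", "NIH", 237.71, 228),
    ("Biomedical Sciences", "IPP", 42.27, 67),
    ("Mathematics", "MLS", 27.23, 51),
]


def summarize(rows):
    """Return {field: (total_hours, total_count)} summed over each field's sources."""
    hours = defaultdict(float)
    counts = defaultdict(int)
    for field, _abbr, h, n in rows:
        hours[field] += h
        counts[field] += n
    # Round to two decimals to hide floating-point noise from the summation.
    return {f: (round(hours[f], 2), counts[f]) for f in hours}


def mean_minutes(rows):
    """Return the average video length in minutes, keyed by source abbreviation."""
    return {abbr: round(h * 60 / n, 1) for _field, abbr, h, n in rows}


if __name__ == "__main__":
    for field, (h, n) in summarize(ROWS).items():
        print(f"{field}: {h:.2f} h across {n} videos")
    print(mean_minutes(ROWS))
```

Under these figures, CHI presentations average about five minutes each, while NIH lectures run roughly an hour, which may matter when choosing a subset for a given task.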

News

Details

The folder `demo` contains a sample for demonstration.