MONAH: Multi-Modal Narratives for Humans

Problem

Analyzing videos in their raw multi-modal form (visual + audio + text) requires a great deal of human expertise, while end-to-end deep learning methods are less interpretable.

Solution

Inspired by how the linguistics community analyzes conversations using the Jefferson transcription system, MONAH creates a multi-modal text narrative for dyadic (two-person) video-recorded conversations by weaving together what is being said with how it is being said.
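
As a rough illustration of that weaving idea (not MONAH's actual API), the sketch below combines one utterance with simple prosody and action cues to form a narrative sentence; all names and cue categories are illustrative.

```python
# Hypothetical sketch of weaving "what is said" with "how it is said".
# Function, field, and cue names are illustrative only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    speaker: str                                         # who is talking
    text: str                                            # verbatim transcript text
    prosody: List[str] = field(default_factory=list)     # e.g. "slowly", "loudly"
    actions: List[str] = field(default_factory=list)     # e.g. "smiling", "nodding"


def weave(utterance: Utterance) -> str:
    """Combine one utterance and its nonverbal cues into a narrative sentence."""
    cues = utterance.prosody + utterance.actions
    how = ", ".join(cues) if cues else "plainly"
    return f'{utterance.speaker}, {how}, says "{utterance.text}"'


if __name__ == "__main__":
    u = Utterance(speaker="The counselor", text="Tell me more about that.",
                  prosody=["slowly"], actions=["leaning forward"])
    print(weave(u))
    # The counselor, slowly, leaning forward, says "Tell me more about that."
```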

ScreenCast

To add later

Required Inputs

Two videos, one for each speaker; results are best when the camera faces the speaker head-on rather than from an angle. A verbatim transcript, such as one exported from YouTube, is also required.
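
The exact ingestion format is not documented here yet, so the snippet below is only a guess at how one session's inputs might be listed; the paths and key names are placeholders.

```python
# Hypothetical listing of the required inputs for one dyadic conversation.
# Paths and key names are placeholders, not MONAH's actual configuration format.
required_inputs = {
    "video_speaker_1": "data/session_01/speaker_1.mp4",  # camera facing speaker 1
    "video_speaker_2": "data/session_01/speaker_2.mp4",  # camera facing speaker 2
    "transcript": "data/session_01/transcript.srt",      # verbatim transcript (e.g. from YouTube)
}
```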

User Interface

A text-based menu is used for easy configuration.

(Screenshot: the text-based configuration menu)
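
As a rough sketch of what a text-menu configuration step could look like (the prompt wording is hypothetical, though the modality names match the sections below):

```python
# Hypothetical text menu for choosing which modalities to include in the narrative.
# The prompt wording is illustrative; the modality names follow the sections below.
MODALITIES = ["actions", "prosody", "demographics", "semantics", "mimicry"]


def choose_modalities() -> list:
    """Ask the user to toggle each modality on or off."""
    selected = []
    for name in MODALITIES:
        answer = input(f"Include {name}? [y/N] ").strip().lower()
        if answer == "y":
            selected.append(name)
    return selected


if __name__ == "__main__":
    print("Selected modalities:", choose_modalities())
```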

Supported modalities in the narratives

(Diagram: supported modalities)

Output - MONAH Narrative

To add later

Dependencies (Technology Stack)

To add as we build this repo up.

Fine Narratives

Actions

Prosody

Coarse Narratives

Demographics

Semantics

Mimicry
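
One way to picture the grouping above is a simple mapping from narrative granularity to feature families; this is an illustrative data structure, not the repository's internal schema.

```python
# Illustrative grouping of narrative features (not MONAH's internal schema).
NARRATIVE_FEATURES = {
    "fine": ["actions", "prosody"],
    "coarse": ["demographics", "semantics", "mimicry"],
}
```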

Contributions

MONAH is meant to be a modular system that keeps additions simple. Joshua to add an architectural diagram.
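
As a sketch of what such modularity could look like (the class and method names are assumptions until the architectural diagram lands), a new feature could plug in through a small common interface:

```python
# Hypothetical extension point for contributors. Class and method names are
# illustrative; the real architecture will be shown in the forthcoming diagram.
from abc import ABC, abstractmethod
from typing import Dict, List


class FeatureModule(ABC):
    """One pluggable annotator, e.g. prosody or actions."""

    name: str = "base"

    @abstractmethod
    def annotate(self, utterance: Dict) -> List[str]:
        """Return the cue phrases this module contributes for one utterance."""


class ProsodyModule(FeatureModule):
    name = "prosody"

    def annotate(self, utterance: Dict) -> List[str]:
        # A real module would analyze the audio; this stub only shows the shape.
        return ["loudly"] if utterance.get("volume", 0.0) > 0.8 else []
```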

Pipeline (Intermediate Artifacts)

To add later

Continuous Integration

Joshua to add PyLint Python style tests.
Joshua to add compulsory unit tests.

Citation

If you find MONAH useful in any of your publications, we ask that you cite the following:

Features introduced in Paper 1 are in white; features introduced in Paper 2 are in blue.

(Figure: MONAH features, colored by the paper that introduced them)