Analyzing conversations recorded in video format (visual + audio + text) requires substantial human expertise, and end-to-end deep learning methods are less interpretable.
Inspired by how the linguistics community analyzes conversations using the Jefferson transcription system, MONAH creates a multi-modal text narrative for dyadic (two-person) video-recorded conversations by weaving together what is being said with how it is being said.
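As an illustration of the weaving idea, here is a minimal, hypothetical Python sketch; the function name, field names, and cue wording are assumptions for illustration, not the actual MONAH implementation:

```python
# Minimal sketch: prefix each utterance with how it was said before the words
# themselves. All names here are illustrative placeholders.

def weave(utterance: dict) -> str:
    """Turn one utterance plus its nonverbal cues into a narrative sentence."""
    cues = ", ".join(utterance.get("cues", []))  # e.g. prosody or action cues
    speaker = utterance["speaker"]
    prefix = f"{speaker} {cues} said" if cues else f"{speaker} said"
    return f"{prefix}: {utterance['text']}"

# One loud utterance with a nod woven into the narrative.
print(weave({
    "speaker": "the counselor",
    "cues": ["loudly", "while nodding"],
    "text": "how are you feeling today?",
}))
# -> the counselor loudly, while nodding said: how are you feeling today?
```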
To add later
Two videos are required, one for each speaker; this works best when the camera is directly in front of the speaker rather than at an angle. A verbatim transcript from YouTube is also required.
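A hedged sketch of an input check, assuming the three inputs live in a single session folder; the file names below are placeholders, not paths required by MONAH:

```python
# Hypothetical input layout for one session; names are placeholders.
from pathlib import Path

EXPECTED_INPUTS = [
    "speaker_left.mp4",   # front-facing video of speaker 1
    "speaker_right.mp4",  # front-facing video of speaker 2
    "transcript.txt",     # verbatim transcript, e.g. exported from YouTube
]

def check_inputs(folder: str) -> None:
    """Raise if any of the expected input files is missing."""
    missing = [name for name in EXPECTED_INPUTS if not (Path(folder) / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing input files in {folder}: {missing}")

# check_inputs("data/session_001")  # uncomment once the files are in place
```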
Configuration is done through a text-based menu for easy setup.
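A minimal sketch of what such a text menu could look like, assuming the feature groups listed below are the configurable units; the prompts and defaults are assumptions:

```python
# Illustrative text menu: ask which feature groups to weave into the narrative.
FEATURE_GROUPS = ["Actions", "Prosody", "Demographics", "Semantics", "Mimicry"]

def configure() -> dict:
    """Return a mapping of feature group -> enabled flag based on user input."""
    enabled = {}
    for group in FEATURE_GROUPS:
        answer = input(f"Include {group} features? [y/N] ").strip().lower()
        enabled[group] = answer == "y"
    return enabled

if __name__ == "__main__":
    print(configure())
```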
To add later
To add as we build this repo up.
Actions
Prosody
Demographics
Semantics
Mimicry
MONAH is meant to be a modular system that keeps additions simple. Joshua to add architectural diagram.
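Until the diagram is added, here is one possible shape of that modularity, sketched under the assumption that each feature group implements a common extractor interface and registers itself; class and registry names are illustrative, not the actual MONAH code:

```python
# Sketch: adding a new feature group means adding one extractor class
# and one registry entry. Thresholds and names are placeholders.
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    """Turns raw inputs (video frames, audio, words) into narrative cues."""

    @abstractmethod
    def extract(self, segment: dict) -> list:
        """Return human-readable cues, e.g. ['loudly', 'while nodding']."""

class ProsodyExtractor(FeatureExtractor):
    def extract(self, segment: dict) -> list:
        return ["loudly"] if segment.get("volume_db", 0) > 70 else []

REGISTRY = {"Prosody": ProsodyExtractor()}  # new groups just add an entry here

def annotate(segment: dict, enabled: dict) -> list:
    """Collect cues from every enabled feature group for one speech segment."""
    cues = []
    for name, extractor in REGISTRY.items():
        if enabled.get(name, False):
            cues.extend(extractor.extract(segment))
    return cues
```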
To add later
Joshua to add PyLint Python Style Tests.
Joshua to add Compulsory Unit Tests.
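As a placeholder until those are in place, a hypothetical example of what a compulsory unit test could look like, in pytest style; `weave()` refers to the illustrative function sketched earlier, not a confirmed MONAH API:

```python
# Run with `pytest`; assumes weave() is importable from the package.
def test_weave_includes_speaker_cues_and_words():
    utterance = {"speaker": "the client", "cues": ["quietly"], "text": "I am fine."}
    narrative = weave(utterance)
    assert "the client" in narrative
    assert "quietly" in narrative
    assert "I am fine." in narrative
```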
If you find MONAH useful in any of your publications, we ask you to cite the following:
Features introduced in Paper 1 are in white; features introduced in Paper 2 are in blue.