videopipe-viz

Explainability in Multimodal Video AI through visualisation

At RTL, we have a video AI pipeline (face detection, credit detection, subtitling, thumbnail selection, image aesthetics, …). Its results come out as JSON files that can be read by professional video editing software. We would also like to demo the pipeline to non-video editors who do not or cannot use video editing software. To that end, we would like to smartly visualize results from the AI pipeline in the .mp4 video itself with moviepy and ffmpeg. To realise this project, we will give you access to our cloud platform and data. We will also help you conduct surveys with our video editors and stakeholders to find out whether burned-in AI visualizations in videos actively increase trust in AI solutions.
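A minimal sketch of the burn-in idea with moviepy 1.x and ffmpeg: it assumes a hypothetical face_detection.json with per-detection start/end times and relative bounding boxes (not the actual pipeline schema), draws the boxes onto each frame, and writes a new .mp4.

```python
import json
from moviepy.editor import VideoFileClip


def load_detections(path):
    """Hypothetical schema: {"detections": [{"start": s, "end": s, "box": [x, y, w, h]}, ...]},
    with box coordinates relative to the frame size."""
    with open(path) as f:
        return json.load(f)["detections"]


def draw_boxes(frame, boxes):
    """Draw simple 3-pixel white rectangles; frame is an (H, W, 3) uint8 array."""
    frame = frame.copy()
    h, w = frame.shape[:2]
    for x, y, bw, bh in boxes:
        x0, y0 = int(x * w), int(y * h)
        x1, y1 = int((x + bw) * w), int((y + bh) * h)
        frame[y0:y0 + 3, x0:x1] = 255  # top edge
        frame[y1 - 3:y1, x0:x1] = 255  # bottom edge
        frame[y0:y1, x0:x0 + 3] = 255  # left edge
        frame[y0:y1, x1 - 3:x1] = 255  # right edge
    return frame


def burn_in(video_path, json_path, out_path):
    detections = load_detections(json_path)
    clip = VideoFileClip(video_path)

    def annotate(get_frame, t):
        # keep only the boxes whose time span covers the current frame time t
        boxes = [d["box"] for d in detections if d["start"] <= t <= d["end"]]
        return draw_boxes(get_frame(t), boxes) if boxes else get_frame(t)

    # fl() applies the per-frame filter; write_videofile() hands encoding to ffmpeg
    clip.fl(annotate).write_videofile(out_path)


if __name__ == "__main__":
    burn_in("episode.mp4", "face_detection.json", "episode_faces.mp4")
```

Drawing with plain array slicing keeps the dependencies down to moviepy; PIL or OpenCV would be the natural upgrade for labelled, anti-aliased boxes.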

requirements

some inspiration

possible tools

overview of JSON output

The different JSON output files may need different visualization approaches. The table below lists the characteristics of each JSON output and possible visualization solutions; a sketch of the SRT, GIF and thresholding options follows the table.

| JSON output | Single frame | Spans multiple frames | Thresholding | SRT-fileable | GIFable |
|---|---|---|---|---|---|
| Face detection | ✔ (for dense data) | ✔ (for sparse data) | | | |
| Subtitles | | | | | |
| Shot boundaries | | | | | |
| Text detection | ✔ (for dense data) | ✔ (for sparse data) | | | |
| Image Aesthetics | | | ? | | |
| Language identification | | | | | |
| Midroll marker | ? | | | | |
| Speech Gap | | | | | |
| Still picker | | | | | |
| Voice activity | | | | | |
| Speech Recognition | | | | | |
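
A minimal sketch of the "SRT-fileable", "GIFable" and "Thresholding" ideas, assuming a hypothetical speech_recognition.json with start, end, text and confidence per segment: low-confidence segments are dropped, the rest are written as an .srt file, and one detected span is exported as a GIF with moviepy.

```python
import json
from moviepy.editor import VideoFileClip


def to_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 00:01:02,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def json_to_srt(segments, srt_path, min_confidence=0.5):
    """Write segments above the confidence threshold as numbered SRT cues."""
    kept = [s for s in segments if s.get("confidence", 1.0) >= min_confidence]
    with open(srt_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(kept, start=1):
            f.write(f"{i}\n{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n")
            f.write(f"{seg['text']}\n\n")


def span_to_gif(video_path, start, end, gif_path, fps=10):
    """Cut the span [start, end] out of the video and export it as a small GIF."""
    clip = VideoFileClip(video_path).subclip(start, end).resize(width=480)
    clip.write_gif(gif_path, fps=fps)


if __name__ == "__main__":
    with open("speech_recognition.json") as f:
        segments = json.load(f)["segments"]
    json_to_srt(segments, "episode.srt")
    span_to_gif("episode.mp4", segments[0]["start"], segments[0]["end"], "first_segment.gif")
```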

references

AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions
Tim Schlippe, Katrin Fritsche, Ying Sun, Matthias Wölfel
International Conference on Artificial Intelligence in Education Technology 2023 – [paper]

A survey of surveys on the use of visualization for interpreting machine learning models
Angelos Chatzimparmpas, Rafael M. Martins, Ilir Jusufi, Andreas Kerren
Information Visualization 2020 – [paper]

Dynamic Object Scanning: Object-Based Elastic Timeline for Quickly Browsing First-Person Videos
Seita Kayukawa, Keita Higuchi, Ryo Yonetani, Masanori Nakamura, Yoichi Sato, Shigeo Morishima
CHI 2018 (extended abstract) – [paper]