videopipe-viz

Explainability in Multimodal Video AI through visualisation

At RTL, we have a video AI pipeline (face detection, credit detection, subtitling, thumbnail selection, image aesthetics, …). Its results come out as JSON files that can be read by professional video editing software. We would also like to demo the pipeline to non-video editors who do not or cannot use video editing software. To that end, we would like to smartly visualize results from the AI pipeline in the .mp4 video itself with moviepy and ffmpeg. To realise this project, we will give you access to our cloud platform and data. We will also help you conduct surveys with our video editors and stakeholders to find out whether burned-in AI visualizations in videos actively increase trust in AI solutions.
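A minimal sketch of the burn-in idea with moviepy 1.x and ffmpeg: it assumes a hypothetical face_detection.json with per-detection start/end times and relative bounding boxes (not the actual pipeline schema), draws the boxes onto each frame, and writes a new .mp4.

```python
import json
from moviepy.editor import VideoFileClip


def load_detections(path):
    """Hypothetical schema: {"detections": [{"start": s, "end": s, "box": [x, y, w, h]}, ...]},
    with box coordinates relative to the frame size."""
    with open(path) as f:
        return json.load(f)["detections"]


def draw_boxes(frame, boxes):
    """Draw simple 3-pixel white rectangles; frame is an (H, W, 3) uint8 array."""
    frame = frame.copy()
    h, w = frame.shape[:2]
    for x, y, bw, bh in boxes:
        x0, y0 = int(x * w), int(y * h)
        x1, y1 = int((x + bw) * w), int((y + bh) * h)
        frame[y0:y0 + 3, x0:x1] = 255  # top edge
        frame[y1 - 3:y1, x0:x1] = 255  # bottom edge
        frame[y0:y1, x0:x0 + 3] = 255  # left edge
        frame[y0:y1, x1 - 3:x1] = 255  # right edge
    return frame


def burn_in(video_path, json_path, out_path):
    detections = load_detections(json_path)
    clip = VideoFileClip(video_path)

    def annotate(get_frame, t):
        # keep only the boxes whose time span covers the current frame time t
        boxes = [d["box"] for d in detections if d["start"] <= t <= d["end"]]
        return draw_boxes(get_frame(t), boxes) if boxes else get_frame(t)

    # fl() applies the per-frame filter; write_videofile() hands encoding to ffmpeg
    clip.fl(annotate).write_videofile(out_path)


if __name__ == "__main__":
    burn_in("episode.mp4", "face_detection.json", "episode_faces.mp4")
```

Drawing with plain array slicing keeps the dependencies down to moviepy; PIL or OpenCV would be the natural upgrade for labelled, anti-aliased boxes.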

requirements

some inspiration

possible tools

overview of JSON output

The different JSON output files may need different visualization approaches. The table below lists the characteristics of each JSON output and possible visualization solutions; a sketch of the SRT, GIF and thresholding options follows the table.

| JSON output | Single frame | Spans multiple frames | Thresholding | SRT-fileable | GIFable |
|---|---|---|---|---|---|
| Face detection | ✔ (for dense data) | ✔ (for sparse data) | | | |
| Subtitles | | | | | |
| Shot boundaries | | | | | |
| Text detection | ✔ (for dense data) | ✔ (for sparse data) | | | |
| Image Aesthetics | | | ? | | |
| Language identification | | | | | |
| Midroll marker | ? | | | | |
| Speech Gap | | | | | |
| Still picker | | | | | |
| Voice activity | | | | | |
| Speech Recognition | | | | | |
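
A minimal sketch of the "SRT-fileable", "GIFable" and "Thresholding" ideas, assuming a hypothetical speech_recognition.json with start, end, text and confidence per segment: low-confidence segments are dropped, the rest are written as an .srt file, and one detected span is exported as a GIF with moviepy.

```python
import json
from moviepy.editor import VideoFileClip


def to_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 00:01:02,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def json_to_srt(segments, srt_path, min_confidence=0.5):
    """Write segments above the confidence threshold as numbered SRT cues."""
    kept = [s for s in segments if s.get("confidence", 1.0) >= min_confidence]
    with open(srt_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(kept, start=1):
            f.write(f"{i}\n{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n")
            f.write(f"{seg['text']}\n\n")


def span_to_gif(video_path, start, end, gif_path, fps=10):
    """Cut the span [start, end] out of the video and export it as a small GIF."""
    clip = VideoFileClip(video_path).subclip(start, end).resize(width=480)
    clip.write_gif(gif_path, fps=fps)


if __name__ == "__main__":
    with open("speech_recognition.json") as f:
        segments = json.load(f)["segments"]
    json_to_srt(segments, "episode.srt")
    span_to_gif("episode.mp4", segments[0]["start"], segments[0]["end"], "first_segment.gif")
```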

references

AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions
Tim Schlippe, Katrin Fritsche, Ying Sun, Matthias Wölfel
International Conference on Artificial Intelligence in Education Technology 2023 – [paper]

A survey of surveys on the use of visualization for interpreting machine learning models
Angelos Chatzimparmpas, Rafael M. Martins, Ilir Jusufi, Andreas Kerren
Information Visualization 2020 – [paper]

Dynamic Object Scanning: Object-Based Elastic Timeline for Quickly Browsing First-Person Videos
Seita Kayukawa, Keita Higuchi, Ryo Yonetani, Masanori Nakamura, Yoichi Sato, Shigeo Morishima
CHI 2018 (extended abstract) – [paper]