ganler / ResearchReading

General system research material (not limited to paper) reading notes.
GNU General Public License v3.0

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels #4

Closed ganler closed 4 years ago

ganler commented 4 years ago

AI System @ SOSP
arXiv: https://arxiv.org/abs/1910.02993
Blog: https://dawn.cs.stanford.edu/2019/10/09/rekall/
Poster: https://dawn.cs.stanford.edu/2019/10/09/rekall/

ganler commented 4 years ago

Motivation: We already have many basic models for video analysis operations such as detection and classification. Can we give users an effective and efficient way to build complex applications on top of these already-trained models, instead of training a new model from scratch, which is expensive (it usually requires annotation, training, and GPUs)?

Challenge 1: Real-World Video Analysis Requires Domain-Specific Event Detection

Basic tasks: detecting positions, identities, object locations, and time-aligned transcripts.

Harder tasks that real-world video analysis nevertheless requires: detecting domain-specific events.

Examples:

Standard approach: annotate data and train a new model from scratch for each event.

This is expensive! What if we could solve the task by composing basic methods instead? That would give a more agile video analysis workflow.

Rekall's approach: detect such events with queries that programmatically compose the outputs of existing, pre-trained models.

ganler commented 4 years ago

Programming Model

Operators: map, filter, group_by, coalesce (merge a sequence of labels), join (combine multiple concurrent label streams), minus (A happens without B).
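
To make these operators concrete, here is a minimal Python sketch of how coalesce, join, and minus could behave on time-interval labels. This is not Rekall's actual API: the `Label` class, the function signatures, and the sample face/speech data are all assumptions made up for illustration, and the real library also handles spatial extents and richer payloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical label: a time interval (seconds) plus a payload string."""
    start: float
    end: float
    payload: str = ""


def overlaps(a, b):
    """True if the two labels share a non-empty time range."""
    return a.start < b.end and b.start < a.end


def coalesce(labels, max_gap=0.0):
    """Merge labels that touch or overlap (within max_gap) into longer ones."""
    merged = []
    for lab in sorted(labels, key=lambda l: l.start):
        if merged and lab.start <= merged[-1].end + max_gap:
            prev = merged[-1]
            merged[-1] = Label(prev.start, max(prev.end, lab.end), prev.payload)
        else:
            merged.append(lab)
    return merged


def join(left, right, predicate=overlaps):
    """Pair labels from two streams; emit the time range where both hold."""
    out = []
    for a in left:
        for b in right:
            if predicate(a, b):
                out.append(Label(max(a.start, b.start), min(a.end, b.end),
                                 f"{a.payload}+{b.payload}"))
    return out


def minus(left, right):
    """Keep the parts of each left label not covered by any right label."""
    out = []
    for a in left:
        pieces = [(a.start, a.end)]
        for b in right:
            remaining = []
            for s, e in pieces:
                if b.end <= s or b.start >= e:  # no overlap, keep the piece
                    remaining.append((s, e))
                else:                           # cut out the overlapping part
                    if s < b.start:
                        remaining.append((s, b.start))
                    if b.end < e:
                        remaining.append((b.end, e))
            pieces = remaining
        out.extend(Label(s, e, a.payload) for s, e in pieces)
    return out


# Made-up outputs of two pre-trained models: per-second face detections
# and a speech segment from a transcript.
faces = [Label(0, 1, "face"), Label(1, 2, "face"),
         Label(2, 3, "face"), Label(10, 11, "face")]
speech = [Label(1.5, 2.5, "speech")]

on_screen = coalesce(faces)                    # contiguous on-screen segments
talking = join(on_screen, speech)              # face visible while speech occurs
silent = minus(on_screen, speech)              # face visible without speech
long_shots = [l for l in on_screen if l.end - l.start >= 2]  # plain filter
```

map, filter, and group_by correspond to ordinary per-label transformations, so the sketch uses a plain list comprehension for the filter step. The point is that a domain-specific event ("someone on screen while talking") falls out of composing existing model outputs, with no new training.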