Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

ganler commented 4 years ago

Venue: AI Systems @ SOSP
arXiv: https://arxiv.org/abs/1910.02993
Blog: https://dawn.cs.stanford.edu/2019/10/09/rekall/
Poster: https://dawn.cs.stanford.edu/2019/10/09/rekall/

ganler commented 4 years ago

Motivation: We already have many basic models for video analysis operations such as detection and classification. Can we give users an effective and efficient way to build complex applications on top of those already-trained models, rather than training a new model from scratch? Training from scratch is expensive: it usually requires annotation, training time, and GPUs.

Challenge 1: Real-World Video Analysis Requires Domain-Specific Event Detection

Basic tasks: detecting positions, identities, object locations, and time-aligned transcripts.

Harder tasks that real-world video analysis nonetheless requires: detecting domain-specific events.

Examples:

Querying action segments.
Autonomous driving: detecting traffic light changes.

Standard approach:

Data collection & annotation
Model architecture design
Training

This is expensive! What if we could solve the task by composing the outputs of basic methods instead? That would make for a more agile video analysis workflow.

Rekall's approach: specify video events with queries that programmatically compose the outputs of existing, pre-trained models.
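To make the idea of composing existing model outputs concrete, here is a minimal sketch for the traffic light example. It is not Rekall code: the per-frame labels are made-up stand-ins for the output of some pre-trained classifier, and `light_changes` is a hypothetical helper name.

```python
from typing import List, Tuple

# Hypothetical per-frame output of a pre-trained traffic light classifier:
# (frame index, predicted color). These values are invented for illustration.
FrameLabel = Tuple[int, str]


def light_changes(frame_colors: List[FrameLabel]) -> List[Tuple[int, str, str]]:
    """Return (frame index, old color, new color) wherever the color changes."""
    changes = []
    # Compare each frame's label with the label of the frame before it.
    for (_, prev), (idx, cur) in zip(frame_colors, frame_colors[1:]):
        if prev != cur:
            changes.append((idx, prev, cur))
    return changes


labels = [(0, "red"), (1, "red"), (2, "green"), (3, "green"), (4, "yellow")]
changes = light_changes(labels)
print(changes)
```

No new model is trained here: the "event detector" is a few lines of logic over labels that an existing model already produced.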

ganler commented 4 years ago

Programming Model

Operators: map, filter, group_by, coalesce (merges a sequence of labels), join (combines multiple concurrent label streams), and minus (A happens without B).
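The operators above can be sketched on plain time intervals. This is only an illustration of the semantics as described in these notes, not Rekall's actual API: the `Label` class, the function signatures, and the example data are all assumptions, and the sketch covers only the temporal dimension (no spatial bounding boxes).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Label:
    """Hypothetical temporal label: a time interval plus a payload string."""
    start: float
    end: float
    payload: str


def map_labels(labels: List[Label], fn: Callable[[Label], Label]) -> List[Label]:
    return [fn(l) for l in labels]


def filter_labels(labels: List[Label], pred: Callable[[Label], bool]) -> List[Label]:
    return [l for l in labels if pred(l)]


def group_by(labels: List[Label], key: Callable[[Label], str]) -> Dict[str, List[Label]]:
    groups: Dict[str, List[Label]] = {}
    for l in labels:
        groups.setdefault(key(l), []).append(l)
    return groups


def coalesce(labels: List[Label], max_gap: float = 0.0) -> List[Label]:
    """Merge labels that overlap or lie within max_gap of each other."""
    merged: List[Label] = []
    for l in sorted(labels, key=lambda x: x.start):
        if merged and l.start - merged[-1].end <= max_gap:
            last = merged[-1]
            merged[-1] = Label(last.start, max(last.end, l.end), last.payload)
        else:
            merged.append(l)
    return merged


def join(a: List[Label], b: List[Label], merge: Callable[[str, str], str]) -> List[Label]:
    """Emit the overlapping part of every pair of labels from two streams."""
    out = []
    for x in a:
        for y in b:
            s, e = max(x.start, y.start), min(x.end, y.end)
            if s < e:
                out.append(Label(s, e, merge(x.payload, y.payload)))
    return out


def minus(a: List[Label], b: List[Label]) -> List[Label]:
    """Keep the parts of each label in a that no label in b covers."""
    out = []
    for x in a:
        pieces = [(x.start, x.end)]
        for y in b:
            remaining = []
            for s, e in pieces:
                if y.end <= s or y.start >= e:  # no overlap, keep the piece
                    remaining.append((s, e))
                else:  # cut out the overlapping part
                    if s < y.start:
                        remaining.append((s, y.start))
                    if y.end < e:
                        remaining.append((y.end, e))
            pieces = remaining
        out.extend(Label(s, e, x.payload) for s, e in pieces)
    return out


# Made-up label streams, as if produced by a face detector and a speech detector.
faces = [Label(0, 2, "face"), Label(2, 5, "face"), Label(9, 12, "face")]
speech = [Label(1, 4, "speech"), Label(10, 11, "speech")]

on_screen = coalesce(faces)                                   # merge adjacent face labels
talking = join(on_screen, speech, lambda p, q: f"{p}+{q}")    # face AND speech
silent = minus(on_screen, speech)                             # face WITHOUT speech
```

The pairwise loops in `join` and `minus` are quadratic; they are kept that way for readability, and a real system would likely index or sort the intervals first.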
