dianzl / SODFormer


SODFormer: Streaming Object Detection with Transformers Using Events and Frames

This is the official implementation of SODFormer, a novel multimodal streaming object detector with transformers. For more details, please refer to:

SODFormer: Streaming Object Detection with Transformers Using Events and Frames
Dianze Li, Jianing Li, and Yonghong Tian, Fellow, IEEE

Setup

This code has been tested with Python 3.9, PyTorch 1.7, CUDA 10.1, and cuDNN 7.6.3 on Ubuntu 16.04.
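To compare a local environment against the tested versions listed above, a small check along the following lines may help. It is only a sketch: the `TESTED` dictionary and the helpers `matches` and `check_environment` are hypothetical names introduced here and are not part of the SODFormer code base. cuDNN is left out for brevity.

```python
import platform

# Versions reported as tested in this README (copied from the Setup section).
TESTED = {"python": "3.9", "torch": "1.7", "cuda": "10.1"}


def matches(installed, tested):
    """Return True if `installed` has the same leading dotted components as `tested`.

    A local build suffix such as "+cu101" is ignored; None never matches.
    """
    if installed is None:
        return False
    wanted = tested.split(".")
    found = str(installed).split("+")[0].split(".")
    return found[: len(wanted)] == wanted


def check_environment():
    """Map each component to (installed version or None, matches tested version)."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # optional here: only reported if PyTorch is installed

        found["torch"] = str(torch.__version__)
        found["cuda"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass
    return {name: (found[name], matches(found[name], TESTED[name])) for name in TESTED}


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        status = "matches tested version" if ok else "differs from tested version"
        print(f"{name}: {version} ({status}, tested with {TESTED[name]})")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the configuration differs from the one the authors report testing.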

| Modality | Method | Temporal cues | Input representation | AP$_{50}$ | Runtime (ms) | URL |
| --- | --- | --- | --- | --- | --- | --- |
| Events | SSD-events | N | Event image | 0.221 | 7.2 | - |
| Events | NGA-events | N | Voxel grid | 0.232 | 8.0 | - |
| Events | Deformable DETR | N | Event image | 0.307 | 21.6 | e_nt |
| Events | Spatio-temporal Deformable DETR | Y | Event image | 0.334 | 25.0 | e_t |
| Frames | YOLOv3 | N | RGB frame | 0.426 | 7.9 | - |
| Frames | LSTM-SSD | Y | RGB frame | 0.456 | 22.4 | - |
| Frames | Deformable DETR | N | RGB frame | 0.461 | 21.5 | f_nt |
| Frames | Spatio-temporal Deformable DETR | Y | RGB frame | 0.489 | 24.9 | f_t |
| Events + Frames | MFEPD | N | Event image + RGB frame | 0.438 | 8.2 | - |
| Events + Frames | JDF | N | Channel image + RGB frame | 0.442 | 8.3 | - |
| Events + Frames | SODFormer | Y | Voxel grid + RGB frame | 0.491 | 41.5 | - |
| Events + Frames | SODFormer | Y | Event image + RGB frame | 0.504 | 39.7 | SODFormer |
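The accuracy and speed trade-off in the table can be read off with a little arithmetic. The sketch below copies three rows from the table and converts runtime to throughput, assuming the reported runtime is the time for one inference step (the table does not state this explicitly). The names `RESULTS`, `frames_per_second`, and `ap_gain` are illustrative and do not come from the repository.

```python
# (AP50, runtime in ms) copied from the results table above.
RESULTS = {
    "Deformable DETR (frames)": (0.461, 21.5),
    "Spatio-temporal Deformable DETR (frames)": (0.489, 24.9),
    "SODFormer (event image + RGB frame)": (0.504, 39.7),
}


def frames_per_second(runtime_ms):
    """Convert a runtime in milliseconds per inference to inferences per second."""
    return 1000.0 / runtime_ms


def ap_gain(method, baseline):
    """Absolute AP50 difference between two rows of RESULTS."""
    return RESULTS[method][0] - RESULTS[baseline][0]


if __name__ == "__main__":
    for name, (ap50, runtime_ms) in RESULTS.items():
        print(f"{name}: AP50={ap50:.3f}, {frames_per_second(runtime_ms):.1f} FPS")
    gain = ap_gain(
        "SODFormer (event image + RGB frame)",
        "Spatio-temporal Deformable DETR (frames)",
    )
    print(f"AP50 gain from adding events to the temporal frame model: {gain:.3f}")
```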

Demo

Low light demo

Motion blur demo

Synthetic dataset demo

Citation

Related Repos

  1. Deformable DETR: Deformable Transformers for End-to-End Object Detection
  2. TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers