The goal of this project is to explore how better feature representations and additional visual cues can be used to improve detection quality.
Specifically, this project targets the fascinating and meaningful real-world problem of pedestrian detection as a test case. Using the current state-of-the-art pedestrian detector SquaresChnFtrs as a baseline, I leverage two approaches to increase detection accuracy: expanding the 10 HOG+LUV channels into 20 channels using the DCT (discrete cosine transform), and encoding optical flow using SDt features (image differences between the current frame T and the coarsely aligned frames T-4 and T-8).
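For the first ingredient, here is a minimal sketch (in Python/NumPy, not the detector's actual code) of one way the channel expansion could look: each HOG+LUV channel is filtered with a 2D DCT basis filter to produce one extra channel, so 10 channels become 20. The filter-bank size and the helper names (`dct_filter_bank`, `expand_channels_dct`) are illustrative assumptions, not from the paper or the baseline's code.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.signal import convolve2d

def dct_filter_bank(size=3, n_filters=1):
    """Hypothetical filter bank: the first `n_filters` non-DC 2D DCT bases
    of a `size` x `size` block, built as outer products of 1D DCT-II rows."""
    basis = dct(np.eye(size), norm='ortho', axis=0)  # rows = 1D DCT basis vectors
    filters = []
    for i in range(size):
        for j in range(size):
            if i == 0 and j == 0:
                continue                      # skip the DC (constant) basis
            filters.append(np.outer(basis[i], basis[j]))
            if len(filters) == n_filters:
                return filters
    return filters

def expand_channels_dct(channels, n_filters=1):
    """Expand an H x W x C channel stack by convolving every channel with
    each DCT filter; with C = 10 and n_filters = 1 this yields 20 channels."""
    extra = [convolve2d(channels[:, :, c], f, mode='same')
             for c in range(channels.shape[2])
             for f in dct_filter_bank(n_filters=n_filters)]
    return np.concatenate([channels, np.stack(extra, axis=-1)], axis=-1)
```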
Note that this project largely reproduces the observations/findings of the “Benenson et al., ECCV 2014” paper. The DCT method is expected to yield a 3.53% miss-rate improvement, and the optical flow method a 4.47% improvement.
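For the optical-flow ingredient, the sketch below shows one plausible way to compute the SDt channels: frames T-4 and T-8 are coarsely aligned to frame T with dense optical flow, and the absolute differences form two extra feature channels. Using OpenCV's Farneback flow for the alignment, and the helper names themselves, are my assumptions rather than the baseline's actual implementation; inputs are assumed to be single-channel grayscale frames.

```python
import cv2
import numpy as np

def coarse_align(prev_gray, cur_gray):
    """Warp a previous frame onto the current frame's pixel grid.
    Assumption: dense Farneback flow stands in for the coarse alignment.
    The flow is computed from current -> previous so that cv2.remap can
    pull previous-frame pixels onto the current grid."""
    flow = cv2.calcOpticalFlowFarneback(cur_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = cur_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    return cv2.remap(prev_gray,
                     grid_x + flow[..., 0],
                     grid_y + flow[..., 1],
                     cv2.INTER_LINEAR)

def sdt_channels(frame_t, frame_t4, frame_t8):
    """SDt features: absolute differences between frame T and the
    coarsely aligned frames T-4 and T-8 (two extra channels)."""
    diff4 = cv2.absdiff(frame_t, coarse_align(frame_t4, frame_t))
    diff8 = cv2.absdiff(frame_t, coarse_align(frame_t8, frame_t))
    return np.stack([diff4, diff8], axis=-1)
```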
The project started in mid-November 2014. Up to now, the following has been achieved:
Implemented the baseline + optical flow
Refer here for a complete list of issues and corresponding updates in this project.