Computer Vision - Githubissues

Summary of high-level computer vision tasks

Task 1 Getting to know my home: Detecting doors (including their state), furniture, and objects (known and unknown). One vision-oriented end goal is detecting the unknown object (trash).
Task 2 Welcoming visitors: Detecting people (known and unknown) and reacting to each person differently. In particular, detecting the uniform for the postman, the shopping bag for the deli man, and the toolbox for the plumber.
Task 3 Catering for Granny Annie's Comfort: Object detection, localization, and detection of one person.
Task 4 Visit my home: Detecting and avoiding obstacles, interacting with each obstacle according to its type, and interacting with and following a person. (Obstacles: a kraft paper shopping bag needs to be removed, a wheeled object needs to be gently pushed away, and an unknown human obstacle needs to be asked to move.)
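
The obstacle rules in Task 4 amount to a lookup from a detected label to a reaction. Below is a minimal sketch of that idea. The label strings, the `Action` enum and the `action_for_obstacle` helper are hypothetical names chosen for illustration, and falling back to plain avoidance for unlisted obstacles is an assumption rather than something the task description states.

```python
from enum import Enum


class Action(Enum):
    REMOVE = "remove the object"
    PUSH_GENTLY = "gently push the object away"
    ASK_TO_MOVE = "ask the person to move"
    AVOID = "plan a path around the obstacle"


# Hypothetical label names; a real detector's label set will differ.
OBSTACLE_ACTIONS = {
    "kraft_paper_shopping_bag": Action.REMOVE,
    "wheeled_object": Action.PUSH_GENTLY,
    "unknown_human": Action.ASK_TO_MOVE,
}


def action_for_obstacle(label: str) -> Action:
    """Return the Task 4 reaction for a detected obstacle label.

    Obstacles that are not listed fall back to AVOID (an assumption).
    """
    return OBSTACLE_ACTIONS.get(label, Action.AVOID)
```

Keeping the rules in a table like this would make it easy to add further obstacle types without touching the detection code.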

Functional benchmarks:

Object perception functionality: The benchmark uses a list of individual objects subdivided into classes. The robot needs to detect their presence and estimate their class, instance, and location. For example, when presented with a bottle of milk, the robot should detect a bottle (class) of milk (instance) and estimate its pose w.r.t. a known reference frame. Three main goals: class recognition, instance recognition, pose estimation.
People perception functionality: The benchmark requires that the robot correctly detects the presence of a human inside a predefined target area, recognizes the person and accurately estimates his/her position. Two main goals: person recognition and person localization.
People following functionality: Accompany a person and always maintain a specific distance from them. Follow the person while avoiding obstacles.
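
To make "always maintain a specific distance" in the people following benchmark concrete, here is a minimal sketch of a proportional speed rule. The target distance, gain and speed limit are made-up values for illustration and do not come from the benchmark. `follow_speed` is a hypothetical helper, and obstacle avoidance is deliberately left out.

```python
def follow_speed(
    distance_m: float,
    target_m: float = 1.0,   # assumed following distance in metres
    gain: float = 0.8,       # assumed proportional gain
    max_speed: float = 0.6,  # assumed speed limit in m/s
) -> float:
    """Forward speed command from the measured distance to the person.

    Positive when the person is farther away than the target distance,
    negative when the robot is too close, and clamped to +/- max_speed.
    """
    error = distance_m - target_m
    speed = gain * error
    return max(-max_speed, min(max_speed, speed))
```

For example, `follow_speed(1.0)` commands no motion, while a person 3 m away saturates the command at the speed limit.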

Breakdown in tasks for computer vision

I have broken all the tasks down into two main groups: object detection using YOLO (hopefully), and person detection and following. For object detection, working directly on 2D images is a lot easier. But since we are not given images of the objects to look for, I'm using datasets such as PASCAL VOC and MS COCO.

Using PASCAL VOC, YOLO would be able to detect the following: person, animals (bird, cat, cow, dog, horse, sheep), vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train) and common objects (bottle, chair, dining table, potted plant, sofa, tv/monitor).
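
The class list above can be kept as a small lookup table, for example to check whether a label the tasks need is covered by a VOC-trained model. This is only a sketch. The spellings follow the list in this issue (the label files shipped with a particular model may spell them differently, for example without spaces), and `VOC_GROUPS`, `VOC_CLASSES` and `group_of` are hypothetical names.

```python
from typing import Optional

# The 20 PASCAL VOC classes, grouped as in the list above.
VOC_GROUPS = {
    "person": ["person"],
    "animals": ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "vehicles": ["aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train"],
    "common objects": ["bottle", "chair", "dining table", "potted plant", "sofa", "tv/monitor"],
}

# Flat list of every class name.
VOC_CLASSES = [name for names in VOC_GROUPS.values() for name in names]


def group_of(label: str) -> Optional[str]:
    """Return the group a label belongs to, or None if VOC does not cover it."""
    for group, names in VOC_GROUPS.items():
        if label in names:
            return group
    return None
```

A lookup such as `group_of("toolbox")` returns `None`, which shows that the toolbox from Task 2 is not covered by a VOC-trained detector.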

Using MS COCO, YOLO would be able to detect 80 different object classes. See this for more details.

We could then use YOLO directly to detect a person for functional benchmarks 2 and 3 (people perception and people following). Once the person is detected, we then follow the person.
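
A minimal sketch of the "detect, then follow" step, assuming the detector output has already been converted into simple records. The `Detection` class, the confidence threshold, and the rule of picking the largest box as the person to follow are all assumptions for illustration, not part of YOLO or of the benchmark.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Detection:
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max in pixels


def box_area(box: Tuple[float, float, float, float]) -> float:
    """Area of an axis-aligned box; zero if the box is degenerate."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def select_person(
    detections: Sequence[Detection], min_confidence: float = 0.5
) -> Optional[Detection]:
    """Pick the person to follow from one frame of detections.

    Keeps only 'person' detections above the threshold and returns the one
    with the largest box (assumed to be the closest person), or None.
    """
    people = [
        d for d in detections
        if d.label == "person" and d.confidence >= min_confidence
    ]
    if not people:
        return None
    return max(people, key=lambda d: box_area(d.box))
```

Picking the largest box is a crude stand-in for "closest person". Recognizing a specific known person, as the people perception benchmark requires, would need more than this.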

LiamWellacott / CDT2019-ERL

Computer Vision #51
