RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

Scoring and timing of the storing groceries task #414

Closed nohernan closed 6 years ago

nohernan commented 6 years ago

Hello,

I noticed the change in the score for the Storing Groceries task, which I reckon is a bit drastic. You wiped out all scoring for recognition and labeling from the PDF report. I understand the idea of treating recognition, grasping, and placing as a whole, but wouldn't it be fair to acknowledge the recognition and labeling of objects with some points?

Also, within the first two minutes the robot has to find and inspect the cupboard, find and inspect the table, grasp at least one object from the table, go to the cupboard, and place that object near those of the same type or category. Could you allow more than two minutes for all of these actions, please? Three minutes to place the first object would be fine.

Cheers, Noé H

kyordhel commented 6 years ago

2017's performance made it clear that robots have no major problems recognizing objects. However, from the audience's point of view, it looked more like a procession of robots come to contemplate an altar of objects. This had a harmful impact on SATTU, and on the impression of what's going on in @Home. In consequence, the EC and TC decided to change the test while respecting its essence, storing groceries, addressing #205, #358, and #353. Regarding time: in GPSR the average manipulation time is under one minute, so two minutes should suffice.

@airglow I hope you can elaborate further.

nohernan commented 6 years ago

As I recall, 2017's performance in the Storing Groceries task was not so good, and in general object recognition was not accomplished well by most teams.

Look at the scores from last year's RoboCup@Home for OPL and compare the points obtained in all Stage 1 tasks (https://github.com/RoboCupAtHome/Nagoya2017/tree/master/Scores); you'll notice that Storing Groceries got the lowest scores. What can be done to get a more balanced and appropriate score for Storing Groceries? I don't think the current scoring helps; rather, it discourages good performance on this task. Most teams will prefer to focus on other Stage 1 tasks, as we saw last year. I suggest giving points for the recognition and grasping of objects, but other ways of getting teams more involved in this task may produce better results.

tkelestemur commented 6 years ago

@nohernan If you look at the scoring of the Storing Groceries task, you'll see that you get far more points when you place objects next to their class; that way, object recognition is scored. If you think about the overall competition, object recognition is scored in other tasks, and Storing Groceries is trying to advance object manipulation. So I think the scoring is fair.

On the other hand, object detection is a somewhat solved problem. If you have a state-of-the-art object detection library and a good GPU, you can easily detect objects, whereas object manipulation in domestic settings is still an open research area.

justinhart commented 6 years ago

Eh, I would consider object detection/recognition to still be somewhat open. YOLO certainly won't be the last word in object recognition.

LoyVanBeek commented 6 years ago

Object recognition is in no way a done deal. RoboCup@Home makes it even harder in some sense, by using objects that are not in any training set and requiring very specific labels. YOLO might have one category for, e.g., noodles, but not separate ones for very similar yet distinct cups of noodles. This is exacerbated by the limited time to gather data and train. Humans do this fine; computers can't yet, so there is plenty of progress to be made.

The point of a robotics competition is not just to recognize objects; that is only part of the problem. My phone can recognize objects with the right apps. My phone will never be able to put my groceries in the right place; for that you need a robot with a manipulator.

kyordhel commented 6 years ago

I really didn't want to get into this topic, but here I go.

Scoring

The 2017 Storing Groceries test can be divided into 3 aspects:

From 2014-2016 we know that teams do well at object recognition when it comes to known objects (correctly labeling an instance of a pre-trained model), and not so well with alike ones (correctly labeling an instance that does not match a pre-trained model but shares features with it). So when I say robots have no major problems recognizing objects, I really mean it. I am excluding, of course, partial occlusions, stacking, and areas with irregular lighting conditions. These are problems far from being solved from a mathematical, engineering, and even philosophical (ontological) perspective.

Knowing that no robot tried to open the cupboard door, and that most of them just stared at the objects, it is easy to infer that the scores in [1] and [2] are mostly for object recognition. Considering the penalty for false positives, it is no surprise that most scores range under 50 with none going higher, which is congruent with [3].

The problem

I may be no expert in object recognition, but it is clear a robot can't formally recognize objects it has never seen before. This is a frontier research problem. Categorizing unknown objects requires finding features within the object that may lead to semantic representations of what the object may be. We are talking about the probability of belonging to several sets, drawn from a group of sets that are not necessarily disjoint and may not even be related. I think this problem is AI-complete and requires deep research. We are all scientists facing real-world problems in a realistic scenario, after all. @Home is not gluing trending apps to ROS on a laptop to control some hardware held together with duct tape and bubblegum.
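The "probability of belonging to several not necessarily disjoint sets" idea can be sketched as multi-label classification: rather than a softmax over mutually exclusive classes, each category gets its own independent probability. A minimal illustration in plain Python (the category names and raw scores here are invented for the example, not from any rulebook or detector):

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def categorize(scores: dict, threshold: float = 0.5) -> list:
    """Multi-label categorization: an object may belong to several
    (not necessarily disjoint) sets at once, so each category is
    thresholded independently instead of picking a single winner."""
    return sorted(cat for cat, s in scores.items() if sigmoid(s) >= threshold)

# Hypothetical raw scores for a never-before-seen box of rice:
# strong evidence for "cereal" and "food", weak for "drink".
print(categorize({"cereal": 2.0, "food": 3.0, "drink": -2.5}))
# ['cereal', 'food']
```

Note that the two accepted labels are not exclusive: the object is both a cereal and a food, which is exactly what a single-winner classifier cannot express.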

Storing Groceries is all about this problem: how to classify objects you have never seen before. This happens to me all the time. I usually buy a 1 kg bag of rice at Aldi, but this time I went to the Russian store, and they carry totally different brands, so I picked a rice I had never seen before. Yet I had no problem picking my box of rice and placing it with the other cereals, even though the brand was unknown to me, it was a box and not a bag, and I can't read Cyrillic.

The test

As @LoyVanBeek already pointed out, in @Home we want to see robots in action. The spirit of the test is what its name clearly says: Storing Groceries, meaning placing similar objects together. Nothing else.

In my opinion, the new scoring balances the reward much better and gives more weight to object recognition. You have the chance to move 3 known objects (i.e., pure object recognition) for a grand total of 90 pts if you are fast enough. A little more challenging are another two objects; you can choose the alike ones, which are the second easiest, for 110 pts more (or 20 for wrong placements if your robot is bad at it). Splitting each reward in halves between manipulation and object recognition (categorizing instances of a previously learned pattern is trivial after recognition), you can score 100 pts out of 200 pts for object recognition, twice as much as in 2017! There is no dealing with unknowns if you don't want to, and we are requesting nothing YOLO can't do.
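As a sanity check on the arithmetic above, here is a back-of-the-envelope sketch. The per-object values (30 and 55 pts) are inferred from the totals quoted in the comment (90 pts for three known objects, 110 pts for two alike ones) under the assumption that each total splits evenly per object; the rulebook itself is the authoritative source:

```python
# Inferred per-object rewards (assumption: each quoted total splits
# evenly across its objects; check the rulebook for official values).
KNOWN_OBJECTS, PTS_PER_KNOWN = 3, 30   # 3 x 30 =  90 pts
ALIKE_OBJECTS, PTS_PER_ALIKE = 2, 55   # 2 x 55 = 110 pts

total = KNOWN_OBJECTS * PTS_PER_KNOWN + ALIKE_OBJECTS * PTS_PER_ALIKE

# Splitting each reward in halves between manipulation and recognition:
recognition_share = total // 2

print(total, recognition_share)  # 200 100
```

The recognition share of 100 pts is the figure the comment compares against the sub-50 recognition scores observed in 2017.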

Basically, we removed the requirement for edge research in semantic object recognition, reducing the test to presenting decent coordination of YOLO + MoveIt.

Conclusion

We doubled the scoring for object recognition compared with 2017. From all perspectives, this lowers the difficulty of the test by a lot. We can conclude that one cannot say an ability's performance is not acknowledged or rewarded just because no explicit points are given to it in isolation.

kyordhel commented 6 years ago

@RoboCupAtHome/technical-commmittee, I think there is consensus that no changes to the scoring will be made in this regard.

Can we close this issue?

justinhart commented 6 years ago

I agree with closing it.

LoyVanBeek commented 6 years ago

It is a drastic change, and at some point this test was set up as a way to test and score object recognition and manipulation separately but in one go. There is always some tension between benchmarking & separation vs. tasks & integration, and sometimes the rules go back and forth between these over the years. I could imagine simply scoring, e.g., extra points for manipulating a recognized object, or just a bit fewer for manipulating any object.