RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

Relative weights of the different aspects in the tests #416

Closed. esucar closed this issue 6 years ago.

esucar commented 6 years ago

I consider that the points assigned to each aspect of the tests should be revised, taking into account the level of complexity of each aspect. For instance, recognizing a question in Speech and Person Recognition gives 10 points, and grasping an object in Storing Groceries gives the same 10 points; grasping an object is much more complex than understanding a question! The relative weight of each aspect of each test is important, as the results of all the tests are combined to give the global score.

johaq commented 6 years ago

While grasping an object is more complex, I would argue that "simpler" abilities like basic dialogue and navigation are equally important for a service robot. Grasping does not do me much good if I can't tell the robot what to grasp or it cannot get to the object. So personally I am fine with the scoring there. Also, there are a lot of aspects in different tests where the TC decided that these are areas or abilities they would like to see developed (opening a cupboard door in Storing Groceries, for example) that give a lot of points. Ultimately, scoring is of course a huge topic, and there have been some discussions on new ways to score; see #358.

kyordhel commented 6 years ago

Several decisions were made based on teams' performance in 2017, previous years, and the standard platform leagues. Scoring in Speech and Person Recognition now considers performing a certain amount of inference and even an elementary degree of environmental reasoning (all non-predefined questions).

For some teams, the use of a mid-end manipulator (or a standard platform one) along with MoveIt! and YOLO will turn manipulating objects into a rather trivial task, while guessing which colour the girl in the picture was wearing can be quite difficult. The opposite might be true for teams with a strong AI background, a custom manipulator, or little to no experience in control.

For this reason, the TC opted to keep manipulation as easy as possible, while rewarding speed and higher-level cognitive functions like feature-based organization, without stressing the integration aspect of service robotics.

LoyVanBeek commented 6 years ago

In the future, it could be interesting to award more points to tasks that were not often completed successfully in previous years.
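
As a rough illustration of that idea, here is a minimal sketch in Python, assuming per-task success rates from previous years are available; all task names and numbers are hypothetical, not taken from the rulebook or actual score data:

```python
# Hypothetical sketch: scale a task's points inversely with how often
# teams succeeded at it in previous competitions, so rarely-completed
# tasks (e.g. grasping) become worth more than routinely-completed ones.

BASE_POINTS = 10

# Assumed historical success rates (fraction of attempts completed).
success_rates = {
    "answer_question": 0.80,
    "grasp_object": 0.25,
    "open_cupboard": 0.10,
}

def adaptive_points(task: str, floor: float = 0.05) -> int:
    """Award more points for tasks rarely completed in past years."""
    rate = max(success_rates.get(task, 1.0), floor)  # guard against /0
    return round(BASE_POINTS / rate)

for task in success_rates:
    print(task, adaptive_points(task))
# answer_question 12, grasp_object 40, open_cupboard 100
```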

LoyVanBeek commented 6 years ago

@esucar

> grasping an object is much more complex than understanding a question!

That very much depends on a lot of things. To be honest, for grasping I have some idea where to start, even from scratch. For speech recognition, NLP, and NLU, not so much.

Speech recognition has become a lot easier in recent years; grasping a bit, but not as much. That alone should be enough to make a difference in scoring, to encourage better grasping.

esucar commented 6 years ago

Hello,

I think a more "OBJECTIVE" answer would be to gather some statistics on the tasks completed by the different teams in the competition, regarding understanding questions and grasping objects ... and compare them.

Regards, Enrique

kyordhel commented 6 years ago

Dear @esucar,

Beyond the subjectivity of any objective comparison we might provide, what you request is simply not doable. We have benchmarking data for ASR, but so far we haven't started to test understanding. Moreover, I don't think we can claim that a robot understands a question without entering into a huge debate.

On the other hand, object manipulation has been tested and benchmarked with success during the past eleven years. While this might not be the case in an @Home setting, in labs and industry the manipulation of regular lightweight objects is closer to being a solved problem than transcribing human speech is.

But of course such claims are completely subjective and based on my point of view. Furthermore, although I've been doing research on NLU for the past four years, I still find ASR a huge bottleneck, and all NLP research far behind what an average person can achieve when it comes to coping with natural language.

Finally, I have a couple of publications on the topic that I can share with you if you are interested (I can't provide links now because the editors haven't made them available yet).

esucar commented 6 years ago

Hello,

Of course it is a difficult and philosophical question whether the robot really "understands" ...

What I propose is much simpler: just to have statistics of the scores of each team at the @Home competition. That will tell you how difficult or easy each task is for the teams competing in @Home, and that is the important aspect, not how hard the tasks are in other contexts.

Regards, Enrique

kyordhel commented 6 years ago

There are no statistics, only RAW scoring data [1]. You may make your own stats from there. I find it pointless to make statistics for every team because there is no continuity (few would be interested in AlemaniACs' achievements), some change names (KIT Happy Robot -> Happy Mini), some teams prevail but change all their people, etc.
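
For anyone who does want to compute such stats, here is a minimal sketch, assuming the raw scores were first exported to a simple CSV; the column layout below is an assumption, not the actual format of the community-wiki tables:

```python
# Sketch: per-task success rates from raw per-team scoring data.
# Assumed CSV columns: team, task, points (one row per attempt).
import csv
from collections import defaultdict

def task_success_rates(path: str) -> dict:
    """Fraction of attempts that scored any points, per task."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            attempts[row["task"]] += 1
            if int(row["points"]) > 0:
                successes[row["task"]] += 1
    return {t: successes[t] / attempts[t] for t in attempts}

# e.g. task_success_rates("scores_2017.csv") would let one compare
# how often questions were understood vs. how often objects were grasped.
```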

If you want ready-made statistics, you may check [2] and [3], although I'm quite confident you already know them.

[1] https://github.com/RoboCupAtHome/AtHomeCommunityWiki/wiki/Scores
[2] Iocchi, L., Holz, D., Ruiz-del-Solar, J., Sugiura, K. and Van Der Zant, T., 2015. RoboCup@Home: Analysis and results of evolving competitions for domestic and service robots. Artificial Intelligence, 229, pp. 258-281.
[3] Holz, D., Iocchi, L. and Van Der Zant, T., 2013, March. Benchmarking Intelligent Service Robots through Scientific Competitions: The RoboCup@Home approach. In AAAI Spring Symposium: Designing Intelligent Robots.