RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

New scoring system #358

Closed iocchi closed 4 years ago

iocchi commented 6 years ago

Dear all, another great RoboCup@Home this year with a lot of great novelties (new leagues and new teams) and perfect organization (thanks Execs/TC/OC and LOC). Still, we can improve, and I think it is now time to discuss the rules and the scoring system to solve some problems that we have noticed in recent years but have not yet been solved.

The main problems in my opinion are:

  1. Writing the rulebook is very complex and is left in the hands of only a few people.
  2. Implementing the tests is difficult for the TC (it is not easy to delegate to volunteer referees, so the TC must be present to evaluate every test), and sometimes the spirit of a test is not fully captured by the referees.
  3. Teams are often unhappy because of some rigid constraints in the rules, and sometimes they decide not to participate in some tests at all (probably because of their complexity). Some team members are unhappy because the robot never reaches the point at which their development is tested, due to earlier failures.
  4. The Open tests (challenge and final) tend to be very conservative (no need to risk autonomous behaviour or a "real" speech recognition phase) and sometimes seem not to seriously address the functionalities, tasks, and scientific objectives of RoboCup@Home. In most cases, we still see more human presentation than robot tasks.
  5. PhD students and researchers rarely find the subjects of their research in @Home tests. Also (maybe as a consequence), there are not many publications from @Home teams.
  6. The audience rarely sees something interesting and has difficulty understanding what is going on.

I think these problems can be addressed with a new structure for the tests and a new scoring system. The current scoring system was introduced in 2008 to replace the previous one, which assigned only boolean scores to tests. By introducing scores for intermediate steps, we could better evaluate the performance of teams. However, this introduced a complexity in writing the rules and in evaluating the tests (score sheets longer than a page) that has increased greatly over the years.

On the flight back from Nagoya, I was reading about how some sport disciplines that combine different elements are evaluated. For example, the evaluation of gymnastics is based on a Code of Points: a Table of Elements describing what is expected in the routines, plus a mechanism to compute the score of a performance. See https://en.wikipedia.org/wiki/Code_of_Points_(artistic_gymnastics)

I think RoboCup@Home has a similar concept: we want to integrate basic functionalities in different ways and reward the solutions that best integrate many complex abilities.

Since the flight was long enough, I tried to think about how to adapt such a scoring system to @Home, and I put down the idea summarized below.

RoboCup@Home score system based on sports (e.g., gymnastics)

Key concepts: a set of elements (or skills) will be defined a priori by the TC. Each element describes a specific ability that the robot has to demonstrate for a given category/functionality. Each element is associated with a difficulty value. Each element receives a boolean evaluation during a test (either successfully achieved or not).

Tests: unlike under the current rules, tests are prepared by the teams and must include some of the elements specified in the rules. These elements are the only way to obtain score. The elements to include, their order in the test, and the way they are combined are decided by the teams. The test can include any other activity/task (choreography) that will not provide score but may be useful to connect the different elements. Teams should inform the referees which elements the robot will perform before a test, to make refereeing easier.

Score: the score of a test will be the sum of two factors: Difficulty Score + Execution Score.

The D-Score is the sum of the difficulty values of three elements (chosen by the team) among all the elements successfully achieved during the test. Only one element per category/functionality can be chosen for the score, so to obtain a full score a test must successfully combine at least 3 different functionalities. Teams may decide which elements to include in the evaluation of a test, to minimize the penalty for repeating an element multiple times (see E-Score). The choice is made at the end of each test and cannot be changed later. If fewer than 3 elements from different categories are achieved, the D-Score considers only those values and a penalty is applied in the E-Score.

The E-Score is an objective evaluation of the execution of the test, ranging from 0 to 10. The E-Score is computed as 10 - Penalties (with a minimum value of 0). Penalties include in particular missing elements (i.e., fewer than 3 elements from different categories achieved in the test), which guarantees the following maximum scores:

  - Test with >= 3 elements -> max score = D1 + D2 + D3 + 10
  - Test with 2 elements -> max score = D1 + D2 + 6
  - Test with 1 element -> max score = D1 + 3
  - Test with 0 elements -> score = 0
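As a sanity check of the arithmetic above, here is a minimal sketch of the best-case score (D-Score plus maximum E-Score). This is illustrative only, not rulebook code; the function name and the example difficulty values are assumptions.

```python
# Illustrative sketch (not official rulebook code) of the maximum attainable
# score under the proposed scheme. Difficulty values below are invented.

# Penalty applied when fewer than 3 elements from different categories succeed.
MISSING_ELEMENTS_PENALTY = {3: 0, 2: 4, 1: 7, 0: 10}

def max_test_score(difficulties):
    """Best case: D-Score of up to three achieved elements, plus the maximum
    E-Score (10 minus the missing-elements penalty, floored at 0)."""
    k = min(len(difficulties), 3)
    if k == 0:
        return 0
    d_score = sum(sorted(difficulties, reverse=True)[:3])
    return d_score + max(0, 10 - MISSING_ELEMENTS_PENALTY[k])

print(max_test_score([5, 4, 3]))  # 3 elements: 5+4+3+10 = 22
print(max_test_score([5, 4]))     # 2 elements: 5+4+6 = 15
print(max_test_score([5]))        # 1 element:  5+3 = 8
print(max_test_score([]))         # no elements: 0
```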

More specifically, penalties may be defined as follows:

P = sum of:

// missing elements
4        only 2 elements achieved
7        only 1 element achieved
10       no elements achieved

// repetitions of elements over tests
D_i * (1 - \gamma^{N_i})   for each counted element E_i, where N_i is the number
                           of times E_i was already counted in previous scores;
                           \gamma = 0.9

// special rules
1        no moderator explaining what is happening
(others) ...
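To make the repetition discount concrete, here is a minimal sketch of the E-Score penalty arithmetic. It is illustrative only; the function names and example numbers are assumptions, not part of the proposal.

```python
# Minimal sketch of the proposed E-Score penalty arithmetic (illustrative only).
GAMMA = 0.9  # discount base proposed above

def repetition_penalty(d_i, n_i, gamma=GAMMA):
    """Penalty D_i * (1 - gamma**N_i) for an element E_i already counted N_i
    times in previous scores: a first use (N_i = 0) costs nothing, and the
    penalty approaches the full difficulty D_i as repetitions accumulate."""
    return d_i * (1 - gamma ** n_i)

def e_score(penalties):
    """Execution score: 10 minus the accumulated penalties, floored at 0."""
    return max(0.0, 10.0 - sum(penalties))

# Example: an element of difficulty 4, counted for the third time
# (already counted twice), costs 4 * (1 - 0.9**2) = 0.76.
p_rep = repetition_penalty(4, 2)
print(round(p_rep, 2))                  # 0.76
print(round(e_score([p_rep, 1]), 2))    # with the "no moderator" penalty: 8.24
```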

Organization of the competition

Each team will have 10-minute slots to perform tests within this schema. To obtain the best score, each test has to integrate 3 different functionalities, and different tests have to exercise different abilities. So the best team will perform different tests that combine complex abilities in different ways.

Of course, this can be done only for a few tests (e.g., only in Stage I, only on Day 1, etc.). I think this concept should in any case replace the Open Challenge and also be adopted in the final.

Advantages

kyordhel commented 6 years ago

@airglow, @awesomebytes, @HideakiNagano21, @justinhart, @mnegretev, and rest of TC: your thoughts, comments and proposals, please. We also need a volunteer to script this.

balkce commented 6 years ago

Thank you, Luca, for the idea.

The only issue I find with it is that, even though there is a lot of objectivity in your explanation, I'm pretty sure some teams and some judges will find a way to "subjectivize" the test. For example, let's say one of the elements to be demonstrated is navigation: one judge may give full points just because the robot moved, while another may give no points because it "did not impress him" navigation-wise.

To solve this, I think details should be provided on what it means to "successfully achieve" an element during the test. This will take some writing, since it would need to be done for each element, but it is possible.

Unfortunately, this would drastically change the current state of the rule book, and I believe we should let it "rest" this year to let the new teams catch up.

Having said all that, this idea is definitely doable for 2019.

I would love to hear from the rest of the TC: @airglow, @awesomebytes, @HideakiNagano21, @justinhart, @mnegretev, others?

kyordhel commented 6 years ago

@balkce What about starting those changes now in a 2019 branch? This way we may have the next rulebook ready right after the competition (or even earlier), giving teams extra time to develop and test, and, more importantly, settling what we are aiming for in the mid-term.

@iocchi?

balkce commented 6 years ago

@kyordhel agreed. I'm not sure I'd be able to write everything, but I could work on the outlines.

balkce commented 6 years ago

I created a new branch called "sportbasedscoring" to reflect a possible rule book with @iocchi's scoring ideas.

This is the core of the changes/proposals:

I left the Stage II tests and their scoring as they are right now, since I think the TC has a lot of ideas of where to push the top teams. Finals are unchanged as well.

I adapted the current rule book so as to take advantage of its useful parts (Introduction, General Rules, etc.), and changed only the relevant parts to consider the new changes.

Obviously, everything is up for debate, so discuss away.

kyordhel commented 6 years ago

No major changes in the rulebook for 2018. Rescheduling for 2019.

balkce commented 6 years ago

Yes. This idea is at least for 2019.

The link to the branch: https://github.com/RoboCupAtHome/RuleBook/tree/sportbasedscoring?files=1

johaq commented 4 years ago

We had a major scoring system change in going from incremental scoring to main goal scoring. I don't think we will have another major change in the foreseeable future.