RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

Deadline, Milestones, and Features #361

Closed kyordhel closed 6 years ago

kyordhel commented 7 years ago

Question in a nutshell

How much will the rulebook change this year?

Issue in a nutshell

First, define the deadline for the first draft (e.g. October 15, deadline for SSPL CFP).

If the rulebook is going to be changed and not just buffed, define

  1. Which features are important to test:
     1. Where will @Home direct research?
     2. Which tasks are relevant, and when do they need to be solved?
     3. What are the milestones (sub-goals) needed to have these tasks solved?
  2. Which features or abilities are required to achieve the aforementioned goals
  3. Which tests will change
  4. To what extent these tests will change

Introduction

Right after the RoboCup fever, we got a lot of ideas, many of them very nice, but they also involve MAJOR changes to the rulebook. The transition from 2016 to 2017 introduced several major changes, of which the least relevant was the inclusion of DSPL and SSPL. Other major changes were the fusion of the Speech Recognition and Audio test with the Person Recognition test, the merging of the Following & Guiding test and the Navigation test into the Help-me-carry test, the removal of the training and guiding phase in the Restaurant test, and the inclusion of the Set a table and clean it up test.

In the past, the rule was to introduce major changes to the rulebook every two years, but major changes have been introduced every year since 2015, so the Executive Committee and the Trustees advised keeping the rulebook as is and introducing only minor changes (buffs), to give teams time to learn, prepare, and outmatch the tests in 2018.

However, it seems to me that keeping the same rulebook for 2018 does not necessarily match the spirit of all the changes suggested by the new TC members, some Execs, and even Trustees. Therefore, this decision must be taken once and for all, with a strong commitment from the TC to have all changes ready before the deadline.

Deadlines

At the time this issue was created, the Call for Participation for the Social Standard Platform League had been issued, with the deadline for submission of qualification materials scheduled for October 15, 2017. The idea is that the other two leagues follow suit. The intention, in a nutshell, is that qualified teams are announced before the end of the year, making it easier for teams to get funds and confirm their participation several months before the competition, while at the same time opening the opportunity for a second CFP if some qualified teams can't attend the competition. In addition, and considering the large number of applicants in 2017 (more than twice the capacity for OPL), a video demonstrating a Stage 2 test might be required for qualification, to ensure only well-prepared teams qualify. This shortens the time for releasing the rulebook.

As a consequence, an early deadline for the first draft must be set, as early as mid-October 2017. This requires strong commitment from the TC due to the necessary extra effort.

The following deadlines have been pre-set, and must be discussed and agreed upon:

It's a very tight schedule, yet doable.

Milestones and Features

During a meeting in Nagoya, the EC and TC agreed that several milestones must be set, aiming to push robots and research toward actually solving very specific tasks such as: taking clothes out of the washing machine and folding them (laundry), reminding an elder of their daily dose of insulin (nursing), making a sandwich (acting as butler), giving a tour in a museum (tour guide), etc.

Such tasks involve specific abilities such as folding deformable objects, dealing with wet objects, manipulating tools, reacting to natural language queries, finding people, keeping track of the events and activities of a person, etc. None of them are solvable yet, and robots are not even close to achieving such feats, so milestones must be set and the tests must be designed accordingly, with plans for what to change and when, and for what to do if the tests are not being solved.

Suggested work plan

The suggested work plan is as follows.

Related issues:

#362, #356, #353, #352, #351, #347, #308, #261, #179

LoyVanBeek commented 7 years ago

Before RoboCup, I had already been thinking and writing about this. My personal views are written at https://github.com/LoyVanBeek/RoboCupAtHome-roadmap.

LoyVanBeek commented 7 years ago

As for success, I don't have the full score sheets, but I do have the final scores. With those, I did some very basic statistics.

OPL stage 1

|                | Poster | Speech & Person | Storing Groceries | Help Me Carry | GPSR   | Stage 1 |
|----------------|--------|-----------------|-------------------|---------------|--------|---------|
| max attainable | 50.00  | 200.00          | 250.00            | 200.00        | 250.00 | 950.00  |
| MODE           | 36.67  | 0.00            | 0.00              | 0.00          | 0.00   | #N/A    |
| MAX            | 40.00  | 145.00          | 22.50             | 70.00         | 70.00  | 249.00  |
| MIN            | 27.78  | 0.00            | 0.00              | 0.00          | 0.00   | 29.44   |
| MEDIAN         | 32.78  | 35.00           | 5.00              | 25.00         | 10.00  | 104.78  |
| AVERAGE        | 33.54  | 53.33           | 4.83              | 25.60         | 13.73  | 131.04  |
| STDEV          | 3.52   | 53.54           | 6.51              | 24.69         | 20.29  | 81.18   |

OPL stage 2

|                | Open Challenge | Set a table / tidy up | Restaurant | EE-GPSR | Stage 2 |
|----------------|----------------|-----------------------|------------|---------|---------|
| max attainable | 250            | 390                   | 285        | 250     | 2125    |
| MODE           | #N/A           | 10.00                 | #N/A       | 0.00    | #N/A    |
| MAX            | 194.54         | 20.00                 | 115.00     | 65.00   | 621.04  |
| MIN            | 62.96          | 0.00                  | -50.00     | 0.00    | 191.63  |
| MEDIAN         | 146.76         | 10.00                 | 27.50      | 32.50   | 392.01  |
| AVERAGE        | 132.41         | 11.25                 | 42.81      | 28.75   | 407.80  |
| STDEV          | 54.56          | 6.41                  | 58.85      | 24.75   | 176.20  |

SSPL Stage 1

|                | Speech & Person | Cocktail Party | Help Me Carry | GPSR  | Stage 1 |
|----------------|-----------------|----------------|---------------|-------|---------|
| max attainable | 200             |                | 200           | 250   | 700     |
| MODE           | #N/A            | 0.00           | 0.00          | 0.00  | #N/A    |
| MAX            | 117.50          | 30.00          | 10.00         | 42.50 | 245.00  |
| MIN            | 17.50           | 0.00           | 0.00          | 0.00  | 66.25   |
| MEDIAN         | 50.00           | 7.50           | 0.00          | 0.00  | 91.67   |
| AVERAGE        | 58.64           | 10.71          | 2.14          | 9.64  | 116.08  |
| STDEV          | 33.57           | 12.97          | 3.93          | 15.91 | 65.70   |

SSPL stage 2

|                | Open Challenge | Tour guide | Restaurant | EE-GPSR | Stage 2 |
|----------------|----------------|------------|------------|---------|---------|
| max attainable | 250.00         |            | 285.00     | 250.00  | 1485.00 |
| MODE           | #N/A           | 0.00       | #N/A       | 0.00    | #N/A    |
| MAX            | 178.47         | 95.00      | 40.00      | 70.00   | 628.47  |
| MIN            | 121.53         | 0.00       | 0.00       | 0.00    | 243.47  |
| MEDIAN         | 133.68         | 0.00       | 12.50      | 10.00   | 271.29  |
| AVERAGE        | 141.84         | 23.75      | 16.25      | 22.50   | 353.63  |
| STDEV          | 25.21          | 47.50      | 17.02      | 33.04   | 184.05  |

DSPL Stage 1

|                | Poster | Speech & Person | Storing Groceries | Help Me Carry | GPSR   | Stage 1 |
|----------------|--------|-----------------|-------------------|---------------|--------|---------|
| max attainable | 50.00  | 200.00          | 250.00            | 200.00        | 250.00 | 950.00  |
| MODE           | 37.86  | 0.00            | 0.00              | 0.00          | 0.00   | #N/A    |
| MAX            | 41.43  | 101.00          | 30.00             | 30.00         | 25.50  | 209.36  |
| MIN            | 31.79  | 0.00            | 0.00              | 0.00          | 0.00   | 36.07   |
| MEDIAN         | 37.50  | 2.50            | 0.00              | 0.00          | 0.00   | 51.25   |
| AVERAGE        | 37.00  | 24.70           | 6.75              | 4.75          | 5.35   | 78.55   |
| STDEV          | 2.56   | 36.72           | 10.14             | 10.44         | 8.66   | 55.19   |

DSPL Stage 2

|                | Open Challenge | Set a table / tidy up | Restaurant | EE-GPSR | Stage 2 |
|----------------|----------------|-----------------------|------------|---------|---------|
| max attainable | 250.00         | 390.00                | 285.00     | 250.00  | 2125.00 |
| MODE           | #N/A           | 0.00                  | 0.00       | 0.00    | #N/A    |
| MAX            | 183.68         | 10.00                 | 90.00      | 55.00   | 524.08  |
| MIN            | 0.00           | 0.00                  | 0.00       | 0.00    | 107.29  |
| MEDIAN         | 139.58         | 0.00                  | 5.00       | 10.00   | 257.73  |
| AVERAGE        | 114.51         | 4.00                  | 20.00      | 20.00   | 275.33  |
| STDEV          | 72.79          | 5.48                  | 39.21      | 24.24   | 165.85  |
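
For reference, these are plain descriptive statistics over the per-test scores. A minimal sketch of how the same columns could be computed (the score list below is hypothetical, not real data):

```python
import statistics

def describe(scores):
    """Descriptive statistics matching the rows in the tables above."""
    try:
        mode = statistics.mode(scores)
    except statistics.StatisticsError:
        mode = "#N/A"  # before Python 3.8, mode() raises if there is no unique mode
    return {
        "MODE": mode,
        "MAX": max(scores),
        "MIN": min(scores),
        "MEDIAN": statistics.median(scores),
        "AVERAGE": round(statistics.mean(scores), 2),
        "STDEV": round(statistics.stdev(scores), 2),  # sample standard deviation
    }

# Hypothetical per-team scores for a single test:
print(describe([0.0, 0.0, 5.0, 35.0, 70.0, 145.0]))
```
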
balkce commented 7 years ago

Thanks, @LoyVanBeek! I've read through all of your material, and I think there is a lot that I agree with.

First off, I agree with your thought that the overall performance going down "is due to the rulebook not being stable". In this regard, I believe we need to make the 2018 rulebook an evolution of 2017's.

However, this conflicts with another issue that @LoyVanBeek has pointed out: "We want to test too much". It's very tempting to fix this issue by reducing the number of tests (I really don't want to take away repetitions), and reducing the number of tests implies test rework, which in itself implies major changes.

I propose to:

  1. Remove some of the tests, and fine-tune the rest. We can decide which to take away based on an agreed-upon roadmap of skills/behaviors we want to have in 2020. @LoyVanBeek's document at https://github.com/LoyVanBeek/RoboCupAtHome-roadmap/blob/master/future.md provides a good starting point. We can "fine-tune" the remaining tests in such a way that they are an initial version of more complex tests in following years (very much like the Follow Me and Restaurant tests).
  2. For the tests that are removed, their tested skills could be moved to GPSR. Meaning, a "theme" could be set for each round of attempts. For example, in the first round, person recognition is tested; in the second, navigation; in the third, speech understanding.
  3. Rework the remaining tests to be GPSR-based: no points are given for ASR; teams can use a QR code or a vizbox to obtain the command. I don't care how the command is given; I just want to see whether the robot is able to do the test or not. Why GPSR-based? Because then different versions of the command (e.g. "get a coke" instead of "get some chips") can be given, so no team will receive the same version of the command. This is to start testing flexibility (another issue @LoyVanBeek pointed out), and it is only a first step. I'm aware that there will still be a lot of inflexible planning that can solve the test, even more so considering that a static variable-based grammar will need to be provided beforehand for all the tests (so there isn't an issue about command understanding); a sketch of such a grammar follows this list. The idea is that in 2020 I want to test only GPSR; I think designing the current tests in this manner is a step forward in that direction.
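
A minimal sketch of what such a variable-based grammar could look like (the productions and vocabulary here are made up for illustration; this is not any official grammar):

```python
import random

# Hypothetical variable-based grammar: every $variable expands to one of
# several alternatives, so each team receives a different surface form of
# the same underlying command.
GRAMMAR = {
    "$command": ["$fetch $object", "bring me $object"],
    "$fetch": ["get", "grab", "fetch"],
    "$object": ["a coke", "some chips", "an apple"],
}

def expand(symbol: str, rng: random.Random) -> str:
    """Recursively expand a grammar symbol into a concrete command."""
    if symbol not in GRAMMAR:
        return symbol  # terminal word: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    return " ".join(expand(token, rng) for token in production.split())

rng = random.Random()  # could be seeded per team for auditability
print(expand("$command", rng))  # e.g. "grab some chips"
```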

Regarding which tests to take away, and which ones to keep and fine-tune:

That is all for now.

LoyVanBeek commented 7 years ago

As for themes: please stick with the current challenges as a theme for GPSR-like runs, to keep the rulebook stable.

Some other suggestions:

kyordhel commented 7 years ago

Long post @balkce! Thank you.

First, I would like to call attention to the fact that, again, discussions are mainly being held between the EC plus @LoyVanBeek, @moriarty, and me. The TC has nine members, and I would like to see them all reach consensus in #217, #308, #347, #352, #356, #358, #359, #361, and #362. I find Luca's and Sven's suggestion worthy.

Second, the rulebook's instability can't be considered the predominant factor for the low performance, at least not in Stage 1, even less if we consider that the average performance is less than 10% of the maximum score. I would rather blame an ill design of the tests, for the following (non-quantitative) reasons:

Third. Some clarifications:

  1. Thematic GPSR is pointless (it's not GPSR anymore), and this is even more evident when HRI during task acquisition (a critical ability in SSPL) is bypassed. It would be better to have mini retro-tests like go-get-it, who-is-who, and follow-me. In fact, what you are suggesting seems to me like rolling back to the tests of 2011.

  2. I don't think it is wise to remove the Speech Recognition test; rather, it should be upgraded. From the beginning the plan was to move from predefined questions to NLP with action planning (the robot explains the plan, it does not execute it); see the sketch after this list. So far, I've seen very few robots with this ability, and those that have it barely use it due to the linearity of the tests (state-machine-like behavior).

  3. Storing Groceries and Help-me-carry: describing a test in terms of unsorted GPSR actions is one thing; giving those actions to the robot is another. The latter is unnatural and goes against #358, where @iocchi suggests breaking the linearity of the tests and just marking goals, which eases scoring. These tests are atomic tasks that require a lot of local planning, and so must be kept; I can't imagine any mother explaining to you how to help your aunt with the groceries.
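
To make point 2 concrete, here is a minimal sketch of "explain the plan, don't execute it" (the lexicon and step templates are hypothetical, purely for illustration):

```python
# Hypothetical sketch: turn a "<verb> <object>" command into a verbalized
# plan without executing it. Lexicon and templates are illustrative only.
ACTIONS = {
    "get": ["navigate to where {obj} is", "grasp {obj}", "bring {obj} to the operator"],
    "find": ["search each room for {obj}", "announce where {obj} was found"],
    "follow": ["detect {obj}", "keep {obj} in view while moving"],
}

def explain_plan(command: str) -> str:
    """Explain how a command would be accomplished, without executing it."""
    verb, _, obj = command.lower().partition(" ")
    steps = ACTIONS.get(verb)
    if steps is None:
        return f"I do not know how to '{verb}'."
    plan = "; then ".join(step.format(obj=obj or "the target") for step in steps)
    return f"To {command}, I would: {plan}."

print(explain_plan("get a coke"))
# To get a coke, I would: navigate to where a coke is; then grasp a coke;
# then bring a coke to the operator.
```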

Finally, since no counterargument is complete without a suggestion for improvement, here are my 2 cents:

  1. Do major changes in how tests are described and scored, but keep (most) tests as they are.
  2. Describe tests in terms of goals, not actions or sub-tasks to accomplish.
    • Order is (mostly) irrelevant.
    • If the robot meets the goal, scores.
    • No PDF reports, the tasks must be solved with visible results.
    • Remove all abilities that don't lead to the milestones.
    • The goals use keywords related to GPSR commands (e.g. go, follow, deliver, find, etc.).

Some examples

  1. Storing Groceries
    • Goal: Bring all objects from a nearby table to the shelf, grouping them by category.
    • 5 groups are scored.
    • Goal: Open the door of the cupboard.
    • Remark: some of the objects are cutlery or tableware.
  2. Speech & Person Recognition (based on #352)
    • Goal: Find 4 people in the Livingroom
    • Goal: Fetch a command from each person found.
    • Goal: For each command retrieved, explain how it could be accomplished (action planning). There is no need to execute it.
  3. Help me carry
    • Goal: Find the car outside the house (e.g. by following a person).
    • Goal: Bring the bag with groceries back to the house.
    • Goal: Find a volunteer to help carry the groceries in.
    • Goal: Guide the volunteer found to the car.
    • Remark: Note that it is valid for the robot to first find a volunteer and then follow the operator to the car while guiding the volunteer at the same time, burning two transistors with the same spark.
  4. Set a Table:
    • Goal: Place a fork, spoon, knife, or any other cutlery object.
    • Goal: Place a dish, bowl, place mat, or any other tableware object.
    • Goal: Place a third cutlery or tableware object.
    • Goal: Place an object of the food category on a dish or on a bowl; or pour an object of the drinks category in a bowl, mug, cup, or glass.
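
A minimal sketch of how this goal-based scoring could be represented (the class names and point values are made up for illustration):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Goal:
    """A single scorable goal; the order of achievement is irrelevant."""
    description: str
    points: int
    achieved: bool = False

@dataclass
class Test:
    name: str
    goals: List[Goal] = field(default_factory=list)

    def score(self) -> int:
        # Only goals that were met score; no credit for attempted actions.
        return sum(g.points for g in self.goals if g.achieved)

# Hypothetical goals and point values for Storing Groceries:
storing = Test("Storing Groceries", [
    Goal("Group the objects on the shelf by category (per group)", 50),
    Goal("Open the door of the cupboard", 25),
])
storing.goals[1].achieved = True  # the referee marks goals during the test
print(storing.name, storing.score())  # -> Storing Groceries 25
```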

Those are my thoughts so far.

johaq commented 7 years ago

> No PDF reports, the tasks must be solved with visible results.

In my opinion, PDF reports have the advantage that the referee can easily see whether the visible result is actually based on robot perception or just a guess / something hardcoded.

I also thought the reports were a good way to see how other teams performed in a task, one that is more informative than just the score.

kyordhel commented 7 years ago

@johaq

> In my opinion, PDF reports have the advantage that the referee can easily see whether the visible result is actually based on robot perception or just a guess / something hardcoded.

Please provide counterarguments to the following:

  1. The audience couldn't care less about PDF reports.
  2. PDF reports have the annoying side effect of robots doing only object recognition during a manipulation test.
  3. PDF reports have the annoying side effect of delaying score delivery (12 teams × 3 tries × 3 leagues = 108 reports).
  4. PDF reports make scoring harder (all tests should be scored DURING the test IN the arena).

Considering your point, I wouldn't mind checking the PDF report if the robot did something, to check for false positives and, as you say, hard-coding (which violates the fair-play rule). If the robot didn't achieve a single goal, it won't score. Period.

johaq commented 7 years ago

> The audience couldn't care less about PDF reports.

Definitely true for the audience on location. #357 and #355 raise requests to be able to better judge the robots' performance from outside, which I think published task reports can provide.

> PDF reports have the annoying side effect of robots doing only object recognition during a manipulation test.

If you do not want robots to do certain things in a task, don't award points for them.

> PDF reports have the annoying side effect of delaying score delivery. PDF reports make scoring harder.

Having never scored a competition, I don't know very much about this. Maybe a LaTeX template that teams HAVE to use could make this easier; a rough sketch of what that could look like is below.
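
For instance, a minimal sketch of such a mandatory template (the section names are hypothetical, just to show the idea of fixed sections a referee can scan quickly):

```latex
% Hypothetical mandatory report template; section names are illustrative.
\documentclass{article}
\usepackage{graphicx}
\begin{document}

\section*{Team / Test / Attempt}
% e.g. TeamName -- Storing Groceries -- attempt 2

\section*{Recognized objects}
% One line per detection: label, confidence, location.

\section*{Annotated images}
% e.g. \includegraphics[width=\linewidth]{detections.png}

\end{document}
```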

> If the robot didn't achieve a single goal, it won't score. Period.

Agreed. I think this would come with the more goal-driven scoring suggested here.

awesomebytes commented 7 years ago

> October 30th CFP: Qualified Teams Announcement

Is this a fixed date? I haven't seen any reference to it in the CFP (for the SPL leagues the qualification-material deadline is October 15, for OPL November 5, as per the CFP web page).

I'm asking because teams that want to purchase a robot with a 2017 budget will find it very helpful to have this information before 2018 (and as early as possible).

kyordhel commented 7 years ago

@awesomebytes

• DSPL: October 7th (Call for proposals)
• SSPL: October 15th (Qualification materials)
• OPL: November 5th (Qualification materials) / November 26th (Qualification Announcement)

So far only OPL has all deadlines set. The SPLs have a more complex selection procedure, so I can't tell when they will announce something.

kyordhel commented 6 years ago

Closing due to:

  1. No major changes to be done
  2. All deadlines expired