johaq commented 6 years ago

Here is a summary of what was discussed. Additional input on open questions by non committee members is highly appreciated. (Some of this is also mentioned in issues made by Komei earlier)

Have every test be a focused GPSR with a theme. Each theme will have tasks associated with it. Each task consists of functionalities. An example would be: Theme: Set up a party Task: Bring drinks to the living room Functionalities: Navigate to a room, find and recognize an object, pick an object, place an object

Open questions: What are the themes? Current suggestions are "Set up a Party", "Clean a Room", "Host a Party"

Who defines the tasks? Are all tasks defined by the TC or do we let teams provide their ideas?

How are the tests scored? Do we award partial points if a task is only partially solved? Do we also score teams showing certain functionalities not just complete tasks? Some ideas discussed are: A team showing the same functionality multiple times gets less points over time (diminishing returns). A functionality gives less points the more teams can perform it.

How are the tests executed? How can arena setup be made not a nightmare? How much time should each team get? We will have one theme per competition day with multiple tries.

How does the robot receive the tasks? Are they provided by speech as previously in GPSR or does the robot have to look around to find things to do according to the theme?

Motivation for this: Less stress on teams due to fewer tests. Moving away from state machine like defined tests and scoring. Have themes and tests that are easier understood by the audience.

balkce commented 6 years ago

Let the discussion for the 2019 rule book, begin!

LoyVanBeek commented 6 years ago

Additionally:

The good: New teams did well!
The bad: new teams did rely on WiFi and assumed that would work well. This assumption proved to be wrong.
CFP will go out in October/November. An executive summary (2-3 pages) should be ready by then.
Teams should be encouraged to contribute to the rulebook
Ideally let free how a task is solved, give points for results only?
3 tasks, 1 per day as per EC decision, but each test can be repeated.

benatuts commented 6 years ago

Some theme ideas:

Elder care (e.g., deliver and monitor medication intake, identify hazards and fallen people, assist with emergency services, answer door)
Parenting (e.g., deliver nappies, monitor when child wakes, ensure children stay away from the kitchen)
Busy office worker (e.g., deliver/iron clothes, accept deliveries while owner is away, monitor home for break-ins, leave messages, find lost TV remote)
Family (e.g., deliver snacks, check teens are doing homework, call everyone for dinner, set table, confiscate phones, break up fights)

kyordhel commented 6 years ago

Themes are fixed for what I understood. It is not clear how to integrate GPSR there since command retrieval turned kind of optional.

I think we should have two runs. One early in the morning, one latter in the afternoon so teams have time to eat, analyze performance, and fix. For scoring I would go with a boolean ToDo with penalties per assistance request, but that requires further discussion.

Themes are:

Cleaning-oriented household tasks (chores)
Social-oriented butler/waiter tasks
- (Maybe GPSR enters here)

Themes are per day, so we have RIPS, Stage 1 (2 days) and Stage 2 (2 days) with maybe the inclusion of restaurant/shopping mall. Technically there is no script other than

Enter the arena
Perform task
Leave the arena

It is up to the robot to choose what to do. The arena is in a state that requires the robot to take actions, namely clean or serve.

Cleaning

The robot needs to perform one of many cleaning tasks within the given time, e.g.

Fill dishwasher
Clean toilet
Wipe windows
Mop
Clean floors (move objects)
Store groceries

Serving

There is a party in the apartment and all people there have particular needs. The robot needs to find a task and carry it, e.g.

Serve a drink (pouring, deliver, etc)
Serve breakfast (pouring, make a sandwich, etc)
Guide a guest in/out
Make a group photo
Entertain children

Here, many robots can work at the same time.

Constrains

Is up to the robot to find out how to carry on the required/available task. It is acceptable for a robot to require human assistance as often as required, but each time a human intervenes the score gets reduced (e.g. halved), but in order to score, a robot has to complete a task (e.g. fill a dish washer, store groceries, serve and deliver the requested drink, etc).

rventura commented 6 years ago

The idea of making all tests GPSR like looks a very good idea: GPSR is general enough to encompass virtually any task, while from a very practical point of view, it reduces dispersion and helps focusing effort (e.g., almost all leagues are single-task, and that helps consistent performance increase).

By the way, are you considering adopting the new scoring system proposed by Luca a couple of years ago? (issue #358 I believe)

Best, Rodrigo Ventura

rventura commented 6 years ago

(?)

LoyVanBeek commented 6 years ago

Doing everything as GPSR also gives great flexibility to define new tasks. In the end, I want a General Purpose Service Robot, not a limited purpose robot or several different one; I want one that can do all my chores.

johaq commented 6 years ago

What I forgot to mention in my first post was the idea of letting teams provide a sort of ability matrix (something implemented in @work I think). The commands they get would then be based on what the team think their robot is capable of. Background is that at previous competitions teams remarked that they are not able to show of the things they actively research because the robot does not get to the point in a task where it would be required. Also this would allow new teams to focus on a few core abilities and then get tested on these.

Personally, I think that at the heart @home is about integration so certain functionalities should always be required but maybe we can let teams state their advanced abilities. If we want to implement this we also need to discuss how to do that. Let teams provide functionalities and base tasks on that or let teams choose from the list of tasks directly?

balkce commented 6 years ago

The suggestion of @johaq is something that has been talked about previously, and it is of interest since it brings a lot of positives to the competition: teams show off what they can do (and no excuses for poor performance), and the audience sees robots doing stuff, all of which results in less unnecessary stress for everybody and a more fun competition.

However, the issue with this always comes back to logistics: gathering the necessary information from the teams during the competition is a hassle.

What I was thinking that can be done to solve this is to try to automatize the process and have a centralized server with the sole purpose of generating commands. It should have a secure web portal that the teams can access before the test, where they select the functionalities their robot can achieve. During the test the judges access the server, selects the team to be evaluated, and it should generate the command considering only the functionalities the team previously inputted.

We could have a version of this in GitHub (like the GPSR command generator that we have now) so that teams can download it and try it out before the competition.

The only issue, as always, is who will build this?

kyordhel commented 6 years ago

@rventura, @LoyVanBeek problem with the GPSR approach is that it somehow conflict with the last EC decision regarding allowing robots to solve a task proposed by them within a given application domain (e.g. cleaning, without specifying whether wipe windows or fill the dishwasher). On top of that the GPSR approach has the major setback of fairness, and repeatability,

Having said that, I also support what @balkce said.

LoyVanBeek commented 6 years ago

@kyordhel which decision is that exactly? I can't find anything about that being decided.

I'm fine with giving robots a command via text directly instead of speech. If GPSR commands are limited to what a robot can actually do, I don't see a big fairness issue, especially when commands are focused towards a topic. Repeating the same command for all robots is the most fair, but also allows the robots be dumb again and run state-machines. If we run e.g. help-me-carry as GPSR, there can be several (e.g 5) ways to say 'go to the X' while X can be eg. 4 rooms. Those 5x4 possible commands maps to the exact same code for the robot. I'd say that's fair and repeatable while different enough to be useful for RoboCup.

balkce commented 6 years ago

In terms of the EC decision that @kyordhel is talking about, it was described to the TC in the last meeting before the ceremony in Montreal. It was not written down anywhere, unfortunately. The idea is that we want to push teams to make the robot decide what to do and how to do it given a certain scenario.

However, we may have a compromise here, since I don't think these two ideas are mutually exclusive. Let's give the teams the option to either be given a GPSR-type of command (so that the robot knows what to do) or to let the robot decide what to do. This could be even integrated as part of the command-generating server.

kyordhel commented 6 years ago

@LoyVanBeek I thought that what I said was explicitly stated in one of Komei's issues (haven't checked them carefully yet). Like @balkce I don't think both options are mutually exclusive, but still I find them extremely hard to conceal.

What I understood about the meeting is:

We have 2 tests only (in stage 1).
The robot enters the arena and does something. We don't care what, but the robot actually does something. It is all about accomplishing one task, and get points based on the difficulty of the solved task and the number of times it got human help.
We have [at least] 2 runs. One in the morning (e.g. 10 ~ 12) and one in the afternoon (e.g. 16~18). 3.1 Robots might be allowed to re-queue for another try after a successful execution

The point of all this is to reduce exhaustion from referees, to give time to teams to prepare and buff between runs, to give people time to get a proper night of sleep, and to give nice shows to the audience in which robots do not fail. In that sense is that I proposed (somewhere) to have two scenarios: cleaning and party. Cleaning has (virtually) no HRI and party can bee all about HRI. None of this scenarios consider GPSR-like commands, since is up to the robots to choose what to do. In any case, in the party mode, the robot could search for a task ant that will require HRI and the GPSR commands (e.g. bring me a martini and some cheetos).

benatuts commented 6 years ago

To respond to the posed questions: "How are the tests scored? Do we award partial points if a task is only partially solved? Do we also score teams showing certain functionalities not just complete tasks?"...

... I would suggest that there could be a different approach to scoring. Scores should be based on quality of performance rather than functionality or difficulty.

Rather than scoring on functionality, I think test difficulty should be calibrated so that most competing teams are able to successfully "solve" the complete test. Scores could instead be differentiated based on the quality of the performance: how fast was the test completed, how long was spent searching, how many times was the human misunderstood. Perhaps in the second stage, there could even be guest judging panels who score the HRI.

This would create a more exciting experience akin to the SPL Soccer League. In soccer, every team can play the soccer. It isn't a case that some teams can kick and others can't. Having the common ground of all teams being able to play means that soccer serves as a benchmark of performance. Each year, difficulty ratchets up very gradually as the league gets better (e.g., color coded beacons were removed only when team's localization was good enough that a game without beacons would still be exciting).

In @Home, we could still retain the ultimate long term vision of a GPSR but right now be more targeted with a slow ratcheting up towards the long term vision. For example, imagine having just one task for 2019: "deliver snacks to people waiting in a room." That task still presents the scientific challenge of integration that RoboCup is about. Robots will need to perform navigation, grasping, person detection, communication and HRI. Yet, the simplicity of just one task (or a very small number of tasks) makes it easier to referee to judge and for teams to understand. It gives teams a chance to iterate and improve their code during the competition, and implement strategies that they see other teams use successfully. It would potentially create a more exciting game where the audience gets to see virtually all teams succeed in some way, but with the excitement of seeing how teams balance the risk/reward of speed, naturalness and accuracy when performing challenging actions like searching and grasping.

iocchi commented 6 years ago

Dear all, my suggestion was built around the concept of a skill book. The skill book contains a description of all the skills that will be scored. It will be created with the contribution of teams. Nothing else, except what is in the skill book, will be scored, so teams have to add what they want to be scored there. Skill book has to be approved anyway by TC and of course all the skills will be available to all the teams. I think we can have two kinds of tasks: 1) free tasks: any combination of the skills chosen by the teams 2) GPSR tasks: GPSR generates high-level goals and not instructions about what to do (example: "clean the kitchen table") and robot has to solve the task by combining the skills.

In both cases, only skills described in the skill book will contribute to the score. For GPSR tasks an additional score element will be percentage of achievement of the goal (e.g., how many items have been actually properly removed from the kitchen table). As already mentioned, each competition day will have a theme (e.g., Cleaning the house) and we can have free tasks in the morning and GPSR tasks in the afternoon about that theme, for example.

If we agree on this basic concept, I think the most important part is to define a skill template and start a call for skill definition among the teams. In my thoughts, a skill should include the following elements

category (e.g., navigation, speech recognition, ....)
name or code (e.g, NAV01)
description [easy to be verified by referees] (e.g. the robot starts from outside the apartment and reaches one room in the apartment traversing at east two doors (no moving obstacle during the path)
difficulty level: (e.g., level 3)

TC has to make sure that the skills within the same category are properly (partially) ordered.

The score will be assigned in such a way to reward integration of many functionalities and to avoid cumulating score by repeating skills of the same category. For example, it can be the sum of the max for each category, so that repeating many times a skill in the same category will not grant more score. In Stage 2, we can also enforce that every task should show at least 2 (or 3) skills from different categories.

I think this proposal is mostly on-line with all the discussion we had so far in many occasions. So I suggest to shape it more concretely in the next weeks and finalize it in September to communicate it to the @Home teams.

Thanks.

kyordhel commented 6 years ago

@benatuts NO PARTIAL SCORING FOR INCOMPLETE TASKS

Not only the robot chooses the task, but also how to accomplish it. All planning and skill choice relies directly on the team's strategy, strengths, and the robot's A.I. Whenever the robot has a problem, it may call a Deus Ex Machina (aka human assistance) that reduces the amount of points scored as indicated in the Rulebook or SkillBook (as proposed by @iocchi).

Let's say you chose to store groceries and find one object very hard to identify. You may ask human help with that single object and then getting an (e.g.) 25% object reduction. Then you find another object that is way too heavy and, again, you ask for human assistance for another (e.g.) 25%. At the very end of the test you would score 1000 - 50% = 500pts for storing groceries. However, if you fail to accomplish the task (e.g. timeout, malfunction, etc), no points. It is not rulebook's fail, but yours for choosing that task, for you could have chosen something "easier" like taking out the garbage, or cleaning the floor by removing objects.

LoyVanBeek commented 6 years ago

Meeting of Sept. 6 2018,

my notes:

Tests, 1 per day:
- Cocktail party
- Storing groceries
- Restaurant
- GPSR??? Distribute it's parts over the other tests, so we can tell the robots what do and parametrize the test.
Repeat those all day
Achieve the goal, only point for completed task.
Partial scoring: When robot fails at eg. 99th fork of 100 to pick, how many points? Not full points but e.g. 50%. You can ask a human for help and still achieve the goal, but with a small penalty. You can do multiple tries! You can re-queue for a next attempt when you fix that one bug that messed up the 99th fork.

RoboCupAtHome / RuleBook

Summary of EC/OC/TC Meeting (10.07.18) #480

Cleaning

Serving

Constrains

Meeting of Sept. 6 2018,