RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

Finals [restructure][draft] #508

Closed awesomebytes closed 5 years ago

awesomebytes commented 5 years ago

As discussed in the meeting from 14/01/19 we need to agree on what to do with the finals.

The two proposals that came up are:

This is related to the issue "About removing the finals and open challenge".

balkce commented 5 years ago

Several ideas have been discussed for this.

I ask of you to discuss the following ideas. By Friday, February 1st, we need to come to a decision. I'll write it up in the rule book that day.

  1. Leave the finals as they are right now. Pros: since the Open Challenge has been removed, this is the only test that officially endorses an open demonstration. Cons: the finals have been an issue for as long as I can remember, because it is difficult for the audience (and even the jury) to ascertain which is the better robot.

  2. Rework the finals such that they have a "theme", and the two teams need to show off something related to that "theme". The theme should be easily understandable. For example: "navigation", "object manipulation", etc. The idea is to have three "themes" that are decided upon (when, how, and who decides this are still up for debate). Robot A enters the arena and shows off a task related to theme 1, Robot B enters and shows off a task related to theme 1, Robot A shows off a task related to theme 2, etc.

    a. The teams choose the themes. Pros: they can show off their research. Cons: conflicts can arise if an unsportsmanlike team chooses a theme just to disrupt the other team. It is also logistically difficult, since the teams won't be able to prepare their finals beforehand.

    b. The TC chooses the themes. Pros: lets the TC dictate what they want the league to be each year, and lets the teams prepare their finals beforehand. Cons: won't let the teams show off their own research.

    c. The TC chooses two themes, leaving the third to be an open demonstration. Pros: a bit of both worlds. Cons: we are still inheriting the issues of the old version of the finals, although to a lesser degree.

There are more variations to this, and these two ideas surely aren't the only ones that can be discussed.

When commenting please consider that the Finals are more or less the face of the league to the outside world (it is the most watched test of our competition), and that this year we are aiming for test clarity and successful robots.

balkce commented 5 years ago

Since nobody has commented so far, I'm letting you know that I'm tending to go with option 2c, with the themes for OPL and DSPL being navigation/object manipulation, and for SSPL being navigation/person interaction.

I'm willing to be convinced otherwise.

I'll begin doing these changes tomorrow (Feb. 1) at 12 PM Mexico City local time.

MatthijsBurgh commented 5 years ago

I think we should keep the finals as they are right now. I think we need to keep one challenge in which teams can show their research.

Changing the format will not improve the audience's understanding of what is happening. In my opinion, it will also not improve the scoring, at least not as long as no real goals are set for each task. For example, for manipulation: if team A shows very fast manipulation of simple objects and team B shows slow manipulation of difficult objects, this is still very hard to judge. The only way to make the scoring easy is to convert it to a normal test or EEGPSR. This is unwanted, as I would like to keep the Finals as something different and special.

Therefore I think we should keep the current setup.

kyordhel commented 5 years ago

@MatthijsBurgh It seems we have requests from above to change the format.

Unlike soccer, which has a clear champion, in @Home the open demos are often completely unrelated one to another and people don't really understand why the winner won. This is the reason behind the change: to make crystal clear to non-experts which robot is superior.

For that, both robots must face a similar challenge.

kyordhel commented 5 years ago

@balkce I'd go with a mix of both. We have two rounds: the open demonstration and the themed demonstration (not necessarily in that order).

If I remember correctly, many finals show some sort of potpourri of features. We can leave the demonstration open but request that both robots solve the same task or show the same skill as part of the demo (e.g. I want a cold-dead beer being handed over to me).

This way we keep both worlds happy.

MatthijsBurgh commented 5 years ago

Since when does the audience know which team wins the final/the league? Even for the teams, it always takes a lot of effort to figure out which team wins the final. So I doubt that the audience knows.

As for winning the league: as long as you keep the points from before the final, the best team in the final doesn't need to be the winner of the league. Does that already cause misunderstanding among the audience? This also happens in a lot of sports. Golf: having the best last round doesn't mean you win the tournament; it is based on the scores of 4 rounds. Ice skating: there are also 4 distances, and winning the last one doesn't mean winning it all. So communicating the leaderboard before the finals may already solve the issue.

How does this work in @work, logistics, rescue? I think they have a similar scoring as we do. Or do they reset their points before the final? Maybe good to do a little research before switching.

kyordhel commented 5 years ago

Well... I understand your outrage.

I was giving you reasons. Not mine, just “reasons” because, you know, there are people out there who want our finals to cause people to shout: gooooooal.

All I can say is that my interest in the audience's opinion is as overwhelming as it was in 2009. It hasn't changed in the slightest.

MatthijsBurgh commented 5 years ago

Then RoboCup should ditch all non-soccer leagues.

Also, if they do want to entertain people, they should put big grandstands at the soccer fields, like in Eindhoven or Leipzig. They have failed to do so after Leipzig. So up there they don’t have the right focus.

But I am in no way able to come up with something which will cause the audience to shout, besides robots fighting each other.

balkce commented 5 years ago

To be clear: the Finals could stay the same as before. The EC, however, has raised concerns about their structure, so we are airing out ideas. I would consider leaving them as they are only as a last resort.

As for what to do, I like @kyordhel's idea: the team can show off whatever they want, but at the end of the demonstration, I want one task to be solved. This way, it is easier for the judges to compare the robots on a level playing field. And yes, also for the audience.

As for the task, I propose to find a person inside or outside the arena (for SSPL) or to hand an item over to a person (for OPL/DSPL). The teams can choose the level of difficulty. For example, they can decide who the person is, where they are located, the item to be handed over, etc.; or, they can let the judges decide; or, a mix of both. They also decide in what context the task is being solved: a party where an invitee is looking for somebody, or an elderly person who needs to be given their medicine. The idea is that the teams build their own scenario to solve the task, with the context they need to be able to show off other stuff they have researched.

I propose to score the demonstration in the following terms:

Thoughts?

benatuts commented 5 years ago

Another possibility that avoids the need for teams to program a solution to yet another task:

Take the intersection of the set of tasks that were previously attempted by the two teams who have reached the finals (or, alternatively, the intersection of shortlists selected by both teams).

Choose one task from the intersection randomly (using a coin, dice or pack of cards) with 24 hours notice before the finals. If not randomly, then a preference could be for Stage 1 tasks or the task that got the highest combined scores.

The finals is then a race against the clock, where both teams attempt to complete that selected task in the least time (either working in the same arena at the same time or alternating turns).

The timer creates excitement. The tasks have already been completed successfully by both teams before, so there is a level of assurance that the finals will make the teams and the robots look good.
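
The selection step described above could be sketched as follows. This is only an illustration under stated assumptions, not anything taken from the rulebook: the function name `pick_finals_task`, the task sets, and the seeded random generator are all hypothetical.

```python
import random


def pick_finals_task(attempted_a, attempted_b, rng=None):
    """Pick one task at random from those both finalists have attempted.

    attempted_a / attempted_b are iterables of task names (hypothetical data).
    Returns None when the two teams share no task.
    """
    rng = rng or random.Random()
    # Sort the intersection so the draw is reproducible for a given seed.
    common = sorted(set(attempted_a) & set(attempted_b))
    if not common:
        return None
    return rng.choice(common)


# Hypothetical example: the task lists are placeholders for illustration.
team_a = {"Storing Groceries", "Serving Drinks", "Take Out the Garbage"}
team_b = {"Serving Drinks", "Storing Groceries", "Receptionist"}
chosen = pick_finals_task(team_a, team_b, rng=random.Random(2019))
print(chosen)
```

Replacing `rng.choice` with a rule such as "highest combined score" would give the non-random variant mentioned above.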

nickswalker commented 5 years ago

Choose one task from the intersection randomly (using a coin, dice or pack of cards) with 24 hours notice before the finals. If not randomly, then a preference could be for Stage 1 tasks or the task that got the highest combined scores.

Depending on the scoring, this has the interesting property that a team might win the finals because they had a really solid Storing Groceries (for example). I like this because it would incentivize teams to be the best at the tasks they participate in, and not just cherry pick the easy points across all tasks.

A 24 hour race against the clock sounds really painful for the teams though. I thought I remember someone saying that all of the @Home people looking miserable was a concern...

justinhart commented 5 years ago

We could get Red Bull as a sponsor. Then again, it would be nice to actually enjoy a beer at the reception.

benatuts commented 5 years ago

A 24 hour race against the clock sounds really painful for the teams though. I thought I remember someone saying that all of the @Home people looking miserable was a concern...

In my suggestion, it should not be a significant problem because the teams have already demonstrated they can solve the problem. It is just so that teams have time to ensure that their code is ready to go again with that task and perhaps make some small tweaks (based on what they've learned at the venue) to improve speed.

fabricejumel commented 5 years ago

At this time, the jury has to score a lot of things (1.1. Scientific contribution, 1.2. Contribution to @Home, 1.3. Relevance for @Home / Novelty of approaches, 1.4. Presentation and performance, 2.1. Originality and Presentation (story-telling is to be rewarded), 2.2. Usability / Human-robot interaction, 2.3. Multi-modality / System integration, 2.4. Difficulty and success of the performance, 2.5. Relevance / Usefulness for daily life, …).

For teams, it’s impossible to give cues for all these topics during a short demonstration. Concerning the league-internal jury, it’s very important to score at least open source contributions to @Home. But that could be done with the open challenge award (will we confirm it?) and the poster session.

This job is done during the qualification process, but I like the idea of giving more visibility to articles and available code. It’s essential that teams continue to contribute to open source solutions.

One important aspect of the final is that this is the time when RoboCup trustees, industry representatives, manufacturers and a large audience are present.

The idea of the league is to perform tasks autonomously in a flat, including interactions with humans.

So it seems logical that the final is in the same spirit. Personally, I would opt for a form of GPSR challenge, but with some important points:

First, teams choose the scenario and can restrict or extend things as they want. So it seems the same as the theme idea from Caleb. But I prefer that they start from an existing test (we could perhaps add something about HRI, like the previous guided tour test) so that it can be scored. A discussion between the EC and TC will rate the global difficulty of the tasks (as in the gymnastics scoring process). In the future it could be more standardized with the creation of a skills book.

Second, teams can help the robot more directly, with no limit but with clear explanations (this help will reduce the difficulty score given by the EC/TC). It is a form of extended deus ex machina, so that "the show must go on" for the audience.

Third, teams have to explain what is happening, with visual feedback from the algorithms and, if possible, a link to the scientific approach. But it is an explanation of what is happening, not slides about research.

A lot of the demonstrations seen in past finals can still be done. Just with more robot actions.

kyordhel commented 5 years ago

@benatuts

  1. No Stage I tests. The finals are to show the very best the league has to offer. Stage I is the filter for the real competition, which occurs in Stage II. In theory all competitors should be able to solve a Stage I test upon qualification.

  2. No death-runs. We want clean and nice execution. Precision is key in @home, not speed (at least not yet). Teams do terribly when they are competing against time.

  3. No improvisation. We are sick of tests being prepared the night before. The idea to have something set for the finals is that potential finalists prepare in advance something worthy. As @nickswalker just pointed out: all of the @Home people looking miserable is a concern.

What creates excitement? Well, when a robot helped a human to move a table, it created excitement. When the robot tried to flip a pancake, that created excitement as well. Uncapping a beer, serving real drinks, having a swarm of tiny robots lurking around the arena, a robot opening the door in 0.3 secs. All that created huge excitement.

Why? Well, the presenter didn't steal the attention focus, the robot performed a clean execution, and everybody could relate as: I want that in my house. Then they started to explain science.

I'm more into pushing teams to do something challenging. Quoting @LoyVanBeek: "I'm buying the first robot that can clean my toilet after giving me my favorite beer."

kyordhel commented 5 years ago

@balkce I'd go with HRI for SSPL and manipulation for the rest. We have a battery of ideas given by P&G and from my research on what housemaids hate most. I think focusing on an objective would help to sort out the winner. Scoring for innovation and difficulty of the task should apply only to the open demo and be ranked by the TC/EC (guests have no idea).

@justinhart We offer beer when the venue allows drinking and the LOC provides budget.

benatuts commented 5 years ago

If the finals requires a new and different task, then teams are likely to prioritize most their efforts into earlier tests, postponing their efforts for the finals as late as possible when they feel confident they will be able to reach the finals (or if they weren't feeling confident, perhaps they'll only start on the finals when they've made it to the top four).

I don't yet understand how stages work in the new rules -- as it hasn't been written up yet -- but my understanding is that so far teams will already need to prepare an absolute minimum of 4 separate tasks (2 for Stage I and 2 for Stage II), and may need to prepare more to be competitive. It sounds like they'll also need to do an open challenge if they're interested in that prize and perhaps a new finals task. I seem to recall seeing something about ensuring that the new Stage II challenges are designed so most teams will be getting no more than 1/4 of the points.

I remember being exhausted in Montreal and - I'm sure - looking miserable. I fear that this new rulebook will be just as problematic because it still suffers from requiring many challenging tasks to be completed. This is compounded by the fact that July is only a few months away.

I've had some prior experience with RoboCup Soccer and the mindset there was very different. Teams only have to do one task (play soccer), but the challenge was to do it well. The competition is not "can any team play soccer at all?" but rather the expectation is that all teams can play a reasonable game of soccer and the question is "who plays soccer best?". What makes RoboCup@Home so brutal in comparison for newer teams is that there is so much effort to just get things to work, that there is little time to slow down and explore challenging research that would move the robot from "I can do X" to "I can do X well". Or, to put it another way, RoboCup@Home is currently a race to see which teams can implement the most novel features first, rather than a challenge of who implemented those features better.

For me, I would be much more thrilled by a robot that does something simple magnificently, than I am by a slow and mediocre demonstration of something "hard". Indeed, this is like your example of being wowed by a robot opening a door in 0.3 seconds. Even if every team can open a door or do the Stage I "serving drinks" challenge, reaching the finals could incentivise teams to focus on doing those things magnificently.

If time to completion isn't the right measure, fine, but then perhaps the judges might evaluate the efficiency or "elegance" of the demonstration. How close is the robot to acting in a way that is natural? (rather than the current situation where it would be almost certainly faster and less hassle to just get your drink by yourself instead of getting a robot that will complete the task slowly and awkwardly).

If bringing a beer and cleaning the toilet will wow the audience, then those problems are (almost) already in Stage I. The finals could be to complete the serving drinks task and then the take out garbage tasks one-after-the-other, but where it must be done magnificently. I would be wowed by a robot that solves such tasks so flawlessly and effortlessly that you almost forget it is a robot.

balkce commented 5 years ago

I don't want teams to carry the burden of not knowing what the finals will be, even less so to have them code their finals while they are still in Stage 2 (which is usually carried out the day prior to the finals).

I'm also getting a bit tired of improperly using randomness as the way we as judges solve fairness issues. GPSR works because, even though one team may get an easy task while another gets a harder one and both are worth the same points, it's fine because it repeats. Whatever unfairness occurs, it is ironed out throughout the attempts. However, randomness with one sample is not "being fair", it's just not being decisive.
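
The "ironed out throughout the attempts" argument can be made concrete with a small calculation. The sketch below is an illustration only: it assumes each attempt draws a task difficulty independently and uniformly from a pool, and both the difficulty ratings and the number of attempts are hypothetical values, not figures from the rulebook.

```python
import math
import statistics


def spread_of_average(difficulties, attempts):
    """Standard deviation of the average difficulty a team faces when it
    draws `attempts` tasks independently and uniformly from the pool."""
    # Population standard deviation of a single draw.
    sigma = statistics.pstdev(difficulties)
    # Averaging n independent draws shrinks the spread by sqrt(n).
    return sigma / math.sqrt(attempts)


difficulties = [1, 2, 3, 4, 5]  # hypothetical difficulty ratings
one_draw = spread_of_average(difficulties, 1)
nine_draws = spread_of_average(difficulties, 9)
print(round(one_draw, 3), round(nine_draws, 3))
```

With a single draw, the luck of the draw is as large as it can be; with nine draws, the spread of the average difficulty is one third of that, which is the sense in which repetition evens things out.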

Let's be decisive then: let us (as the visionaries of robotics that we supposedly are) establish an objective that the robots need to achieve every year (as a type of "theme of the year") and let that be the finals, with the added freedom of the teams deciding HOW to achieve it. This leaves enough room for them to show their own research as part of the same demonstration, incorporating the objective as part of the story they want to demonstrate. For this to work, the objective needs to be not too specific, but not too vague either. Some proposals have already been suggested in this thread. I think the following more or less satisfies them:

  • OPL/DSPL. Theme: manipulation. Objective: a person has an item in their hand they didn't have at the start of the demonstration.
  • SSPL. Theme: HRI. Objective: a person is found and followed.

If you don't like these themes or objectives, right now is your chance to be visionaries: what do you want the best robot in your league to be able to do well this year?

As for how to score, all of you have brought good ideas. I propose to change the current scoring sheets to the following:

Internal jury (EC, TC?):

External jury (guests):

Again, if you don't like these or want to add more, this is your chance.

PS. I'm going to start writing the PR for this in around three hours.

MatthijsBurgh commented 5 years ago

This sounds like the best alternative, if you want to replace the current setup.

kyordhel commented 5 years ago

[...] teams are likely to prioritize most their efforts into earlier tests...

Not our problems teams came unprepared. Victory to those aiming to the finals.

I don't yet understand how stages work in the new rules

§3.6 Explains it. You can choose at least 6~8 tasks, not necessarily different, to perform in each stage. Your score is the sum of the best execution of each task.
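
Read literally, "the sum of the best execution of each task" could be computed as in the sketch below. This is one interpretation of the sentence above, not the text of §3.6; the function name `stage_score`, the task names and the point values are hypothetical.

```python
def stage_score(attempts):
    """Sum, over tasks, of the best score among that task's attempts.

    `attempts` is a list of (task_name, points) pairs; a task may repeat.
    """
    best = {}
    for task, points in attempts:
        # Keep only the highest-scoring execution of each task.
        if task not in best or points > best[task]:
            best[task] = points
    return sum(best.values())


# Hypothetical attempts: the same task is tried twice, only the best counts.
attempts = [
    ("Serving Drinks", 300),
    ("Serving Drinks", 450),
    ("Storing Groceries", 200),
]
print(stage_score(attempts))
```

Under this reading, repeating a task never lowers the total: a worse second run is simply ignored.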

I fear that this new rulebook will be just as problematic

Send a PR patching problematic rules.

What makes RoboCup@Home so brutal...

Welcome to the big boys club. We don't play games here. We change the world.

I would be wowed by a robot that solves such tasks so flawlessly and effortlessly that you almost forget it is a robot.

Solving a task flawlessly in an Open Demo doesn't mean a team can solve it flawlessly in a standard test. Since 2009 I've seen only one team succeed in flawlessly solving a test: 2014, Wright Eagle, Restaurant. Far from not looking like a robot.

kyordhel commented 5 years ago

@balkce I mostly agree with what you propose, HOWEVER

I think the following more or less satisfies them:

  • OPL/DSPL. Theme: manipulation. Objective: a person has an item in their hand they didn't have at the start of the demonstration.
  • SSPL. Theme: HRI. Objective: a person is found and followed.

I think you're lowering the bar way too much. I don't want to see either a Go! Get It! or a Follow Me in the finals. Those tasks are pretty much a must for qualification. Hand-over stuff is as old as I can remember, while back in 2014 robots were able to go through a crowd and find their operator, and that after going through an elevator. What happened to that?

I can remember TU/e running pointing detection and furniture recognition, NimbRo and homer cooking (both pouring stirred eggs), and Cosero watering plants. Wright Eagle and homer ran knowledge transfer between their robots, and homer had a nice demo of ASR between robots. Likewise, eR@sers, homer, and b-it-bots all ran action detection and learning demos in the finals. I think we should not lower the bar from there.

My suggestion (also task driven):

Evil? Yes. These are the finals. It's about time to surpass feats from the past.

So far my thoughts.

balkce commented 5 years ago

Let's meet in the middle:

balkce commented 5 years ago

Do remember that the objective is a PART of the demonstration, to leave room for the teams to show off their own research. The objective cannot be too complex.

balkce commented 5 years ago

I'm closing this issue. Further discussion can be carried out as part of the pull request.

benatuts commented 5 years ago

[...] teams are likely to prioritize most their efforts into earlier tests...

Not our problems teams came unprepared. Victory to those aiming to the finals.

You also said, "We are sick of tests being prepared the night before". Saying "not our problems" is ignoring the issue and will perpetuate the problem of finals being neglected. If you want teams to invest seriously in preparations for finals, then that preparation should be something that is not "wasted" effort for teams that don't reach the finals.

I don't yet understand how stages work in the new rules

§3.6 Explains it. You can choose at least 6~8 tasks, not necessarily different, to perform in each stage. Your score is the sum of the best execution of each task.

In that case, I feel that the rulebook has just as much complexity and breadth as before. There may be fewer pages, but the scope is just as broad.

In Montreal, I had barely any sleep all week. If it is a concern that @Home people look miserable, then I am providing my feedback that I anticipate the new rule book would have me looking just as miserable (or perhaps more miserable).

Send a PR patching problematic rules.

We seem very far from having addressed the fundamental issue and a PR does not seem the right response. Nonetheless, I will send a PR.

Welcome to the big boys club.

This is disrespectful and sexist. I have shared my perspective as an earnest competitor and a professional.

https://github.com/RoboCupAtHome/RuleBook/issues/533#issuecomment-458967037

I was NOT commenting in this thread because I want an easy competition. I am responding to the apparent mindset here of trying to set teams up for failure, and then being disappointed when they do fail.