kyordhel closed this issue 6 years ago.
Before RoboCup, I had already been thinking and writing about this. My personal views are written at https://github.com/LoyVanBeek/RoboCupAtHome-roadmap.
As for success, I don't have the full score sheets, but I do have the final scores. With those, I did some very basic statistics.
Statistic | Poster | Speech & Person | Storing Groceries | Help Me Carry | GPSR | Stage 1 |
---|---|---|---|---|---|---|
max attainable | 50.00 | 200.00 | 250.00 | 200.00 | 250.00 | 950.00 |
MODE | 36.67 | 0.00 | 0.00 | 0.00 | 0.00 | #N/A |
MAX | 40.00 | 145.00 | 22.50 | 70.00 | 70.00 | 249.00 |
MIN | 27.78 | 0.00 | 0.00 | 0.00 | 0.00 | 29.44 |
MEDIAN | 32.78 | 35.00 | 5.00 | 25.00 | 10.00 | 104.78 |
AVERAGE | 33.54 | 53.33 | 4.83 | 25.60 | 13.73 | 131.04 |
STDEV | 3.52 | 53.54 | 6.51 | 24.69 | 20.29 | 81.18 |
Statistic | Open Challenge | Set a table / tidy up | Restaurant | EE-GPSR | Stage 2 |
---|---|---|---|---|---|
max attainable | 250 | 390 | 285 | 250 | 2125 |
MODE | #N/A | 10.00 | #N/A | 0.00 | #N/A |
MAX | 194.54 | 20.00 | 115.00 | 65.00 | 621.04 |
MIN | 62.96 | 0.00 | -50.00 | 0.00 | 191.63 |
MEDIAN | 146.76 | 10.00 | 27.50 | 32.50 | 392.01 |
AVERAGE | 132.41 | 11.25 | 42.81 | 28.75 | 407.80 |
STDEV | 54.56 | 6.41 | 58.85 | 24.75 | 176.20 |
Statistic | Speech & Person | Cocktail Party | Help Me Carry | GPSR | Stage 1 |
---|---|---|---|---|---|
max attainable | 200 | 200 | 250 | 700 | |
MODE | #N/A | 0.00 | 0.00 | 0.00 | #N/A |
MAX | 117.50 | 30.00 | 10.00 | 42.50 | 245.00 |
MIN | 17.50 | 0.00 | 0.00 | 0.00 | 66.25 |
MEDIAN | 50.00 | 7.50 | 0.00 | 0.00 | 91.67 |
AVERAGE | 58.64 | 10.71 | 2.14 | 9.64 | 116.08 |
STDEV | 33.57 | 12.97 | 3.93 | 15.91 | 65.70 |
Statistic | Open Challenge | Tour guide | Restaurant | EE-GPSR | Stage 2 |
---|---|---|---|---|---|
max attainable | 250.00 | 285.00 | 250.00 | 1485.00 | |
MODE | #N/A | 0.00 | #N/A | 0.00 | #N/A |
MAX | 178.47 | 95.00 | 40.00 | 70.00 | 628.47 |
MIN | 121.53 | 0.00 | 0.00 | 0.00 | 243.47 |
MEDIAN | 133.68 | 0.00 | 12.50 | 10.00 | 271.29 |
AVERAGE | 141.84 | 23.75 | 16.25 | 22.50 | 353.63 |
STDEV | 25.21 | 47.50 | 17.02 | 33.04 | 184.05 |
Statistic | Poster | Speech & Person | Storing Groceries | Help Me Carry | GPSR | Stage 1 |
---|---|---|---|---|---|---|
max attainable | 50.00 | 200.00 | 250.00 | 200.00 | 250.00 | 950.00 |
MODE | 37.86 | 0.00 | 0.00 | 0.00 | 0.00 | #N/A |
MAX | 41.43 | 101.00 | 30.00 | 30.00 | 25.50 | 209.36 |
MIN | 31.79 | 0.00 | 0.00 | 0.00 | 0.00 | 36.07 |
MEDIAN | 37.50 | 2.50 | 0.00 | 0.00 | 0.00 | 51.25 |
AVERAGE | 37.00 | 24.70 | 6.75 | 4.75 | 5.35 | 78.55 |
STDEV | 2.56 | 36.72 | 10.14 | 10.44 | 8.66 | 55.19 |
Statistic | Open Challenge | Set a table / tidy up | Restaurant | EE-GPSR | Stage 2 |
---|---|---|---|---|---|
max attainable | 250.00 | 390.00 | 285.00 | 250.00 | 2125.00 |
MODE | #N/A | 0.00 | 0.00 | 0.00 | #N/A |
MAX | 183.68 | 10.00 | 90.00 | 55.00 | 524.08 |
MIN | 0.00 | 0.00 | 0.00 | 0.00 | 107.29 |
MEDIAN | 139.58 | 0.00 | 5.00 | 10.00 | 257.73 |
AVERAGE | 114.51 | 4.00 | 20.00 | 20.00 | 275.33 |
STDEV | 72.79 | 5.48 | 39.21 | 24.24 | 165.85 |
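The summary rows above (MODE, MAX, MIN, MEDIAN, AVERAGE, STDEV) can be reproduced with a few lines of Python. This is only a sketch: the `summarize` helper is mine, and the score list below is an illustrative placeholder, not the actual 2017 results.

```python
# Sketch of the per-test summary statistics used in the tables above.
# The input scores are hypothetical, not real competition data.
import statistics

def summarize(scores):
    """Return the same summary rows as the spreadsheet tables."""
    try:
        mode = statistics.mode(scores)
    except statistics.StatisticsError:
        mode = None  # no unique mode -> shown as #N/A in the tables
    return {
        "MODE": mode,
        "MAX": max(scores),
        "MIN": min(scores),
        "MEDIAN": statistics.median(scores),
        "AVERAGE": statistics.mean(scores),
        "STDEV": statistics.stdev(scores),  # sample stdev, as spreadsheets compute it
    }

# Hypothetical GPSR scores for six teams
print(summarize([0.0, 0.0, 10.0, 25.0, 30.0, 70.0]))
```

Note that `#N/A` in the tables corresponds to the no-unique-mode case, and STDEV is the sample (n-1) standard deviation, matching spreadsheet defaults.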
Thanks, @LoyVanBeek. I've read through all of your material, and there is a lot that I agree with.
First off, I agree with your thought that the overall performance going down "is due to the rulebook not being stable". In this regard, I believe we need to make the 2018 rulebook an evolution of 2017's.
However, this contrasts with another issue that @LoyVanBeek has pointed out: "We want to test too much". It's very tempting to fix this by reducing the number of tests (I really don't want to take away repetitions), but reducing the number of tests implies test rework, which in itself implies major changes.
I propose to:
Regarding which tests to take away and which ones to keep and fine-tune:
That is all for now.
As for themes: please stick with the current challenges as a theme for GPSR-like runs, to keep the rulebook stable.
Some other suggestions:
Long post, @balkce! Thank you.
First, I would like to call attention to the fact that, again, discussions are mainly being held among the EC plus @LoyVanBeek, @moriarty, and me. The TC has nine members, and I would like to see them all reach consensus in #217, #308, #347, #352, #356, #358, #359, #361, and #362. I find Luca's and Sven's suggestion worthy.
Second, the rulebook's instability can't be considered the predominant factor behind the low performance, at least not in Stage 1, and even less so when we consider that the average performance is less than 10% of the maximum score. I would rather attribute it to poor test design, for the following (non-quantitative) reasons:
Third. Some clarifications:
A thematic GPSR is pointless (it's not GPSR anymore), and this is even more evident when HRI during task acquisition (a critical ability in SSPL) is bypassed. It would be better to have mini retro-tests like go-get-it, who-is-who, and follow-me. In fact, what you are suggesting seems to me like rolling back to those tests from 2011.
I don't think it is wise to remove the Speech Recognition test; rather, we should upgrade it. From the beginning, the plan was to move from predefined questions to NLP with Action Planning (the robot explains the plan but does not execute it). So far, I've seen very few robots with this ability, and those that have it barely use it due to the linearity of the tests (state-machine-like behavior).
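The "explain the plan, do not execute it" idea could be prototyped along these lines. This is a toy sketch only: the `explain_plan` helper, the verb table, and the command grammar are hypothetical illustrations, not part of any rulebook.

```python
# Toy sketch: turn a GPSR-style spoken command into an ordered plan that the
# robot can verbalize, without executing anything. All names here are
# illustrative assumptions, not rulebook definitions.
def explain_plan(command):
    """Split a compound command into ordered plan steps and describe them."""
    # Hypothetical mapping from command verbs to spoken plan phrases
    actions = {
        "go": "navigate",
        "find": "search for",
        "grasp": "pick up",
        "bring": "deliver",
    }
    steps = []
    for clause in command.lower().rstrip(".").split(" and "):
        verb, _, rest = clause.strip().partition(" ")
        if verb in actions:
            steps.append(f"{len(steps) + 1}. I will {actions[verb]} {rest}")
    return steps

for line in explain_plan("Go to the kitchen and find the milk and bring it to John"):
    print(line)
```

The point of such a test would be scoring the verbalized plan itself (did the robot understand the task?) rather than the linear execution, which is what breaks the state-machine pattern.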
Storing Groceries and Help-me-carry: it is one thing to describe a test in terms of unsorted GPSR actions; it is another to give those actions to the robot, which is unnatural and goes against #358, where @iocchi suggests breaking the linearity of the tests and just marking goals, easing the scoring. These tests are atomic tasks that require a lot of local planning, and so they must be kept; I can't imagine any mother explaining to you how to help your aunt with the groceries.
Finally, since no counterargument is complete without a suggestion for improvement, here are my two cents:
Some examples:
Those are my thoughts so far.
No PDF reports; the tasks must be solved with visible results.
In my opinion, PDF reports have the advantage of letting the referee easily see whether the visible result is actually based on robot perception or is just a guess / something hardcoded.
I also thought the reports were a good way to see how other teams performed in a task; they are more informative than just the score.
@johaq
> In my opinion, PDF reports have the advantage of letting the referee easily see whether the visible result is actually based on robot perception or is just a guess / something hardcoded.
Please counter the following arguments:
Considering your point, I wouldn't mind checking the PDF report if the robot did something, to check for false positives and, as you say, hard-coding (which violates the fair-play rule). If the robot didn't achieve a single goal, it won't score. Period.
The audience couldn't care less about the PDF reports.
Definitely true for the audience on location. But #357 and #355 raise requests to be able to better judge the robots' performance from outside, which I think published task reports can provide.
> PDF reports have the annoying side effect of robots doing only object recognition during a manipulation test.
If you do not want robots to do certain things in a task, don't award points for them.
> PDF reports have the annoying side effect of delaying score delivery, and they make scoring harder.
Having never scored a competition, I don't know very much about this. Maybe a LaTeX template that teams HAVE to use could make this easier.
> If the robot didn't achieve a single goal, it won't score. Period.
Agreed. I think this would come with the more goal-driven scoring suggested here.
> October 30th CFP: Qualified Teams Announcement
Is this a fixed date? I haven't seen any reference to it in the CFP (for the SPL leagues, the qualification material deadline is 15 Oct; for OPL, it is 5 Nov, as per the CFP web page).
I'm asking because teams that want to purchase a robot with a 2017 budget will find it very helpful to have this information before 2018 (and as early as possible).
@awesomebytes
- DSPL: October 7th (Call for proposals)
- SSPL: October 15th (Qualification materials)
- OPL: November 5th (Qualification materials) / November 26th (Qualification Announcement)
So far, only OPL has all deadlines set. The SPL leagues have a more complex selection procedure, so I can't tell when they will announce something.
Closing due to:
Question in a nutshell
How much will the rulebook change this year?
Issue in a nutshell
First, define the deadline for the first draft (e.g., October 15, the deadline for the SSPL CFP).
If the rulebook is going to be changed, and not just buffed, define:
Introduction
Right after the RoboCup fever, we got a lot of ideas, many of them very nice but also involving MAJOR changes to the rulebook. The transition from 2016 to 2017 introduced several major changes, of which the least relevant was the inclusion of DSPL and SSPL. Other major changes were the fusion of the Speech Recognition and Audio test with the Person Recognition test, the merger of the Following & Guiding test and the Navigation test into the Help-me-carry test, the removal of the training and guiding phase in the Restaurant test, and the inclusion of the Set a table and clean it up test.
In the past, the rule was to introduce major changes to the rulebook every two years, but major changes have been introduced every year since 2015, so the Executive Committee and the Trustees advised keeping the rulebook as is and introducing only minor changes (buffs), to give teams time to learn, prepare, and outmatch the tests in 2018.
However, it seems to me that keeping the same rulebook for 2018 does not necessarily match the spirit of all the changes suggested by the new TC members, some Execs, and even Trustees. Therefore, this decision must be taken once and for all, with a strong commitment from the TC to have all changes ready before the deadline.
Deadlines
At the time this issue was created, the Call for Participation for the Social Standard Platform League had been issued, with the deadline for the submission of qualification materials scheduled for October 15, 2017. The idea is that the other two leagues follow suit. The intention, in a nutshell, is that qualified teams are announced before the end of the year, making it easier for teams to get funds and confirm their participation several months before the competition, while at the same time opening the opportunity for a second CFP if some qualified teams can't attend the competition. In addition, considering the large number of applicants in 2017 (more than twice the capacity for OPL), a video demonstrating a Stage 2 test might be required for qualification, to ensure only well-prepared teams apply. This shortens the time for releasing the rulebook.
As a consequence, an early deadline for the first draft must be set, as early as mid-October 2017. This requires strong commitment from the TC due to the extra effort involved.
The following deadlines have been pre-set and must be discussed and agreed upon:
It's a very tight schedule, yet doable.
Milestones and Features
During a meeting in Nagoya, the EC and TC agreed that several milestones must be set, aiming to push robots and research toward actually solving very specific tasks, such as taking clothes out of the washing machine and folding them (laundry), reminding an elder of their daily dose of insulin (nursing), making a sandwich (acting as a butler), giving a tour in a museum (tour guide), etc.
Such tasks involve specific abilities, such as folding deformable objects, dealing with wet objects, manipulating tools, reacting to natural-language queries, finding people, keeping track of the events and activities of a person, etc. None of them are solvable yet, and robots are not even close to achieving such feats, so milestones must be set and the tests must be designed accordingly, with plans for what to change and when, and for what to do if the tests are not being solved.
Suggested work plan
The suggested work plan is as follows.
Related issues:
#362, #356, #353, #352, #351, #347, #308, #261, #179