People layout in Speech & Person Recognition

kyordhel commented 7 years ago

In #216 the problem of the people setup in the Speech and Person Recognition test was addressed but no consensus was reached.

@komeisugiura, @balkce, if I understsnd this it right, robots can know the speaker is to the Left or to the Right, at least in the front line. What happens behind the robot?

The following 4 setups were proposed (See Figure). spr

[ ] A: Front, Back, Left, Right, ~90° each.
[ ] B: Three on the front line.
[ ] C: Three on the front line, two on the back on each side.
[ ] D: One in front, two behind, ~120° each.
[ ] E: Two in front, one behind, ~120° each (D with robot turned 180°).

Voting is as follows

@balkce voted:
- OPL: A | C
- DSPL: B
- SSPL: B
@komeisugiura voted
- OPL: C
- DSPL: B
- SSPL: B & C

If no further remarks are given, I will setup a PR with this specification in the Appendix (renamed) with C for OPL, B for DSPL and B & C for SSPL

LoyVanBeek commented 7 years ago

My preference is to keep things all the same for all leagues, so a proper comparison between robots can be made. E.g. if I want to buy a home robot, I will look at the scoresheets and decide which to buy (hypothetically, alas). I want to see, from this test, that a Pepper or HSR does worse at this test than an OPL does, actually. From that, I say A, C for all robots.

komeisugiura commented 7 years ago

@kyordhel My point is that DSPL robots cannot distinguish front/back due to the layout of microphones. This is hardware limitation. So, we should avoid C for DSPL.

On the othe hand, OPL robots can solve A-D if the layout of its microphone array is appropriate.

kyordhel commented 7 years ago

I would like to avoid giving my opinion as I am looking for a fair consensus among experts on the topic (The hardest T of the TC). In this sense, the pull request has been set as C[ircular] for OPL and SSPL (Pepper), and B for DSPL (HSR).

Now, for the years to come. The optimal setup for sound source localization would be the union A U B (i.e. 8 quadrants). But that's something irrelevant for HRI for most people can't do that. Instead, we turn and try to localize the person until we see someone speaking or we are facing the source of sound (like trying to guess in which room is the baby crying). This goal must be defined and, I think, HSR robots must be able to accomplish it somehow.

balkce commented 7 years ago

To be clear, HSR could theoretically carry out C, but it would involve some deep-learning hardcore algorithm. The ones that I know of are still not adequate for competition since it involves training on site with perfect positioning and really cumbersome calibration. So, for now, the solution is to provide two scenarios (pretty much the same way as we are considering height limitations for manipulation).

However, as more sophisticated algorithms become easier to code, I would like for this platform-dependent scenario to go away and have one scenario for all platforms. I believe that in a couple of years, this would be possible.

balkce commented 7 years ago

And, by the way, @kyordhel, humans can definitely do AuB. The whole of field of sound simulation for video games, like Call of Duty, wouldn't work as well if humans couldn't do AuB.

We may not be able to estimate the sound source direction with an error less than 1 degree, but we can definitely know in which of those 8 quadrants the sound source is located. And making the robot turn towards the user I believe is relevant to HRI.

LoyVanBeek commented 7 years ago

In any case, when a platform dictates another setup, the score for the test should in my opinion change accordingly.

balkce commented 7 years ago

@Loy 's point about scoring would be an issue if all the robots would be ranked in the same list. But since they are not, I don't see the problem. The best in SPR in HSR is different from the best in SPR in OPL.

Besides, if we do go the route of different scoring per scenario, one can make the argument that because the scenario is easier, it deserves less points. However, another can make the other argument that because is much more difficult to do SSL with that robot, it deserves more points. I really don't want to jump into that rabbit hole.

kyordhel commented 7 years ago

@balkce Thanks dude! Now I know I'm also deaf!

@LoyVanBeek I'm against changing score. As with SSPL adapted tests, all mods should lead to the same sum. In my opinion, if a robot is half deaf (like me) it should have an outstanding vision system (not like me), or some equivalent.

Therefore, if HSR can't perform the sound source localization, it must then aim for more -Person recognition:

It gives more information about the crew.
- Sitting people
- Standing people
- Lying people
It can point to the operator within the crew.
- Natural Language Processing
More complex questions
Problem solving (e.g. task explanation).
Anaphora/Cataphora resolution
Ellipsis resolution

LoyVanBeek commented 7 years ago

If the task is simpler, then I think the score should change. If we don't want the score to change, the the task should not be simpler.

Especially when https://github.com/RoboCupAtHome/RuleBook/issues/259 hits, then the minimum score for each league should incorporate the relative difficulty.

kyordhel commented 7 years ago

Indeed @LoyVanBeek, and this is why the alternatives for a SPL test mus be at least as challenging as those of the rest of @Home.

As the deadline for the final version of the RuleBook approaches, and considering that pointing at a person in line with others (B) is simpler than when people is arranged in circle (C); i would like to propose the following (from 2016's Person Recognition Test):

DSPL Only

For i = 1..3
- Operator i introduces themself to the robot (understand name on 1st attempt 5pts).
Robot inspects the crowd
For each operator
- Robot points at each found operator stating the operator's name (5pts), gender (5pts), and age (10pts).
Robot proceeds to Riddle Game
Robot answers 10 questions (instead of 5).

LoyVanBeek commented 7 years ago

However, this is a quite different test, this late to the competition.

Another option is to also give all leagues, including OPL, situation B.

kyordhel commented 7 years ago

No, is not that late. This mod is within the spirit of the same test and uses exactly the same features of the original. Plus, is only for DSPL. Is this mod, or DSPL will have an advantage over the other Leagues.

Coding a state machine for asking 3 names and pointing at recognized people in a crowd shouldn't take them more than half an hour.

About giving all teams just situation B... Nope. Big no. Speak your mind @balkce

balkce commented 7 years ago

Applying B to OPL is a step back from last year, so it's a no from me on that regard.

IMO, @kyordhel's idea is in the spirit, but I agree with @LoyVanBeek: it's too late to bring such a big change. Even if it's a simple state machine code, we cannot assume how each team is going to solve the test, and even if it was that simple for everybody, it's not the coding that gets you, it's the testing if you coded it correctly, which takes its time as well. I'm imagining quite a lot of teams already have this test coded and tested; bringing this change is a slap in the face for DPSL teams.

In addition, there is a considerable difference for some teams between "pointing a person" and just stating "there are two males in the crowd".

If the difference of points-to-difficulty is the problem, we do have to remember that the trophies/awards will be given separately per each league and there may be no trophy given to an overall winner. Thus, the difference of points between leagues may not be an issue.

In any case, if you want my take on a solution: in the blind games bluff for DSPL, reduce the points of "turned towards person" to 5 and make the robot answer 10 questions instead of 5. The points are the same, but DPSL requires to answer more questions about the crowd with simple turns to gain the same points as OPL with more difficult turns.

LoyVanBeek commented 7 years ago

+1 for that solution.

RoboCupAtHome / RuleBook

People layout in Speech & Person Recognition #289