ASR/NLP/NLU Roadmap - Githubissues

LoyVanBeek commented 7 years ago

Devise a multi-year roadmap for Automatic Speech Recognition, Natural Language Processing + Understanding for service robots

[x] 'Hear' short, single words E.g. "Yes" or "No"
[x] Hear and understand short, fully specified commands e.g. "Go to the kitchen"
[ ] Accept commands initiated by the user, not the robot asking for commands (less scripted scenarios)
[ ] Understand short, under-specified commands, e.g. "Go to a person"
[ ] Ask for more information in under-specified commands, e.g. "Go to a person" --> "Which person?"
[ ] Understand long commands
[ ] Coreference resolution, e.g. "The cookies are in the kitchen, bring me one"
[ ] Coreference resolution "Where are the cookies?" --> "In the cookie jar" --> "Bring it to me"
[ ] Information retrieval: Answering questions about (direct) environment
[ ] Information retrieval: answering general knowledge questions
[ ] Conversation initiated by human instead of robot
[ ] Robot can be called from a different room
[ ] Recognize speaker by voice (accept command s only from owner)
[ ] Differentiate between different speakers
[ ] Determine mood from voice
[ ] Be able to function in a noisy environment (e.g. a bar, party, etc)
[ ] ...

Items are checked when a robot has shown this skill during a RoboCup@Home event

LoyVanBeek commented 7 years ago

The reverse is also interesting: make the robots more eloquent and generate text. We had a little of this in the RoboNurse challenge a few years ago when the robots had to describe to Granny which bottles of pills were on the shelf.

kyordhel commented 7 years ago

Both sound nice to me, but it will take time (test is done 3 times, it needs to be short). Also, I'm against combining SSL & NLP while robots show poor performance in both.

LoyVanBeek commented 7 years ago

This is not something we should do right now, but progression to make over some time: next year and further.

kyordhel commented 7 years ago

In fact, we may incorporate incomplete information and coreference resolution (specifically anaphora) as it has been done since 2012 in EGPSR by changing the number of questions from 8/8 to 5/5/5 (ASR, SSL, NLP). It is not a crazy idea since is being requested for this year's GPSR.

Also, this roadmap (along others) should be defined by Trustees, or at least the EC, since they stay in charge longer than any TC.

komeisugiura commented 7 years ago

From the viewpoint of robot audition, I suggest to add "calling a robot from another room", which is interesting and useful.

Smartphones or Echo-type devices will be important (non-onboard) audio input devices for a robot in the near future. However, it is unlikely people stop talking directly to a robot in a ten-year scale.

Therefore, both onboard and non-onboard microphones will be required. Not one of them. The non-onboard part can be covered by distant speech recognition technology, and onboard part can be covered by robot-audition technology.

balkce commented 7 years ago

In addition to Komei's suggestions, I would add to have the robot be able to differentiate between speakers (is my master talking to me or is it the guest?).

Also, since the robot may be working in noisy environments such as a restaurant or a bar where the "noise" is actually other users acting as interferences, multiple-user ASR should be on the road map. Two persons asking for its attention at the same time, two rude persons talking over each other wanting to place the order first, or even taking the order from one person in one table while being near another table with two other people having a conversation relevant to the robot ("boy, this robot is taking a long while to take our order").

Finally, recognizing mood from the persons voice (satisfied, annoyed, panic) can be useful. For example, in conjunction with Komei's suggestion of attending from another room, if the person calling out has panic in their voice, we could establish an emergency situation.

LoyVanBeek commented 7 years ago

I would like to assign years to each roadmap item and put in the rulebook as an appendix.

This way, teams can anticipate new skills to develop even when the rulebook is not yet finished.

kyordhel commented 7 years ago

I think Issues are insufficient for that. My idea was to setup the Wiki of the repo and add all new features, with their roadmaps, goals, and explanations there, so future TC's can work over them and follow the line.

However, we need assessment from EC for that, as I really don't know where they are aiming for now.

LoyVanBeek commented 7 years ago

A wiki editable by others does make more sense than an issue. There is also the RoboCup@Home wiki it's not quite up to date

kyordhel commented 7 years ago

@airglow, @awesomebytes, @HideakiNagano21, @justinhart, @mnegretev, and rest of TC: your thoughts, comments and proposals, please.

justinhart commented 7 years ago

I really like this roadmap, actually, but I think that there is a problem with how it is implemented.

Basically, since all of the instructions are generated by a grammar, one can simply parse that grammar to arrive at the instruction that you want the robot to execute, and then place the instruction at the leaf node.

In principal, it makes sense to have a grammar which provides the rough format of what to expect, and it makes sense to restrict what the judges are allowed to ask the robot to do, but since the grammar published online is the same as used in competition, you don't really need state-of-the-art NLU software to perform the task. In fact, doing so would make one's team less competitive. We changed our solution from using in-house NLU software to using a custom thing based around the grammar distributed for RoboCup@Home in response to this.

What I'd like to see is something that tells the judges what to ask the robot to do, but forces the judges to ask the robot in their own words. Even if what the judges end up saying is similar to what is in the original grammar, it would force teams out of building solutions that conform directly to parsing the provided grammar.

balkce commented 7 years ago

What @justinhart is proposing echoes a lot of what we want a service robot to behave (in many levels of the organization). However, the issue has been the "forcing" part of it, since many teams don't have NLU as part of their own roadmap and communicating to the robot what the user wants is an essential part of the task. If the robot didn't understand the command correctly, the rest of the task is ruined, which is very unfortunate if the main goal of the test is to show off other functionalities that aren't NLU related.

To this effect, what I think could be done in short-term is to add NLU as part of the challenges in EEGPSR. We already have different functionalities that are tested in there (people-related, object-related, etc., in the context of #327). We could one more category as "Natural Language Understanding" or a modifier (much like "command given by a TC member or a team member") that gives different points if the command is provided with the given grammar or via natural language.

kyordhel commented 7 years ago

Regarding NLP, I'm completely against grammars (even though I'm the author of the generator). For those not interested in the topic, I think a more fine grained roadmap must be defined, but always having a backup plan (i.e. alternate interfaces) for those teams whose are not targeting NLP.

I think the only way to achieve true NLP is to step out of the equation and let the audience to command the robots the best they can.

Elaborating.

First, most involved people hasn't English as mother language. This makes extremely difficult for us to provide a good grammar on how commands should be. And even though we were all native English speakers, I think a group conformed by an Australian, Canadian, Irish, Londoner, Texan, New Yorker, and Scotch will have a hard time deciding what's a natural way to command a robot. Yet, for a foreign perspective, all are valid ways. And that leaving behind the disjoint phonetic group of each idiom.

Second. Leaving aside Locales, but stretching to spoken commands only, natural languages are infinite and there are hundreds of ways to ask for the same thing. My research shows that many people likes better single-word imperatives heavily dependent on context, while others make use of sentences or even detailed explanations. However, the most expects the robot to guess what it is required for it has been designed for it.

Third. This time leaving aside foreigners (we used a standardized and often deprecated subset), you can't efficiently code a system for a language that is evolving constantly, unless adding on-line learning. Even more, youngsters don't use the same constructs of elders, and new lexicon is incorporated every day, as well as some rules considered as "the good way" are replaced by "errors". Even some languages incorporate elements lacking of double articulation, but that leads directly to...

Fourth. Yet again research shows that, when commanding people, we use to mix speech with gestures, mimics, whistles, snorts, and a broad set of non structured elements of language. Sometimes we even act an example and expect repetition, being all the language corporal and hardly translatable to speech. This is something no ASR can deal with.

Fifth, and most important. We are biased. We deal with robots/computers almost every day, so we unconsciously know what a robot can do and what it can't, so all our instructions are biased by experience in the field and experience in the competition. The logical conclusion is that we are the most unqualified people to guess how a robot should be operated by a final user.

Finally, I think is better for the final user to instruct a robot when it knows the robot capacities. Therefore, in parallel we should ask people to command robots, and build the roadmap oriented to solve daily tasks such as cooking, cleaning the toilet, mopping the floor, doing the bed, taking out the garbage, etc.

johaq commented 7 years ago

I agree with the general sentiment towards NLP. I think one thing that should be kept in mind though is that a grammar guarantees some kind of fairness and consistency. Picking audience members randomly to give commands can mean a lot of variance in difficulty for different teams in different runs. One solution might be to allow the robot to introduce himself to an operator so it is on the teams to give a concise summary of the robots capabilities and to influence the way people communicate with their robot in a way that is suitable for them.

Edit: Maybe to make this a little clearer. My point is that imo NLP is a skill that is very difficult to assess fairly in a short amount of time. Is someone aware of other robocup leagues or even other competitions that have NLP challenges?

kyordhel commented 7 years ago

@johaq I think fixed grammar is for now the best alternative (although wrong way). Allow the robot to introduce itself might prove too time consuming.

justinhart commented 7 years ago

I like the idea of hacking some better NLU into EEGPSR, or maybe as a more challenging mode of SPR, similar to how you can pick more challenging modes of EEGPSR. If we figure out how to do this fairly for those rounds, then maybe we can expand this in the future.

kyordhel commented 6 years ago

Whether I present this at the symposium or not, I think we should still discuss this proposal (see attachment).

From Commands to Goal-based Dialogs - A Roadmap to Achieve Natural Language Interaction in RoboCup@Home.pdf

justinhart commented 6 years ago

I agree. This is good stuff.

RoboCupAtHome / RuleBook

ASR/NLP/NLU Roadmap #217