RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

Focus of @home #768

Closed · johaq closed 1 year ago

johaq commented 2 years ago

Ah, an old classic. Here we go:

Setup

The @home website to this day states:

> A set of benchmark tests is used to evaluate the robots’ abilities and performance in a realistic non-standardized home environment setting. [emphasis mine]

I believe this to be sort of the founding idea behind the @home league. Combined with the fact that the website further lists all sorts of different robot skills, this means to me that at its core @home is about integration:

> Focus lies on the following domains but is not limited to: Human-Robot-Interaction and Cooperation, Navigation and Mapping in dynamic environments, Computer Vision and Object Recognition under natural light conditions, Object Manipulation, Adaptive Behaviors, Behavior Integration, Ambient Intelligence, Standardization and System Integration.

The winner of @home should be the team with the most stable robot platform that best integrates a wide variety of skills.

Problems

Over the years I feel like there have been constant attempts to make @home about more. The two main things in my mind are:

- making @home as attractive to research as possible
- making @home as attractive to audiences as possible

On paper I don't disagree that @home should strive to be these things, but I feel that a lot of the introduced changes did close to nothing for achieving these goals while also chipping away at what I believe to be the core idea behind @home. For example, the decision to let teams choose the tests they perform reduces the number of robot skills a team needs to work on. Or, introducing the YCB object set takes away from the non-standardized scenario. Furthermore, I think it is important to realize that these goals are often not compatible and in some cases even diametrically opposed. Trying to sit between all chairs, the @home competition just falls flat.

Outlook

While ultimately my personal preference is for @home to be mainly focused on integration, I am fine with shifting the focus. BUT that has to come with the realization that choosing a new focus also has to mean way more drastic changes. I.e. if the focus is to be as attractive to research as possible, we need to get rid of the non-standardized scenario pretty much altogether. If the focus is to be as attractive to audiences as possible, we need to at least introduce way more standardization and completely change the way we score tests (more incremental scoring, ideally automated).
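
To make "more incremental scoring" concrete, here is a rough sketch of what I mean; the test, milestone names, and point values are all made up for illustration. Each completed step banks its points instead of the whole test being scored pass/fail:

```python
# Hypothetical milestones for a fetch-and-carry test; names and points invented.
MILESTONES = [
    ("enter arena", 10),
    ("understand command", 15),
    ("navigate to object", 20),
    ("grasp object", 30),
    ("deliver object", 25),
]

def incremental_score(completed: set[str]) -> int:
    """Sum the points of every milestone the robot actually reached."""
    return sum(points for name, points in MILESTONES if name in completed)

# A robot that grasps but fails delivery still banks 75 of 100 points.
print(incremental_score({"enter arena", "understand command",
                         "navigate to object", "grasp object"}))  # 75
```

Automating this would mean having the arena (or referees with a scoring app) record the milestone events, rather than judges awarding one lump sum at the end.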

I hope my ramblings were decently coherent and that they spark some discussion from the community below. But I believe we really need an EC decision with a clear direction on what the TC should focus the rulebook on.

alex-mitrevski commented 2 years ago

I'll just add some of my thoughts.

In my opinion, the core question you are asking here is: who is the league for / who is supposed to be attracted to participate in the league?

If the focus is solely on integration, then your target audience is above all companies (who anyway dedicate most of their resources to integrating different functionalities) and student teams (who are learning how to work with a complex system). In the former case, the objective of the league is to test production-ready robots; in the latter case, the objective is to educate young roboticists.

In my opinion, focusing on integration only is likely to discourage researchers from participating. Particularly if you're a young researcher, integration is not an activity really worth doing; I'm not commenting on whether that's necessarily good or bad, but the fact is that a young researcher's game is investigating particular aspects of problems and producing publications about those; creating production-ready robots is not the objective. At the core of this lies the need to get funding: doing RoboCup takes up a lot of resources, so it can only be done if one is able to get funding for doing it. Getting research funding is, however, difficult, or dare I say impossible, if you just say "I'll just work on integrating stuff that's already out there."

> For example, the decision to let teams choose the tests they perform reduces the number of robot skills a team needs to work on.

That is not a bad thing though, is it? This way, you get teams to excel in things that they are very good at without focusing on aspects they have no specific expertise in.

> if the focus is to be as attractive to research as possible, we need to get rid of the non-standardized scenario pretty much altogether

Not necessarily; this all depends on how the tasks are defined. Focused, but non-standardised scenarios are very useful for research - as long as the experimental protocol is clearly defined - because they allow components to be tested outside a lab.

> Or, introducing the YCB object set takes away from the non-standardized scenario.

Related to the previous two points, in my opinion, the objective of using YCB should be to let participating teams whose focus is not on object perception perform tasks where they can test components they are actually interested in.

> If the focus is to be as attractive to audiences as possible, we need to at least introduce way more standardization and completely change the way we score tests (more incremental scoring, ideally automated).

I'm not sure why the objective should be to make the league attractive for audiences. @home is not an entertainment league like RoboCup soccer, but the point is to work on making robots do boring things that none of us want to do in our homes and that we would like to outsource to an autonomous robot; by definition, this means that the performed activities aren't really attractive. But that's OK, as long as the robots are very good at doing them.

> Furthermore, I think it is important to realize that these goals are often not compatible and in some cases even diametrically opposed. Trying to sit between all chairs, the @home competition just falls flat.

I fully agree with this. Whatever the objective of the league is, it has to be clearly defined so that those who consider participation clearly know what (not) to expect from the competition.

johaq commented 2 years ago

Thanks for your reply. Fully agree with the first part of your comment.

>> For example, the decision to let teams choose the tests they perform reduces the number of robot skills a team needs to work on.

> That is not a bad thing though, is it? This way, you get teams to excel in things that they are very good at without focusing on aspects they have no specific expertise in.

No, in theory it is not a bad thing, but this is not what happens in practice. We don't get teams that are specialized in NLP and show cool stuff; we get teams that are mad they could not show their cool NLP because their robot could not enter through the arena door. This is what I mean by half-assing things. Yes, we reduced the amount of necessary work on other skills, but it is still way too much for research groups with specific focuses.

>> if the focus is to be as attractive to research as possible, we need to get rid of the non-standardized scenario pretty much altogether

> Not necessarily; this all depends on how the tasks are defined. Focused, but non-standardised scenarios are very useful for research - as long as the experimental protocol is clearly defined - because they allow components to be tested outside a lab.

So I fully admit I'm no longer part of a robotics research group, so my opinion might be outdated. But in my experience, the fastest way to scare away researchers is to mention "outside the lab" applications.

>> Or, introducing the YCB object set takes away from the non-standardized scenario.

> Related to the previous two points, in my opinion, the objective of using YCB should be to let participating teams whose focus is not on object perception perform tasks where they can test components they are actually interested in.

Again, a group solely focused on e.g. manipulation is still discouraged by having to implement an object recognition pipeline for standardized objects. Why would such a group benchmark themselves in the current Storing Groceries if they lose lots of points for missing object recognition?

>> If the focus is to be as attractive to audiences as possible, we need to at least introduce way more standardization and completely change the way we score tests (more incremental scoring, ideally automated).

> I'm not sure why the objective should be to make the league attractive for audiences. @home is not an entertainment league like RoboCup soccer, but the point is to work on making robots do boring things that none of us want to do in our homes and that we would like to outsource to an autonomous robot; by definition, this means that the performed activities aren't really attractive. But that's OK, as long as the robots are very good at doing them.

Good points, imo. I believe becoming more attractive to audiences is a trustee directive, so we might not really have a choice here.

LoyVanBeek commented 2 years ago

Hi all, ex-TC here.

The above describes some of the difficulties I also found when writing some bits of the rule book.

> Furthermore, I think it is important to realize that these goals are often not compatible and in some cases even diametrically opposed. Trying to sit between all chairs, the @home competition just falls flat.

I couldn't agree more.

Appeal

Concerning being appealing to the audience: this is often stated as a goal by the execs of the league. I see some value in this, as it brings in attention, and a league that gets no attention is eventually dead in the water, IMO. If getting young people into technology is a goal of RoboCup in general, engaging them is part of that. How that has to happen is open, though.

Getting the robots out in public at RoboCup, as with the Restaurant task, the old Follow Me, and the Carry My Luggage task, is great for audience appeal, I think, as the robot can get up close and personal with spectators outside the arena. I'd have loved this as a kid.

In the past, audience appeal has sometimes been shoehorned in, e.g. with the Robot Zoo, which in the end mostly took time and provided no value to the teams. It did bring a big crowd, though a significant portion of it came from other leagues.

Standardizing & Repeatability vs Realism

However, getting into the public is not something you can standardize: the people are different each time, and one robot can, e.g., encounter a kid suddenly crossing in front of it, presenting a challenge not all robots experience, which makes things unfair.

A key opposition of goals in @Home, IMO, is fairness (via standardization and repeatability) vs realistic environments. If something becomes too standardized and repeatable, you can 'just' script for it, with little or no reasoning and AI required, and do the same trick over and over again. That is unrealistic, so there must be some variation.

This was the reason I introduced the 'average of best two out of three attempts' rule in scoring. That takes a lot of time, but it reduces the impact of singular bad luck as well as singular good luck, so teams should strive for repeatable and robust behavior. Some variation between attempts at a task is also part of the game.
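
As a minimal sketch of that rule (the function name and plain-float scores are my own illustration, not the rulebook's wording):

```python
def task_score(attempts: list[float]) -> float:
    """Average of the best two out of (up to) three attempts at a task."""
    if not attempts:
        return 0.0
    best_two = sorted(attempts, reverse=True)[:2]
    return sum(best_two) / len(best_two)

# One lucky run (90) is damped by the ordinary ones; one unlucky run (0) is dropped.
print(task_score([90.0, 40.0, 35.0]))  # (90 + 40) / 2 = 65.0
print(task_score([40.0, 35.0, 0.0]))   # (40 + 35) / 2 = 37.5
```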

Making the variation part of the standard (e.g. how far we can deviate from the standardized scenario) is very hard in my experience. And setting up the variation (e.g. moving the couch and objects, etc.) within spec is at the very least time-consuming.

An option could be to fully constrain the scenario for each test, e.g. everything in an exact, predefined place and the command given via typed text, and only vary the bits that the team is interested in given their focus. Fewer constraints = more points. But that is very boring and a step back for @Home, I think.

Integration vs specialization

Concerning integration: realistically, only a well-integrated system can achieve anything in a normal home. Only being good at e.g. speech recognition doesn't get you anything in @Home if you can't act on it, and a great manipulator robot is useless if it cannot receive a command.

Working around each shortcoming makes for a robot that always needs help, which is not satisfying for anyone involved, IMO: not the audience, not the referees, not the team, and not even the speech recognition expert (say) who made the robot understand the command but then has to watch the robot fail to actually perform it.

Having awards for e.g. best speech recognizer, best manipulator, best navigator, etc. besides the integrated challenges might help a bit, e.g. by counting instances where the robot recognized speech correctly. But that puts an increased workload on the scarce referees and TC members who have to actually grade the robots.
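
For illustration, the counting itself would be trivial once the events are logged; the log format and team names below are invented, and the real cost is referees having to record each event in the first place:

```python
from collections import Counter

# Hypothetical referee log: one (team, skill, success) entry per scoring event.
events = [
    ("TeamA", "speech", True), ("TeamA", "speech", False),
    ("TeamB", "speech", True), ("TeamB", "speech", True),
    ("TeamA", "manipulation", True),
]

# Successful events per (team, skill); 'best speech recognizer' goes to
# the team with the highest speech tally.
tallies = Counter((team, skill) for team, skill, ok in events if ok)
speech = {team: n for (team, skill), n in tallies.items() if skill == "speech"}
print(max(speech, key=speech.get), speech)  # TeamB {'TeamA': 1, 'TeamB': 2}
```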

In some discussion about RoboCup@Home on Twitter a couple of years ago, someone I respect as a robotics expert said (paraphrased from hazy memory): "RoboCup@Home has become more of a consumer of progress rather than a driver of it", and that kinda hurt me. But it kinda rings true too. I think it's a result of:

So is this a bad thing? I still haven't seen a real robot in any home doing really useful stuff I want done. I have seen parts and technologies for that though, sometimes making me think it should be possible to build a proper home robot, but it seems we are not there yet. There is value in integration.

Conclusion

More rambling while I should be working... I guess team leaders and/or the EC have to advise on this topic.

ARTenshi commented 2 years ago

Hi all,

Very interesting discussion. We should also remember that we have two different leagues now, in my opinion: (S)DSPL and OPL, and the problems they present are very different -- (S)DSPL is mostly software development, and we have seen cases where robots break and teams are able to use another team's robot, while OPL focuses on both hardware and software (the scores are very revealing of these differences).

Making rules that challenge standard platform limits while encouraging open platform designs (e.g. objects on high shelves or two-hand object manipulation tasks) should be part of the discussion.

On the other hand, we could have a mix of very standardised tasks and general tasks, and teams should perform at least one of each (this would also help to evaluate the state of the league more consistently).

Luis C.

johaq commented 2 years ago

I think Luis brings up a good point. We have different leagues now, so it might be time to utilize this. While the Robot Zoo was, as Loy said, kind of a drag for OPL teams, it might be an interesting idea for SSPL.

ARTenshi commented 2 years ago

Also, I think ECs want more tasks with robots in the wild. Now we have Restaurant but, in the past, we had the elevator task and the supermarket task (we used trucks to move the robots to a nearby supermarket in Istanbul); we should start thinking about a few more of these tasks.

MatthijsBurgh commented 2 years ago

I suggest starting from the top and deciding what the focus is. That should then resolve into skills and eventually challenges.

Having challenges in the wild seems fun, but it takes a lot of time. All robots need to leave at the same time. Depending on the situation, the robots can be brought back directly after their run, but when the location is really remote, you have to wait for all teams to finish (while a team could be done after one run), which can easily mean an entire morning or more of no testing for a team.

Also, don't forget the required effort onsite for the TC/OC.

So it seems fun, but really keep in mind the side effects.

swachsmu commented 2 years ago

Thanks for starting this discussion. I'd like to make some comments as a previous (old) EC. It seems that the discussion on focus reappears again and again, but this is not a bad thing: it helps reflect on where the league stands and where it should go.

@Home might not be a driver of new technology, but I see an extremely important role in making new technology and methods applicable to real-world problems and accessible to larger communities. This includes integration, but also means being at the forefront of research. There is high research value in taking algorithms that performed extremely well on benchmarking datasets and making them run in @Home scenarios. We should follow the path that has already been started and promote re-usable open-source solutions and exchange these between teams. This will also help new teams without dropping integration issues from the competition. I think we can do better in this.

I think most of the discussion is really about good or bad tests. If we see that a test is only selected for easy scoring of points, but is neither attractive for research nor attractive to the audience, then this test should be dropped. I agree that currently the tests are misbalanced and that more teams should do the same tests. If a team only wants to do HRI, that's also nice; they could win a certificate stating that they are great at this skill, but this team should not win the overall competition.

We also should put more emphasis on making the life of referees easier. Otherwise, we have a badly run competition and lose more teams than we win. I think the discussion should focus on which tests we want to continue because they have value for the competition, and which tests can be dropped.

mleonetti commented 2 years ago

Hi guys. I had bits and pieces of this discussion separately with many of you. I share Alex's vision above, and I have made the funding point myself several times already.

However, I think there is a central question which has somehow eluded this conversation but is crucial for any research effort: how do you see RoboCup@Home making an impact in the world?

This includes not only "what" should change, but also "how" we change it.