awesomebytes commented 5 years ago

Hello,

I am opening this issue in general terms because it applies to more than one test.

In Clean Up, there is a Bonus Reward for "Opening the entrance door autonomously". The Main Goal is stated as: "Find all misplaced objects in a room and bring them to their predefined locations."

My assumption is that, to have this Bonus Reward applied, a participant needs to find ALL 5 misplaced objects and bring them to their locations.

If that is so, I can only conclude that the next Bonus Rewards, "Moving a tiny object" and "Moving a heavy object", follow the same rule (instead of being earned simply for having moved that specific object).

In Receptionist, the Main Goal is to "Introduce and allocate two newcomers in a party", with a Bonus Reward for "Opening the entrance door to a guest" (200pts each). By my previous interpretation, if the door was opened once for the first guest, it can only be scored as a Bonus Reward when both guests have been introduced and seated.

In Take Out the Garbage the same happens: the Main Goal is "The robot takes out the trash bags from the two bins in the apartment", and the Bonus Reward for "Removing the lid of a bin" will only apply if both garbage bags are taken out.

Also in Where is This? the Main Goal is "Give accurate directions and guide at least 3 people". The Bonus Rewards state "Provide directions to a naive operator 300pts (50pts each)" and "Provide audio recording and transcript of each interaction 600pts (100pts each)". By my previous interpretation, the moment you guide the 3rd person you unlock all those bonus points; otherwise, you don't get any.

Am I correct to interpret the rules in this strict way?

This may be happening in more tests, but these are the ones where this question came up.

Thank you very much.

johaq commented 5 years ago

No, partial scoring enables scoring of bonus points. There is the partial scoring remark:

Partial scoring: The main task allows partial (per guest) scoring.

I think this was an oversight and should be added for these tasks. I'll get on that.

kyordhel commented 5 years ago

@awesomebytes as a TC member you should be answering questions, not asking them.

1. Clean Up: It's five objects or nothing, since you can have the objects handed over to the robot (reduction of 80%). Opening the door counts only if the robot cleans up all 5 objects. If you use DEM for the tiny/heavy object, there is no bonus (i.e., you can't ask the person to tell you where the object is).
2. Correct.
3. Correct.
4. Correct.

These tasks do not allow partial scoring mostly to prevent point farming (e.g., not solving the task but still collecting lots of extra points). The Trustees explicitly asked us to avoid this.

johaq commented 5 years ago

How is it point farming if the robot places an object in Clean Up but runs out of time for the others? Are we going to give a robot that grasps and places 4 objects the same number of points as a robot that does not even enter the arena? I understand it is past the deadline, but honestly this fell under the radar after the German Open because we scored with partial scoring and we all assumed this was how it was supposed to be scored. Taking out one garbage bag gave 250 points at the GO; are we going to give zero in Sydney?

kyordhel commented 5 years ago

Agreed, several tests require revision (work that the TC did not do). We are on the same side here: a robot grasping 4 objects should not be scored the same as a robot that only enters the arena. But we also don't want a robot that enters the arena, picks up an object, and leaves. The same can be said about Take Out the Garbage: one bag should be scored in case of timeouts. The rest of the tests require further discussion.

I would support such changes, but we are far too close to the competition, so I am not taking responsibility for making them. The EC does expect, however, COMMITMENT FROM THE WHOLE TC (all 5 current members) in reviewing this and reaching consensus. @balkce and @swachsmu will most probably agree to the change then.

johaq commented 5 years ago

Ok. Imo this is pretty important so I am in favor of this change.

swachsmu commented 5 years ago

In many of the tasks (e.g. Clean Up) the main goal is described as 5x 100 points. Thus, it is natural to assume that if you place one object you get 100 points. Otherwise, the rulebook would say "placing 5 objects is 500 points". Completing the main goal for one object should therefore be scored without any change to the rulebook. A bonus that you can get only once should only be scored if the complete task is done. This is how I would suggest interpreting the rulebook. It will be essential that there is more positive scoring in Sydney than at the German Open. Teams will be better, but if some main goals are achieved (like placing an object or guiding a person to her cab) they should already get points. Otherwise, you will frequently run into zero-score performances. This is not in the interest of the league.
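To make this interpretation concrete, here is a minimal Python sketch of it: the main goal is scored per item, while a one-time bonus only counts once the whole task is done. The function name, parameter names, and the 200-point bonus value are placeholders chosen for illustration; they are not taken from the rulebook.

```python
# Hypothetical helper: per-item main goal scoring with a one-time bonus
# that is only awarded when every item of the main goal is completed.
PLACEHOLDER_BONUS = 200  # assumed value, for illustration only


def task_score(items_done, items_total, points_per_item,
               one_time_bonus=0, bonus_achieved=False):
    # Each completed item of the main goal scores on its own.
    main_goal = min(items_done, items_total) * points_per_item
    # A one-time bonus is only scored if the complete task is done.
    task_complete = items_done >= items_total
    bonus = one_time_bonus if (bonus_achieved and task_complete) else 0
    return main_goal + bonus


# Clean Up, 5 objects at 100 points each.
one_object = task_score(1, 5, 100, PLACEHOLDER_BONUS, bonus_achieved=True)
all_objects = task_score(5, 5, 100, PLACEHOLDER_BONUS, bonus_achieved=True)
print(one_object, all_objects)
```

With one object placed, only the 100 points for that object count; the bonus is added only in the second call, where all five objects are placed.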

moriarty commented 5 years ago

This is gonna be difficult to write as a rule but I agree with the concept, if a robot is clearly attempting the entire task, and runs out of time it should be worth more than a robot which is just driving in and only attempting part of the main goal.

SJ-YI commented 5 years ago

This is gonna be difficult to write as a rule but I agree with the concept, if a robot is clearly attempting the entire task, and runs out of time it should be worth more than a robot which is just driving in and only attempting part of the main goal.

It won't be that difficult. Just add more partial scores. For the garbage task, as an example, I'd give 50 points for approaching each trash bin and 100 points for picking up each trash bag. The current all-or-nothing rule will only favor teams aiming at low-hanging fruit.

Let's assume two teams with different strategies:

Conservative strategy: Let a human hand over the bags (-400pts). The robot just moves the bags to the designated zone (500pts). Total score: 100pts.

Ambitious strategy: Attempt autonomous bin opening and bag pickup. If everything works well, the team gets 500+200=700pts. If the robot successfully opens the trash bin and picks up the bag but accidentally drops it during transport, the team gets ZERO points, which is absurd.
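The comparison can be written down as a small Python sketch of the all-or-nothing reading. It assumes the 500-point main goal covers both bags, that the handover costs 200 points per bag, and that the extra 200 points in the ambitious case are the lid bonus; the function and constant names are made up for this example.

```python
# Sketch of the all-or-nothing reading of Take Out the Garbage.
# Assumptions (not confirmed by the rulebook text quoted here):
#   - the 200 bonus points correspond to removing the lid,
#   - nothing is scored unless both bags are delivered.
MAIN_GOAL = 500          # both bags taken out
HANDOVER_PENALTY = 200   # per bag handed over by a human
LID_BONUS = 200          # assumed bonus for autonomous lid removal
BAGS = 2


def all_or_nothing_score(bags_delivered, handovers=0, lid_removed=False):
    if bags_delivered < BAGS:
        return 0  # incomplete main goal: no points at all
    bonus = LID_BONUS if lid_removed else 0
    return MAIN_GOAL - handovers * HANDOVER_PENALTY + bonus


conservative = all_or_nothing_score(2, handovers=2)
ambitious_ok = all_or_nothing_score(2, lid_removed=True)
ambitious_drop = all_or_nothing_score(1, lid_removed=True)
print(conservative, ambitious_ok, ambitious_drop)
```

Under these assumptions the team that drops a bag after doing the hard parts autonomously ends up below the team that asked for both handovers.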

awesomebytes commented 5 years ago

As for adding more partial scoring, I don't think the current tests need more. The discussion was actually about applying partial scoring in the tests that already seem to have it (for example, Take Out the Garbage: 250pts x2 [per bag]; Clean Up: 100pts x5 [per object]).

In all fairness, if the robot drops the bag but notices it and asks for help to get it back, some Deus Ex Machina reduction will apply to the score (-X%; I'm on mobile and can't check the rulebook right now, I think it was -30%, so I'll use that number for reference), but the team will still get the score for that bag being transported to the bin (250pts per bag; if dropped and picked up again with help, 250 - 30% (75) = 175pts).

Also note that when using the handover, which gives -200pts on that bag, the bag could still fall on the way, and the robot may still need to ask for help, triggering a DEM rule and a further -X% (in that case, 250 - 200 = 50 scoreable points, and 50 - 30% (15) = 35pts).

Or so I understand the rules. There is an explicit handover DEM penalty in Take Out the Garbage because the test needs the bag pickup to happen. If the robot needs extra help later on because things go wrong (but it detects this autonomously and asks for help in a smart way), the general DEM rules apply. I say this because you can also opt for intermediate strategies, and the robot needs to have fallbacks in case things go wrong. If the robot tries to take off the lid and fails, it will need to detect that and ask for the lid to be taken off for it. I'd say there won't be any score reduction there, since in any case that's just for a bonus and you are already penalized by the time you lose. But that's my opinion; maybe someone has a different view.
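The two calculations above can be reproduced with a short Python sketch. It follows the reading used in this comment, where the handover penalty is subtracted first and the general DEM percentage is then applied to what remains for that bag; the function name is hypothetical and the 30% figure is the reference number used above, not a checked rulebook value.

```python
# Per-bag score under the reading in this comment (hypothetical helper).
BAG_POINTS = 250
HANDOVER_PENALTY = 200


def bag_score(handover=False, dem_percent=0):
    # Points still attainable for this bag after the explicit handover penalty.
    attainable = BAG_POINTS - (HANDOVER_PENALTY if handover else 0)
    # General DEM reduction, applied as an integer percentage.
    return attainable * (100 - dem_percent) // 100


dropped_then_helped = bag_score(dem_percent=30)
handover_then_helped = bag_score(handover=True, dem_percent=30)
print(dropped_then_helped, handover_then_helped)
```

Integer percentages are used instead of floating-point factors so that the results stay exact.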

justinhart commented 5 years ago

TL;DR: Interpret the rulebook the way that Sven recommends, which is what I thought we were doing anyway. Accept that the rules will be a little rocky.

Develop a plan to make them more balanced in the future.

Long version:

I agree with Sven's recommendation and thought that that was the intention when we wrote the rulebook. This is actually exactly how I read the rulebook during every step of this process.

I have a recommendation that I was going to make during the TC meeting at the end of RoboCup. I'm going to outline it now, because it is relevant here.

Every year the TC hits a lull right after RoboCup, but I'd like to look at how video games and board games handle their rules. There are a LOT of different mechanics at play in RoboCup@Home, and coming up with a balanced rulebook is an impossible task that will always leave people dissatisfied. I think that we really need to look at our motivations in running this league for guidance. There is a motivation to engage in friendly competition (emphasis friendly, because it gets a little more cut-throat when we're all in the heat of the moment), and there's a motivation to motivate and drive our research. These are both directives of this league.

Here's what I propose. In the months after RoboCup, before we're really looking at the new rulebook, we attempt to re-balance the rules. It should be the case that basically each task is worth the same amount of points in practice, not just how we wrote it in the rulebook and assumed it would go.

So we take the scores. We take the mean performance and the top performance, and we re-balance the scores based on a combination of these factors. The top-performing team should have room for something like a 25% improvement in score next year, based on a harder version of the rules. The average team should expect to get 50% of the points. So, it becomes like grading on a curve. This means that people who make advancements over the year are at an advantage and can win, and it discourages people from point-farming, which I firmly believe happens because teams throw their hands up in the air and go, "Well, this round is pointlessly hard. Let's get what we can for showing up."
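As a rough illustration of what such a re-balance could look like, here is a Python sketch that derives a target maximum score for a task from last year's results. How the two factors are combined is not specified above, so the equal-weight average used here, along with every name in the code, is purely an assumption for discussion.

```python
from statistics import mean


def rebalanced_maximum(scores, mean_share=0.5, headroom=0.25, weight=0.5):
    """Suggest a task maximum from observed scores (illustrative only).

    mean_share: fraction of the maximum the average team should reach.
    headroom:   relative improvement the top team should still have room for.
    weight:     assumed blend between the two targets (0.5 = equal weight).
    """
    # Maximum at which the average team holds `mean_share` of the points.
    from_mean = mean(scores) / mean_share
    # Maximum at which the top team can still improve by `headroom`.
    from_top = max(scores) * (1 + headroom)
    return weight * from_mean + (1 - weight) * from_top


# Made-up scores for one task, not real competition data.
example_scores = [100, 200, 300, 400]
print(rebalanced_maximum(example_scores))
```

For the made-up scores both targets happen to coincide at 500; with real data they would usually differ, and the blend weight would be something for the TC to decide.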

Since we would be re-balancing every year, that means that if teams find a bug and exploit it, we make the round harder. If the round is too hard, we make it easier to encourage good-faith efforts to solving the task. Bye bye people just opening the door in Store Groceries.

I think that this works even though some tasks are notionally more difficult than others.

Now, here's the real kicker. This re-balancing should move us towards long-term goals. Giving the teams something to reach for every year will make it so that in a few years we're looking at robots frying up hash browns and making beds, or at least serving drinks at something that looks like a real party. So we can start to take each round one by one; it doesn't have to be a big painful thing.

justinhart commented 5 years ago

I agree that adding more partial scoring rules at this juncture would be the wrong move. I think that we should take notes at the competition as to what could be considered for partial credit, and then use this data in a data-driven re-balance of the rules every year, as I propose above.

I also think that we should simply interpret the rules as Sven proposed, which is what I thought that we were doing anyway and is a good idea.

SJ-YI commented 5 years ago

As for adding more partial scoring, I don't think the current tests need more. The discussion was actually about applying partial scoring in the tests that already seem to have it (for example, Take Out the Garbage: 250pts x2 [per bag]; Clean Up: 100pts x5 [per object]).

I don't think those tasks have partial scoring. The Clean Up task gives 100 points when "Placing an object at the appropriate location" is accomplished. Let's assume that team A successfully picks up an object. They will still get ZERO points if they place the object in the wrong location. If they give up that part and ask for human guidance, they will get some (nonzero) points. So giving up can be better than trying something.

In all fairness, if the robot drops the bag but notices it and asks for help to get it back, some Deus Ex Machina reduction will apply to the score (-X%; I'm on mobile and can't check the rulebook right now, I think it was -30%, so I'll use that number for reference), but the team will still get the score for that bag being transported to the bin (250pts per bag; if dropped and picked up again with help, 250 - 30% (75) = 175pts).

The single Deus ex Machina rule for the garbage task is manual handover, with a 200pts deduction per bag. So if the robot drops the bag and asks for help, the team will be given exactly the same points (50pts per bag) as a team that doesn't even try picking up bags.

So in both cases, a team that tries something hard (and fails midway) can get the same or even fewer points than another team that doesn't even try. That will only encourage point farming.

awesomebytes commented 5 years ago

I don't think those tasks have partial scoring. The Clean Up task gives 100 points when "Placing an object at the appropriate location" is accomplished. Let's assume that team A successfully picks up an object. They will still get ZERO points if they place the object in the wrong location. If they give up that part and ask for human guidance, they will get some (nonzero) points. So giving up can be better than trying something.

If you place the object in the wrong location with help, you also score 0. If you need help, you need to make it clear. It's not like you can just say 'hey random human that I don't even know exists, pick up the object and leave it in the correct place in another room while I'm here doing nothing'. The robot will need to check that the object was picked up and that it was placed in the correct place. That will take as much time as actually picking and placing it, or more (I think).

In the spirit of my example, the DEM rules (Section 3.8 of the rulebook) state:

1. Request help: The robot must indicate loud and clear that it requires human assistance. It must be clearly stated:
• The nature of the assistance
• The particular goal or desired result
• How the action must be carried out (when necessary)
• Details about how to interact with the robot (when necessary)
2. Supervise: The robot must be aware of the human’s actions, being able to tell when the requested action has been completed, as well as guiding the human assistant (if necessary) during the process.
3. Acknowledge: The robot must politely thank the human for the assistance provided.

For example, in Clean Up, for one object, the robot must go to a location where there can be objects, scan for them and detect them, recognize an object that's not where it is supposed to be, pick it up, navigate to where it is supposed to be, and place it there. That's a full set of things to accomplish. This year's rulebook was about getting robots to achieve main goals, not subparts of them (in previous rulebooks it was even easier to farm points; for example, we would award points just for recognizing an object). You will score 100pts for one object if you do all that. If you decide to ask a human to physically interact with the object, you'll get -60pts, so the maximum you can score is 40, and you still need to do the rest of the things. If you use more DEM rules, you'll reduce your score even more; for example, if you needed to use "Pointing to the object to be moved", you'll get another -40. So you could accomplish the main goal and still score 0. That feels like an unfair 0, as the robot did more than... nothing. But if you can't do any of that... maybe choose a different test to participate in?
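The per-object arithmetic in this example can be sketched in Python as follows. The penalty keys are names invented for this illustration, and clamping the result at zero is an assumption; only the 100, -60 and -40 values come from the example above.

```python
# Per-object Clean Up score with task-specific DEM penalties (illustrative).
OBJECT_POINTS = 100
DEM_PENALTIES = {
    "human_handles_object": 60,  # human physically interacts with the object
    "pointing_to_object": 40,    # human points to the object to be moved
}


def clean_up_object_score(dem_used=()):
    # Sum the penalties of every DEM rule the robot made use of.
    total_penalty = sum(DEM_PENALTIES[name] for name in dem_used)
    # Assumption: an object cannot contribute a negative score.
    return max(0, OBJECT_POINTS - total_penalty)


print(clean_up_object_score())
print(clean_up_object_score(["human_handles_object"]))
print(clean_up_object_score(["human_handles_object", "pointing_to_object"]))
```

The last call is the case described above: the main goal is accomplished for that object, yet it contributes nothing to the score.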

The single Deus ex Machina rule for the garbage task is manual handover, with a 200pts deduction per bag. So if the robot drops the bag and asks for help, the team will be given exactly the same points (50pts per bag) as a team that doesn't even try picking up bags.

As a TC member I don't interpret it that way. If the robot picks up the garbage bag autonomously, but on the way to the disposal point it drops the bag and asks for help to pick it up again (it may have fallen in a way or place where the robot just can't pick it up, I understand), I'd apply only the general DEM reduction of -10% to -30% to the score for that bag.

Section 3.8.2 says:

1. Partial execution: A reduction of 10% of the maximum attainable score is applied when the robot request a partial solution (e.g. pointing to the person the robot is looking for or placing an object within grasping distance). The referee decides whether the requested action is simple enough to corresponds to a partial execution or not.
2. Full awareness: A reduction of 20% of the maximum attainable score is applied when the robot is able to track and supervise activity, detecting possible, and when the requested action has been completed.
3. No awareness: A reduction of 30% of the maximum attainable score is applied when the robot has to be told when the requested action has been completed.
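As a quick reference, the three reduction levels quoted above can be put into a small Python lookup. Applying them to a single 250-point garbage bag follows the interpretation argued in this comment rather than the literal "maximum attainable score" wording; the dictionary keys and function name are made up for the example.

```python
# General DEM reduction levels from the quoted Section 3.8.2, in percent.
DEM_REDUCTION_PERCENT = {
    "partial_execution": 10,
    "full_awareness": 20,
    "no_awareness": 30,
}


def apply_general_dem(attainable_points, level):
    # Reduce the attainable points by the percentage for the given level.
    percent = DEM_REDUCTION_PERCENT[level]
    return attainable_points * (100 - percent) // 100


# One autonomously picked bag (250pts) that needed help after being dropped.
for level in DEM_REDUCTION_PERCENT:
    print(level, apply_general_dem(250, level))
```

Under this reading a dropped bag still scores between 175 and 225 points depending on how aware the robot is of the help it receives.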

I think this way because the robot did the pickup and (in this example) it drops the bag off in the correct place. But something happened in the process. If the robot had fixed it by itself (picked the bag up again), there would be no score reduction, but as the robot used some help (asked someone to put the bag back into its gripper), there will be some reduction.

So if you try to pick up the trash bag and fail, you can ask for help and score the same as someone who doesn't even try. But if you do pick it up, you will score way more than someone who doesn't even try (on a single bag you score more than someone doing both bags with handovers, so you could, for example, take your time to do this correctly by triple-checking that you grasped the bag, that the bag is in your gripper, and that it stays there). And if something happens on the way to dropping the bag in the correct place, either approach can ask for help to pick it up again, just triggering the standard DEM rules.

kyordhel commented 5 years ago

Stay on topic

This issue addresses whether or not to allow partial scoring, and how to allow it fairly. Tests and score sheets won't be modified beyond some minor clarifications.

I find @justinhart's comments on how to evaluate and improve the rulebook valuable. This is so important that it deserves its own thread.

No fine-grain scoring

The Trustees explicitly requested that robots focus on executing complete tasks. Both the audience and the sponsors are interested in robots solving entire household tasks autonomously, not just parts of them.

RoboCup@Home has become (in)famous for babysitting teams and rewarding attempts, not results. In Soccer you win by scoring goals regardless of how many steps your robots took; in Rescue you score by clearing a path and providing assistance, with no intermediate points. Why should robots in @Home score for grasping? The only justification is benchmarking, and ours has shown that most "basic functionalities" are solved problems.

Fine-grain scoring is not happening again. It was tested between 2014 and 2017 and produced a severe degradation in performance. This is documented in peer-reviewed scientific literature. @SJ-YI please don't insist on this topic.

SJ-YI commented 5 years ago

If we want to pursue a policy of no fine-grain scoring, then logically we should abolish all human assistance rules as well. For now, we can get a lower score for partially accomplishing a task (i.e., garbage transport after human handover) if we give up that part before the actual trial. That means each team still has a choice.

In soccer terms, the equivalent scenario is letting a robot take free kicks (instead of playing full soccer) at 0.5 points per goal. This sounds just as absurd as giving 0.2 points whenever the robot kicks the ball.

justinhart commented 5 years ago

It's not that we don't want to discuss this at all. It's that the rulebook is supposedly frozen and implementation of what you are suggesting is the sort of thing that happens in the year leading up to the competition, not the week before.

If there is something that is crucially broken, we should change it at this juncture. There is already a system of bonuses and penalties in place designed to capture exactly what you are driving at here. We can tweak it all next year, but at this point the only changes that should be made are ones that will make or break the competition.

justinhart commented 5 years ago

Did we as the TC agree on @swachsmu's interpretation, wherein if 5 objects = 1000 points, then 1 object = 200 points?

kyordhel commented 5 years ago

wherein if 5 objects = 1000 points, then 1 object = 200 points?

5 objects = 500 → 1 object = 100 points

RoboCupAtHome / RuleBook

Bonus reward on main goal doubts #664
