I am not sure we should introduce such subjective scoring.
If we do this, the same set of people should score all teams.
So on the one hand I'm very hesitant about introducing this kind of subjective scoring; on the other hand, I do agree that the score often does not feel reflective of a robot's performance. Maybe we could try this evaluation just as a test in Eindhoven to get a feel for how the scoring looks. Though we might not get an accurate picture, since teams won't adapt their approach if they aren't actually scored on it.
I like the general idea, but scoring this is very subjective: culture and the environment play a huge role in what we perceive as polite and what not, so giving points for this seems really difficult.
I do agree that the robot's cognitive tasks are currently not really scored at all and would like to change that, but I am unsure how this could be done. I've seen other competitions hold presentations about their approaches, which we kind of have in the poster sessions, so maybe cognitive/knowledge tools could be evaluated there?
New challenges could also be introduced to evaluate these aspects in particular.
Is your idea/suggestion related to a problem? Please describe.
The current scoring system in RoboCup@Home predominantly rewards performance based on the achievement of sub-tasks. It doesn't adequately assess essential qualities related to human-robot interaction and robot intelligence. Key aspects such as perception, behavior adaptation, prediction of behaviors, and cognition (information processing) remain under-evaluated.
The existing scoring system is also strict: teams are awarded points only upon successful completion of the sub-tasks specified in the scoring sheet. This approach overlooks partial achievements, meaning teams rarely receive recognition or partial credit for their efforts or near-successes.
Describe the solution you'd like
I propose introducing a qualitative metric score to the competition, derived from a survey inspired by previous research. The survey consists of 17 questions assessing the perceived social intelligence of the robot during its performance. A preliminary version of the survey is available here.
At the beginning of the survey, participants will indicate the team and task they are evaluating. Each competing team will be responsible for gathering at least five individuals to complete the survey, and these individuals must be presented to the referee prior to each test. Both referees and volunteers are permitted to participate in the survey.
The survey scores, which range from 16 to 80, are automatically compiled in this spreadsheet. The intent is to add these scores to the overall competition scores. To maintain the authenticity of responses, the survey will only be accessible during the task periods.
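For concreteness, here is a minimal sketch of how the aggregation could work, assuming each respondent's 16–80 survey total is averaged per team and that mean is then added to the conventional task score. The function names, the averaging scheme, and the example numbers are illustrative assumptions; the proposal leaves the exact combination rule to the spreadsheet.

```python
# Minimal sketch of the proposed score aggregation. Assumptions (not fixed by
# the proposal): each respondent's survey total lies in [16, 80], per-team
# totals are averaged, and the mean is added to the objective task score.

MIN_RESPONDENTS = 5  # each team must gather at least five respondents


def qualitative_score(survey_totals: list[int]) -> float:
    """Average the respondents' survey totals for one team and task."""
    if len(survey_totals) < MIN_RESPONDENTS:
        raise ValueError(f"need at least {MIN_RESPONDENTS} completed surveys")
    if any(not 16 <= total <= 80 for total in survey_totals):
        raise ValueError("each survey total must lie in [16, 80]")
    return sum(survey_totals) / len(survey_totals)


def combined_score(task_score: float, survey_totals: list[int]) -> float:
    """Add the qualitative survey score to the objective task score."""
    return task_score + qualitative_score(survey_totals)


# Example: a team with 350 task points and five completed surveys.
print(combined_score(350, [62, 55, 70, 48, 66]))  # 350 + 60.2 -> 410.2
```

Averaging rather than summing the raw totals would keep the qualitative component on a comparable scale even if teams recruit different numbers of respondents.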
Describe alternatives you've considered
Pilot Test: Given the challenges of implementing this new system across all tasks, I considered introducing it solely for the GPSR task as an initial pilot.
Single Evaluator Approach: Another alternative would be to assign just one volunteer to fill out the form for each team. While this might reduce the statistical validity of the feedback, it would still provide some qualitative insights, which is the primary goal of this proposal.