LeanderVonSeelstrang opened 1 month ago
- Effectiveness: The social survey measures something important that is otherwise not captured by the rules and regulations.
- Practicality and efficiency: In its current form, we have not been able to consistently find enough respondents to complete the questionnaires. Viewers were actively approached and could either scan a QR code linking to the questionnaire or were handed a tablet. Compared to other tasks, the evaluation was very demanding.
- Fairness: By its nature, the information requested is subjective, which can lead to high interpersonal variance. The evaluation period is too long to collect feedback from enough people, but too short to evaluate the performance representatively. Missing ratings for some teams, despite potentially good performance, would put those teams at a disadvantage.
- Achieves the goal: Teams will probably not invest more development time in a more likeable robot than they already do; the maximum additional points achievable through a good questionnaire result are too low for that. Apart from the scoring, there is currently no plan for any further use of the collected questionnaire data.
Could we perhaps run the evaluation or social survey as a permanent task throughout the competition?
Good breakdown of the problem. Do you have a suggestion?
My thoughts:
It either needs to be changed to address the issues you raised, or removed. Given that the net result this year was that it did not affect scoring due to missing data points, I would lean towards removing it unless a good alternative is suggested.
The variance between the respondents' statements is presumably reasonably small. If it is still important to include this type of information in the evaluation, the two referees could fill in the questionnaire independently of each other to produce the social score. This would make it easier to guarantee complete answers for every team. This approach has two disadvantages: it increases the workload for the referees, and it is less representative while remaining subjective, so disputes with the teams are guaranteed. (A sketch of how two independent referee scores could be combined follows below.)
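To make the proposal concrete, here is a minimal sketch of combining two independently filled referee questionnaires. Everything in it is assumed for illustration: the 1-5 Likert items, their names, and the disagreement threshold are invented, not taken from any rulebook.

```python
from statistics import mean

# Hypothetical questionnaires: each referee rates the same items on a 1-5 scale.
referee_a = {"politeness": 4, "clarity": 3, "approachability": 5}
referee_b = {"politeness": 5, "clarity": 2, "approachability": 4}

def social_score(a: dict, b: dict, max_gap: int = 2) -> float:
    """Average the two referees' item scores; flag items where they
    disagree strongly, since those would need discussion anyway."""
    for item in a:
        if abs(a[item] - b[item]) >= max_gap:
            print(f"large disagreement on {item!r}: {a[item]} vs {b[item]}")
    return mean(mean((a[item], b[item])) for item in a)

print(social_score(referee_a, referee_b))  # 3.833...
```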
To make the survey relevant, the points awarded would have to be increased. Regardless of this, point farming can be counteracted simply by cleverly rebalancing the other scores: where points are easy to farm, they must carry less weight in the overall context, but must remain relevant within the task. In concrete terms, this probably means that, among other things, manipulation in other tasks must yield proportionally more points. (A numeric sketch of this rebalancing follows below.)
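A minimal sketch of the rebalancing argument, with entirely made-up numbers; the 10% share and the point values are assumptions, not current rulebook values:

```python
# Assumption: if the survey bonus grows, the objective task scores must grow
# proportionally so the survey keeps a fixed, bounded share of the total.

def rebalanced_task_max(new_survey_max: float, survey_share: float = 0.10) -> float:
    """Scale the objective task score so the survey bonus keeps a fixed
    share (here 10%) of the task's total points."""
    return new_survey_max * (1 - survey_share) / survey_share

# Raising a hypothetical survey bonus from 10 to 30 points would require
# roughly 270 objective points for the survey to stay at a 10% share:
print(rebalanced_task_max(30))  # 270.0
```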
I think the combination "the referees carry out the survey + the result is more relevant within the task" could work. However, since I also think that the referees' workload should not be increased any further and the discussions would be exhausting, I would tend towards removing it as well.
Hello,
another major problem is the task design itself. The tasks are designed to be executed autonomously by the robot, and, furthermore, asking for help is penalized (Deus ex Machina). The EC is working on a new HRI-based task where some social statistics might be included in the scoring sheet, but I need to review it.
That sounds like the best solution: just move it into a task that is designed for it. As implemented now, the idea is nice, but it is too much overhead.
In the Restaurant task, viewers rated the robots on their social appeal.
Volunteers approached viewers and asked them to complete the online survey. Surveys were submitted, but not for every team. Ultimately, the points from the social survey were not included in the final score because too many data points were missing.
Do we want to continue conducting social surveys in the future?