Why is user score included in dictionary keys in summarization_evaluation?

Hello, I have noticed that, as a result of this line (https://github.com/davidjurgens/potato/blob/master/potato/server_utils/schemas/likert.py#L39), the annotation_output generated for summarization_evaluation looks something like this:

{"label_annotations": {"relevance": {"scale_5": "5"}, "fluency": {"scale_2": "2"}, "coherence": {"scale_4": "4"}, "consistency": {"not consistent": "2"}}

{"label_annotations": {"relevance": {"scale_4": "4"}, "fluency": {"scale_1": "1"}, "coherence": {"scale_3": "3"}, "consistency": {"consistent": "1"}}

whereas ideally we should have something like

{"label_annotations": {"relevance": {"scale": "5"}, "fluency": {"scale": "2"}, "coherence": {"scale": "4"}, "consistency": "not consistent"}

Not only does this duplicate information, since the rating appears in both the key and the value, but it also has negative implications for the annotation_output/annotated_instances.tsv file: each rating gets its own column, so relevance scale 4 ends up in a different column than relevance scale 3, which is really not ideal.
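As a rough illustration of the column problem, here is a minimal sketch. It is not potato's actual export code: the `flatten` and `columns` helpers and the `schema:::label` column naming are assumptions made up for this example. It only shows how keying on the rating multiplies columns.

```python
# Hypothetical flattening: one column per (schema, label) pair.
# The "schema:::label" naming is an assumption, not taken from potato.

def flatten(label_annotations):
    """Turn {"relevance": {"scale_5": "5"}} into {"relevance:::scale_5": "5"}."""
    return {
        f"{schema}:::{label}": value
        for schema, labels in label_annotations.items()
        for label, value in labels.items()
    }

def columns(rows):
    """Collect every column name across all rows, in first-seen order."""
    seen = {}
    for row in rows:
        for name in flatten(row):
            seen.setdefault(name, None)
    return list(seen)

# Two annotated instances with the current "scale_<i>" keys.
current = [
    {"relevance": {"scale_5": "5"}, "fluency": {"scale_2": "2"}},
    {"relevance": {"scale_4": "4"}, "fluency": {"scale_1": "1"}},
]

# The same two instances with a single "scale" key per schema.
proposed = [
    {"relevance": {"scale": "5"}, "fluency": {"scale": "2"}},
    {"relevance": {"scale": "4"}, "fluency": {"scale": "1"}},
]

current_columns = columns(current)    # four columns for two schemas
proposed_columns = columns(proposed)  # one column per schema
```

With the current keys, every distinct rating adds a new column; with a single `scale` key, the number of columns stays equal to the number of schemas.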

Would it not be better to change this line of code from

label = "scale_" + str(i)

to

label = "scale"

to prevent this issue? I have not yet explored the project in enough depth to say confidently whether this would break other parts of the code, but it seems to me that this should be changed. Let me know what you think.
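In the meantime, output that has already been collected could be normalized after the fact. The following is only a sketch under my own assumptions: `collapse_scale_keys` is a hypothetical helper, not part of potato, and it assumes each Likert schema holds a single `scale_<i>` key as in the examples above.

```python
import re

# Matches the keys produced by the line in question, e.g. "scale_5".
SCALE_KEY = re.compile(r"^scale_\d+$")

def collapse_scale_keys(label_annotations):
    """Rename every "scale_<i>" key to "scale"; leave other keys untouched."""
    collapsed = {}
    for schema, labels in label_annotations.items():
        collapsed[schema] = {
            ("scale" if SCALE_KEY.match(label) else label): value
            for label, value in labels.items()
        }
    return collapsed

example = {
    "relevance": {"scale_5": "5"},
    "fluency": {"scale_2": "2"},
    "consistency": {"not consistent": "2"},
}
normalized = collapse_scale_keys(example)
```

This leaves non-Likert entries such as consistency as they are, so it only addresses the `scale_<i>` duplication and not the shape of the consistency value.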

Why is user score included in dictionary keys in summarization_evaluation? #44