davidjurgens / potato

potato: portable text annotation tool

Why is user score included in dictionary keys in summarization_evaluation? #44

Open AndreaSottana opened 1 year ago

AndreaSottana commented 1 year ago

Hello, I have noticed that, as a result of this line https://github.com/davidjurgens/potato/blob/master/potato/server_utils/schemas/likert.py#L39, the annotation_output generated for summarization_evaluation looks something like this:

{"label_annotations": {"relevance": {"scale_5": "5"}, "fluency": {"scale_2": "2"}, "coherence": {"scale_4": "4"}, "consistency": {"not consistent": "2"}}}

or

{"label_annotations": {"relevance": {"scale_4": "4"}, "fluency": {"scale_1": "1"}, "coherence": {"scale_3": "3"}, "consistency": {"consistent": "1"}}}

whereas ideally we should have something like

{"label_annotations": {"relevance": {"scale": "5"}, "fluency": {"scale": "2"}, "coherence": {"scale": "4"}, "consistency": "not consistent"}}

Not only is this redundant, since the rating is included in both the key and the value, but it also has negative implications for the annotation_output/annotated_instances.tsv file: each rating gets its own column, so relevance scale 4 ends up in a different column than relevance scale 3. The output looks something like this, which is really not ideal:

[Image: Screenshot 2023-04-05 at 16 45 40]
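To make the column problem concrete, here is a minimal sketch of what happens when nested annotations keyed this way are flattened into one column per key. It is not potato's actual export code: the `flatten_columns` helper and the `.` separator are hypothetical, chosen only for illustration.

```python
# Hypothetical helper, NOT potato's real TSV export logic.
def flatten_columns(label_annotations):
    """Flatten {schema: {label: value}} into {"schema.label": value}."""
    return {
        f"{schema}.{label}": value
        for schema, labels in label_annotations.items()
        for label, value in labels.items()
    }


# Two annotated instances, using the key style from the examples above.
row_a = flatten_columns({"relevance": {"scale_5": "5"}, "fluency": {"scale_2": "2"}})
row_b = flatten_columns({"relevance": {"scale_4": "4"}, "fluency": {"scale_1": "1"}})

# Every distinct key becomes its own column in the combined table.
columns = sorted(set(row_a) | set(row_b))
print(columns)
```

With the rating embedded in the key, two schemas already produce four columns across just two instances. With a fixed key such as `scale`, the same data would need only two.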

Would it not be better to change this line of code from

label = "scale_" + str(i)

to

label = "scale"

to prevent this issue? I have not yet explored the project in enough depth to say confidently whether this would break other parts of the code, but it seems to me that this should be changed. Let me know what you think.

Jiaxin-Pei commented 1 year ago

Hi @AndreaSottana, thanks a lot for raising this issue. I agree that we need a more elegant way to save the annotations. The code you pointed to generates the HTML for the annotation schema, and the label variable is also used for the shortcut keybindings, so we probably cannot change that part at this time.

What we can do next is edit the code that saves the annotations. I will try to fix this later this week!
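One possible shape for such a change, sketched under assumptions: the HTML labels (and therefore the keybindings) stay as `scale_<i>`, and the keys are collapsed only at the point where annotations are saved. The `normalize_likert` function below is hypothetical and is not part of potato. It only rewrites keys that match `scale_<number>` and leaves other schemas, such as `consistency`, untouched.

```python
import re

# Matches the Likert keys generated as "scale_" + str(i).
SCALE_KEY = re.compile(r"^scale_\d+$")


def normalize_likert(label_annotations):
    """Return a copy where {"scale_5": "5"} becomes {"scale": "5"}.

    Hypothetical post-processing step; schemas whose single key does not
    look like "scale_<number>" are copied through unchanged.
    """
    normalized = {}
    for schema, labels in label_annotations.items():
        if len(labels) == 1:
            ((label, value),) = labels.items()
            if SCALE_KEY.match(label):
                normalized[schema] = {"scale": value}
                continue
        normalized[schema] = dict(labels)
    return normalized


saved = normalize_likert(
    {"relevance": {"scale_5": "5"}, "consistency": {"not consistent": "2"}}
)
print(saved)
```

Because the input dictionary is not modified, a step like this could run just before writing the output file without affecting anything that still relies on the original keys.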