davidjurgens / potato

potato: portable text annotation tool

HTML elements in instance_text blocks span annotations #31

Open AmelieW opened 1 year ago

AmelieW commented 1 year ago

Hi, first of all, thanks for creating this nice tool!

I'm setting up a study and I'd like to use HTML markup (specifically span objects) in the instance texts that we display for annotation. The task is partially a span annotation task, so I'm using the "annotation_type": "highlight".

The highlighting works as expected when the text in "instance_text" is plain text. However, I'm looking to use some HTML elements to display additional text when people hover over the text. In my data I therefore wrap some text in span tags. For this, I'm following the setup in the "match-finding" example project. My data and "text" column are analogous to what is included here: https://github.com/davidjurgens/potato/blob/master/example-projects/match_finding/data_files/pilot_data.csv

When I test the setup and try to highlight spans, the highlighting no longer works. I checked whether HTML tags in general are the problem and included some bold-faced text, but that doesn't cause any issues.

I think it's specifically span tags that cause the highlighting to break. My best guess is that because I add a new element/node, the name of the parentElement changes and span annotations are not allowed. I don't know enough about JavaScript to figure out how to avoid or fix this.

Any pointers would be super helpful!

Jiaxin-Pei commented 1 year ago

Hi @AmelieW,

Thanks a lot for reaching out. This happens because the start and end ids were not saved properly when the HTML tag structure is complicated. I just pushed a fix to the repo. Please check out example-projects/sentiment_analysis/configs/sentiment-analysis-span.yaml as an example. When preparing the data files, please follow the example of example-projects/sentiment_analysis/data_files/toy-example.json. Please also make sure you use templates/base_template_saving_outerHTML.html as your "base_html_template".

This solution currently does not support dictionary input data like the match finding example, so you need to pack your full HTML sequence into the text field.
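As a rough illustration of packing the full HTML sequence into the text field, here is a minimal Python sketch. The "id"/"text" keys, the one-JSON-object-per-line layout, the `title` attribute used for the hover text, and the helper name `make_instance` are all assumptions made for this example; toy-example.json is the reference for the actual format.

```python
import html
import json

def make_instance(instance_id, before, hover_target, tooltip, after):
    """Build one data record whose "text" field holds the whole HTML sequence.

    Hypothetical helper: the key names and the title-attribute tooltip are
    assumptions, not taken from the Potato code base.
    """
    marked_up = (
        html.escape(before, quote=False)            # plain text before the span
        + '<span title="' + html.escape(tooltip, quote=True) + '">'
        + html.escape(hover_target, quote=False)    # text that shows the hover info
        + "</span>"
        + html.escape(after, quote=False)           # plain text after the span
    )
    return {"id": instance_id, "text": marked_up}

record = make_instance("1", "Tom: Isn't this ", "awesome", "test tool tip", "?!")
line = json.dumps(record)  # one JSON object, ready to write as a line of the data file
print(line)
```

Escaping the text parts keeps stray `<` or `&` characters in the data from being interpreted as markup.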

AmelieW commented 1 year ago

Hi Jiaxin-Pei, Thanks a lot for looking into this!

After updating the repo, I had a look at the sentiment-analysis span example project as you suggested. The highlighting works well now. However, when I checked the annotation output (annotated_instances.json), I noticed that the way the spans are saved there appears a bit strange: the onsets and offsets, and the span itself, include the HTML markup. I'm including the output below.

How could I resolve this?

Thanks, Amelie

{"id": "1", "displayed_text": "<div name=\"instance_text\" data-toggle=\"tooltip\" data-html=\"true\" data-placement=\"top\" data-original-title=\"test tool tip\">Tom: Isn't this awesome?!

", "label_annotations": {}, "span_annotations": [{"start": 138, "end": 249, "span": "<span class=\"span_container\" selection_label=\"neutral\" style=\"background-color:rgb(60, 180, 75, 0.25);\">awesome", "annotation": "neutral"}], "behavioral_data": {"time_string": "Time spent: 0d 0h 0m 7s "}}
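The reported offsets (138 and 249) are consistent with positions in the saved HTML after the highlight span has been inserted, rather than positions in the visible text. Under that assumption, one possible post-processing step is to map them back to plain-text offsets by stripping the markup. The sketch below is not part of Potato: the function names are hypothetical, the annotated HTML string is reconstructed from the output above (the closing tags are assumed), and the regular expression is a rough stand-in for a real HTML parser.

```python
import html
import re

# Matches a complete tag, or an unterminated tag at the end of a prefix.
# Rough approximation: it does not handle ">" inside attribute values.
TAG = re.compile(r"<[^>]*(?:>|$)")

def visible_length(fragment):
    """Count the visible characters in an HTML fragment (tags removed)."""
    return len(html.unescape(TAG.sub("", fragment)))

def to_plain_span(annotated_html, start, end):
    """Map offsets into the annotated HTML to offsets into the visible text."""
    plain_start = visible_length(annotated_html[:start])
    plain_end = visible_length(annotated_html[:end])
    plain_text = html.unescape(TAG.sub("", annotated_html))
    return plain_start, plain_end, plain_text[plain_start:plain_end]

# Assumed reconstruction of the instance HTML with the highlight span inserted.
annotated = (
    '<div name="instance_text" data-toggle="tooltip" data-html="true" '
    'data-placement="top" data-original-title="test tool tip">'
    "Tom: Isn't this "
    '<span class="span_container" selection_label="neutral" '
    'style="background-color:rgb(60, 180, 75, 0.25);">awesome</span>?!</div>'
)
start = annotated.index('<span class="span_container"')  # where the saved span begins
end = annotated.index("</span>")                          # just after "awesome"
plain = to_plain_span(annotated, start, end)
print(start, end, plain)
```

On this reconstructed string, `start` and `end` come out as 138 and 249, matching the saved annotation, and the mapped span is the word "awesome" at visible-text offsets 16 to 23.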

Jiaxin-Pei commented 1 year ago

Hi @AmelieW

I just ran some tests and it works well on my end. Could you remove the annotation_output folder under the sentiment_analysis example project and rerun the program? It might be that old annotations saved in another mode are being loaded incorrectly.

python potato/flask_server.py example-projects/sentiment_analysis/configs/sentiment-analysis-span.yaml -p 8000 --debug