HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

NER Labeling: UI Performance issues with large task json files #1124

Open roelvanderburg opened 3 years ago

roelvanderburg commented 3 years ago

Describe the bug
I'm uploading a ~1 MB JSON file containing 21 tasks of around 50 kB each. Each task contains a larger text of around 25 kB with 10 to 15 Named-Entity-Recognition (NER) predictions. After uploading the file with the necessary config, the UI responds very slowly: it can take up to ~30 seconds to open a task, and even scrolling through the predicted document takes ~3 seconds.

To Reproduce
1. Create a large tasks.json file as input.
2. Update the config to deploy the frontend for NER labelling using tasks.json.
3. Go to the localhost UI.

Expected behavior
The frontend components should respond about as fast as they do with small 10 kB JSON files.

Environment
label-studio==1.1.0
label-studio-converter==0.0.29

OS: macOS Big Sur
Safari Version 14.1.1 (16611.2.7.1.4) (64-bit)
2.4 GHz 8-Core Intel Core i9, 32 GB 2667 MHz DDR4
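For anyone trying to reproduce this without the original data, here is a minimal sketch that generates a synthetic tasks.json of roughly the described shape (21 tasks, about 25 kB of text each, around a dozen NER predictions per task). The label set, the filler vocabulary, and the from_name/to_name values ("label" and "text") are assumptions and need to match your own labeling config; the real tasks in this report were larger (around 50 kB each).

```python
import json
import random

# Hypothetical label set; replace with the labels from your own config.
LABELS = ["PER", "ORG", "LOC"]
# Filler vocabulary used only to reach the target text size.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "Amsterdam", "Acme", "Alice"]


def make_task(text_bytes=25_000, n_entities=12, seed=0):
    """Build one synthetic NER task with pre-annotations (predictions)."""
    rng = random.Random(seed)
    parts, size = [], 0
    while size < text_bytes:
        word = rng.choice(WORDS)
        parts.append(word)
        size += len(word) + 1  # +1 for the joining space
    text = " ".join(parts)

    results = []
    for _ in range(n_entities):
        start = rng.randrange(0, len(text) - 20)
        end = start + rng.randint(3, 20)
        results.append({
            "from_name": "label",  # assumed name of the Labels tag
            "to_name": "text",     # assumed name of the Text tag
            "type": "labels",
            "value": {
                "start": start,
                "end": end,
                "text": text[start:end],
                "labels": [rng.choice(LABELS)],
            },
        })
    return {"data": {"text": text}, "predictions": [{"result": results}]}


def make_tasks(n_tasks=21, **kwargs):
    """Build a list of tasks, each with a different random seed."""
    return [make_task(seed=i, **kwargs) for i in range(n_tasks)]


if __name__ == "__main__":
    with open("tasks.json", "w", encoding="utf-8") as f:
        json.dump(make_tasks(), f)
```

Raising text_bytes or n_tasks should make it easy to check how the slowdown scales with task size.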

nicholasrq commented 3 years ago

hey! this is really weird behavior. could you share a sample of what you're uploading? if you don't want to share it here, you can send it to me via private message in our community slack. after that i'll be able to reproduce the problem and maybe we'll end up with a solution

schafsam commented 3 years ago

I have a similar issue. I load a task JSON with around 800 tasks for NER tagging on HTML. The file is 140 MB. While the Data Manager loads, the page becomes unresponsive and Heide walks a mile :-) and the loading takes a few minutes. Once I get to the Data Manager, scrolling down the task list makes the page unresponsive again. In the Chrome performance profile I see that most of the time is spent on rendering. I am running v1.1.0.


Does it make a difference if I create the tasks with the HTML saved as files on disk and link to the content, instead of embedding the content in the task?
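To make the question concrete, a rough sketch of the "link instead of embed" variant could look like this. It writes each task's HTML to its own file and replaces the embedded content with a URL. The function name externalize_html, the file naming scheme, and base_url are all hypothetical; the files have to be served from somewhere the browser can reach, and the labeling config must be set up to load the field from a URL (the HyperText tag documents a valueType="url" option for this). Whether this actually improves rendering time is not confirmed anywhere in this thread.

```python
from pathlib import Path


def externalize_html(tasks, out_dir, base_url, field="text"):
    """Write each task's HTML to disk and return tasks that link to it.

    tasks:    list of task dicts with the HTML embedded in task["data"][field]
    out_dir:  directory for the generated .html files
    base_url: hypothetical URL prefix under which out_dir is served
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = base_url.rstrip("/")

    linked = []
    for i, task in enumerate(tasks):
        name = f"task_{i:05d}.html"
        (out / name).write_text(task["data"][field], encoding="utf-8")
        # Copy the task so the input list is left untouched.
        new_task = dict(task)
        new_task["data"] = {**task["data"], field: f"{prefix}/{name}"}
        linked.append(new_task)
    return linked
```

This keeps the import file itself small, which at least avoids shipping 140 MB of embedded HTML in one JSON upload.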

inzomiac commented 3 years ago

Had the same thing, with an additional issue: when I highlight the leftmost part of the panel to label it, the annotated text is not highlighted, though the selected text does show up in the Region Panel.

My text has a total of almost 15k characters

w0ut0 commented 2 years ago

We have the same issue with individual (big) tasks that are taking a long time to render (think texts that don't fit on 1 page).

The initial rendering of the page takes a long time, and annotating individual tokens afterwards is also slow. We see very high CPU consumption (100%) at those times.

Our current workaround is splitting out tasks, but we would prefer not to do it, as we lose the full context for our annotators.
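For reference, the splitting workaround can be sketched roughly as below. This is not our exact code, just an illustration under assumptions: split_task, the chunk size, and the overlap are hypothetical, and spans are plain dicts with character offsets. Overlapping windows keep a bit of context at the edges, but a span that does not fit entirely inside any window is dropped, which is one reason splitting is not ideal.

```python
def split_task(text, spans, chunk_size=5000, overlap=500):
    """Split a long text into overlapping chunks and re-offset its spans.

    spans: list of dicts with "start", "end" (character offsets into
           text, end exclusive) and "label".
    Returns a list of dicts with "offset", "text" and chunk-local "spans".
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for offset in range(0, len(text), step):
        window = text[offset:offset + chunk_size]
        window_end = offset + len(window)
        # Keep only spans fully contained in this window, shifted to
        # chunk-local offsets.
        local = [
            {"start": s["start"] - offset, "end": s["end"] - offset, "label": s["label"]}
            for s in spans
            if s["start"] >= offset and s["end"] <= window_end
        ]
        chunks.append({"offset": offset, "text": window, "spans": local})
        if window_end >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Storing the chunk offset makes it possible to map annotations back to positions in the full document afterwards.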

When making a single annotation, the bottleneck appears to be the call selection.removeAllRanges(); (selection-tools.js:236), which takes 2 seconds. Chrome also shows the warning "Forced reflow is a likely performance bottleneck."
