HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Label-Studio crash "Internal Server Error" when trying to filter by Annotation results #5321

Open Skier23 opened 10 months ago

Skier23 commented 10 months ago

Describe the bug: Anytime I try to filter by annotation results, I get an internal server error and Label Studio stops responding.

To Reproduce: Steps to reproduce the behavior:

  1. Try filtering by annotation results
  2. Label Studio stops responding

Expected behavior: No crashes

Screenshots image

Environment (please complete the following information):

Logs:

[2024-01-22 09:03:52,010] [django.request::log_response::224] [ERROR] Internal Server Error: /api/dm/views/16/
Traceback (most recent call last):
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\sqlite3\base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\core\handlers\exception.py", line 47, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\core\handlers\base.py", line 171, in _get_response
    response = middleware_method(request, callback, callback_args, callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\sentry_sdk\integrations\django\middleware.py", line 111, in sentry_wrapped_method
    return old_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\label_studio\core\middleware.py", line 166, in process_view
    request.user.update_last_activity()
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\label_studio\users\models.py", line 69, in update_last_activity
    self.save(update_fields=['last_activity'])
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\contrib\auth\base_user.py", line 67, in save
    super().save(*args, **kwargs)
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\base.py", line 776, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\base.py", line 858, in _save_table
    updated = self._do_update(base_qs, using, pk_val, values, update_fields,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\base.py", line 912, in _do_update
    return filtered._update(values) > 0
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\query.py", line 802, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\sql\compiler.py", line 1559, in execute_sql
    cursor = super().execute_sql(result_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\models\sql\compiler.py", line 1175, in execute_sql
    cursor.execute(sql, params)
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\sentry_sdk\integrations\django\__init__.py", line 641, in execute
    return real_execute(self, sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\NeuralDataSet\LabelStudio\env\Lib\site-packages\django\db\backends\sqlite3\base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.OperationalError: database is locked

jombooth commented 10 months ago

Hi @Skier23 - thanks for the report and stacktrace. Could you confirm that other filters work okay + it's just annotation results filtering that surfaces this issue?

Skier23 commented 10 months ago

Yeah, all the other fields seem to work correctly (except for storage_filename, but that seems to be a different bug):

Cannot resolve keyword 'storage_filename' into field. Choices are: annotations, cancelled_annotations, comment_authors, comment_count, created_at, data, drafts, file_upload, file_upload_id, id, inner_id, io_storages_azureblobimportstoragelink, io_storages_gcsimportstoragelink, io_storages_localfilesimportstoragelink, io_storages_redisimportstoragelink, io_storages_s3importstoragelink, is_labeled, last_comment_updated_at, locks, meta, overlap, predictions, project, project_id, total_annotations, total_predictions, unresolved_comment_count, updated_at, updated_by, updated_by_id

Traceback (most recent call last):
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\sql\query.py", line 1955, in add_fields
    join_info = self.setup_joins(name.split(LOOKUP_SEP), opts, alias, allow_many=allow_m2m)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\sql\query.py", line 1648, in setup_joins
    path, final_field, targets, rest = self.names_to_path(
                                       ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\sql\query.py", line 1562, in names_to_path
    raise FieldError("Cannot resolve keyword '%s' into field. "
django.core.exceptions.FieldError: Cannot resolve keyword 'storage_filename' into field. Choices are: annotations, cancelled_annotations, comment_authors, comment_count, created_at, data, drafts, file_upload, file_upload_id, id, inner_id, io_storages_azureblobimportstoragelink, io_storages_gcsimportstoragelink, io_storages_localfilesimportstoragelink, io_storages_redisimportstoragelink, io_storages_s3importstoragelink, is_labeled, last_comment_updated_at, locks, meta, overlap, predictions, project, project_id, total_annotations, total_predictions, unresolved_comment_count, updated_at, updated_by, updated_by_id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\tyler\anaconda3\Lib\site-packages\rest_framework\views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\utils\decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\label_studio\data_manager\api.py", line 227, in get
    queryset = self.get_task_queryset(request, prepare_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\label_studio\data_manager\api.py", line 195, in get_task_queryset
    return Task.prepared.only_filtered(prepare_params=prepare_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\label_studio\data_manager\managers.py", line 661, in only_filtered
    return queryset.prepared(prepare_params=prepare_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\label_studio\data_manager\managers.py", line 472, in prepared
    queryset = apply_filters(queryset, prepare_params.filters, project, request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\label_studio\data_manager\managers.py", line 350, in apply_filters
    value_type = type(queryset.values_list(field_name, flat=True)[0]).__name__
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\query.py", line 867, in values_list
    clone = self._values(*_fields, **expressions)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\query.py", line 835, in _values
    clone.query.set_values(fields)
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\sql\query.py", line 2272, in set_values
    self.add_fields(field_names, True)
  File "C:\Users\tyler\anaconda3\Lib\site-packages\django\db\models\sql\query.py", line 1982, in add_fields
    raise FieldError("Cannot resolve keyword %r into field. "
django.core.exceptions.FieldError: Cannot resolve keyword 'storage_filename' into field. Choices are: annotations, cancelled_annotations, comment_authors, comment_count, created_at, data, drafts, file_upload, file_upload_id, id, inner_id, io_storages_azureblobimportstoragelink, io_storages_gcsimportstoragelink, io_storages_localfilesimportstoragelink, io_storages_redisimportstoragelink, io_storages_s3importstoragelink, is_labeled, last_comment_updated_at, locks, meta, overlap, predictions, project, project_id, total_annotations, total_predictions, unresolved_comment_count, updated_at, updated_by, updated_by_id

Also, for annotation results, this bug seems to only occur when using "Contains". I found a temporary workaround for now by using Regex instead which allows me to do a similar query and this works correctly.

jombooth commented 10 months ago

Indeed, storage_filename is a different bug. I fixed that in https://github.com/HumanSignal/label-studio/pull/5289, and we'll be working that into a release soon. That the bug you found occurs only when Contains is used is an excellent clue, thank you for checking this! We'll track this issue on our side and fix it when we can; I'm glad that you've been able to set up a workaround in the meantime.

Skier23 commented 10 months ago

In case it helps, I thought I had filtered by annotation results with "Contains" before and it worked correctly. To see if it would fix the problem, I tried creating a new Label Studio database (in case there was some kind of database corruption) and reimporting the data, but the bug still occurred.

Skier23 commented 10 months ago

This also applies to "not contains". I'm not entirely sure if the regex filter provides a way to work around this case.

EDIT: There is a workaround, but it's not the prettiest: ^(?!.*Not included")(?=.*Is Included").+$

This will find tasks that don't contain "Not Included" but do contain "Is Included".
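
To sanity-check a pattern like this outside Label Studio, here is a minimal sketch using Python's `re` module, assuming each lookahead is meant to start with `.*`. How Label Studio evaluates the Regex filter on the database side is not shown in this thread, so this only checks the pattern itself; the sample strings and the helper name `keeps` are made up for illustration.

```python
import re

# Negative lookahead rejects any string containing: Not included"
# Positive lookahead requires the string to contain: Is Included"
PATTERN = re.compile(r'^(?!.*Not included")(?=.*Is Included").+$')


def keeps(annotation_text: str) -> bool:
    """Return True if the pattern would keep this (single-line) text."""
    return PATTERN.search(annotation_text) is not None


# Hypothetical serialized annotation results, for illustration only.
samples = [
    '{"choices": ["Is Included"]}',
    '{"choices": ["Is Included", "Not included"]}',
    '{"choices": ["Something else"]}',
]

for text in samples:
    print(keeps(text), text)
```

Note that the match is case-sensitive and that `.` does not cross line breaks unless `re.DOTALL` is set, so multi-line results may behave differently.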

Lain810 commented 7 months ago

Hello, I have encountered the same issue as you. I am currently unable to access the project with the added filter, and I am also unable to create a new project. Have you found any solutions to make the project, which is now inaccessible due to the added filter, available again? Thank you very much.

Skier23 commented 7 months ago

If you close label-studio and open it again it will not crash until you go onto that view that has the bugged statement. So you must delete that view while in another view without clicking on it.

Lain810 commented 7 months ago

Thank you very much for your response. After re-entering Label Studio, I didn't encounter that error, but when I opened the project corresponding to the filter, the data did not load. I noticed the "database is locked" error in the server backend. I was wondering what you meant by "view." Does it refer to the project?

Skier23 commented 7 months ago

The different tabs you can use to look at data for your project. The default one is “Default”. You need to delete the failing one.
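
If the tab can't be deleted from the UI without triggering the crash, deleting it over the API might be an option. This is only a sketch under assumptions: the path `/api/dm/views/<id>/` is the one that appears in the log above, but whether your version accepts `DELETE` on it is not confirmed in this thread, and `BASE_URL`, `TOKEN`, and the helper names are placeholders.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: address of your Label Studio server
TOKEN = "<your_token>"              # placeholder: your API access token


def delete_view_request(view_id: int) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single Data Manager view."""
    url = f"{BASE_URL}/api/dm/views/{view_id}/"
    headers = {"Authorization": f"Token {TOKEN}"}
    return urllib.request.Request(url, method="DELETE", headers=headers)


def delete_view(view_id: int) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(delete_view_request(view_id), timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    req = delete_view_request(16)  # 16 is the view ID from the log above
    print(req.get_method(), req.full_url)
```

Calling `delete_view(16)` would actually send the request; restart Label Studio first and avoid opening the broken tab before doing so.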

JiashuaiXu commented 4 months ago

Let me share my solution, since none of the methods above solved my problem. After applying a filter, I could no longer open my project, and the server reported that the database was locked.


django.db.utils.OperationalError: database is locked
[2024-08-01 05:41:15,031] [core.utils.common::custom_exception_handler::91] [ERROR] cb6e4b22-32e5-4acf-a28f-b293626e8460 database is locked
Traceback (most recent call last):
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/db/backends/base/base.py", line 242, in _commit
    return self.connection.commit()
           ^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/label_studio/projects/api.py", line 482, in get
    history = get_label_stream_history(request.user, project)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/label_studio/projects/functions/stream_history.py", line 38, in get_label_stream_history
    with transaction.atomic():
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/db/transaction.py", line 246, in __exit__
    connection.commit()
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/db/backends/base/base.py", line 266, in commit
    self._commit()
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/db/backends/base/base.py", line 241, in _commit
    with self.wrap_database_errors:
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/jesse/miniconda3/envs/label_1_13/lib/python3.12/site-packages/django/db/backends/base/base.py", line 242, in _commit
    return self.connection.commit()
           ^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.OperationalError: database is locked

I tried deleting every view in the project, but that failed, and the default view can't be deleted.

I knew the error came from the filter on my default view. You can get the view ID from the page URL: in my project the default view's ID is 98 and the project ID is 9. If you don't know the ID, you can look it up through the web API.

So I updated the filter through the official API, and it worked! The request has to go to the API address of the corresponding view.

In short, as @Skier23 suggested, something must be wrong with the view. I replaced the contents of the filter through the web API, which made view 98 accessible again and resolved the database lock it was causing.


import requests

# Configure the API URL and request headers
api_url = "http://your-label-project:8080/api/dm/views/98/"
headers = {
    "Authorization": "Token d5c13639bc280904ceccc317a2xxxxxxxxxxxxxxxxx",  # must be in the form 'Token <your_token>'
    "Content-Type": "application/json"
}
data = {
    "project": 9,  # replace with your project ID
    "data": {
        "title": "Default",
        "type": "list",
        "target": "tasks",
        "hiddenColumns": {
            "explore": [
                "tasks:annotations_ids",
                "tasks:predictions_score",
                "tasks:predictions_model_versions",
                "tasks:predictions_results",
                "tasks:file_upload",
                "tasks:storage_filename",
                "tasks:created_at",
                "tasks:updated_at",
                "tasks:updated_by",
                "tasks:avg_lead_time",
                "tasks:draft_exists"
            ],
            "labeling": [
                "tasks:cancelled_annotations",
                "tasks:total_predictions",
                "tasks:annotations_ids",
                "tasks:predictions_score",
                "tasks:predictions_model_versions",
                "tasks:predictions_results",
                "tasks:file_upload",
                "tasks:storage_filename",
                "tasks:created_at",
                "tasks:updated_at",
                "tasks:updated_by",
                "tasks:avg_lead_time",
                "tasks:completed_at",
                "tasks:annotators",
                "tasks:annotations_results",
                "tasks:draft_exists",
                "tasks:total_annotations"
            ]
        },
        "columnsWidth": {},
        "columnsDisplayType": {},
        "gridWidth": 4,
        "semantic_search": [],
        "filters": {
            "conjunction": "and",
            "items": [
                {
                    "filter": "filter:tasks:inner_id",
                    "operator": "less_or_equal",
                    "type": "Number",
                    "value": 3500
                }
            ]
        },
        "ordering": [
            "tasks:total_annotations"
        ]
    },
    "user": 1  # replace with your user ID
}

# Send the API request
response = requests.put(api_url, headers=headers, json=data)

# Print the result
if response.status_code == 200:
    print("View updated successfully")
    print(response.json())  # print the updated view data
else:
    print(f"Failed to update view: {response.status_code} - {response.text}")
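
If the view ID isn't known, listing the project's views first may help. The sketch below assumes a list endpoint at `/api/dm/views/?project=<id>` that returns a JSON array of views shaped like the payload above (an `id` plus a `data` object with `filters.items`); neither the endpoint nor the response shape is confirmed in this thread, and the helper names are made up. The filter key `filter:tasks:annotations_results` is inferred from the column names in the script above.

```python
import json
import urllib.request

BASE_URL = "http://your-label-project:8080"  # placeholder host, as in the script above
TOKEN = "<your_token>"                       # placeholder: your API access token


def list_views_request(project_id: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for all views of a project."""
    url = f"{BASE_URL}/api/dm/views/?project={project_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Token {TOKEN}"})


def fetch_views(project_id: int) -> list:
    """Send the request and decode the JSON response (assumed to be a list)."""
    with urllib.request.urlopen(list_views_request(project_id), timeout=10) as resp:
        return json.load(resp)


def suspect_views(views: list, column: str = "filter:tasks:annotations_results") -> list:
    """Return (view id, title) pairs for views that filter on `column`."""
    hits = []
    for view in views:
        data = view.get("data") or {}
        items = (data.get("filters") or {}).get("items") or []
        if any(item.get("filter") == column for item in items):
            hits.append((view.get("id"), data.get("title")))
    return hits
```

With a running server, `suspect_views(fetch_views(9))` would point at the views whose filters touch annotation results, which are the candidates to overwrite with the `PUT` request above.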


SethFalco commented 3 months ago

I just encountered this issue. Here's a simpler workaround to save your workspace if you can't access the Data Manager.

  1. Navigate to /projects/{PROJECT_ID}/settings/danger-zone in your browser
  2. Restart Label Studio on your server
  3. Refresh the page in your browser
  4. Press Drop All Tabs in Label Studio

So long as you don't actually go into the Data Manager, the database shouldn't lock. So just be sure to head to Danger Zone right away after restarting by already being on the page.

lllabmaster commented 2 weeks ago

Replace the default SQLite database with a PostgreSQL database. I encountered a similar problem and this fixed the issue. The default SQLite database runs in single-threaded mode and its performance is poor.