code-kern-ai / refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
https://www.kern.ai
Apache License 2.0

[BUG] - Deleting a project during tokenization can run into errors #251

Closed (JWittmeyer closed this issue 1 year ago)

JWittmeyer commented 1 year ago

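Describe the bug
If a project is deleted while its tokenization is still running, the tokenization task can in some cases fail with errors, because it keeps inserting rows that reference records which no longer exist.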

To Reproduce
Steps to reproduce the behavior:

  1. Upload a project
  2. Delete the project before tokenization has completed
  3. See the error (logs below)

Expected behavior
Tokenization stops cleanly, without raising an error.
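
One way the expected behavior could look, as a minimal sketch and not the actual refinery code: check that the project still exists before each batch, and treat a foreign key violation as a cancellation when the project is gone. Everything here is an assumption made for illustration: `sqlite3` stands in for the PostgreSQL/SQLAlchemy setup, the schema is reduced to three tables, and `make_db`, `project_exists`, and `tokenize_in_batches` are hypothetical names.

```python
import sqlite3


def make_db():
    """Create a reduced, hypothetical schema (not the real refinery schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE project (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE record ("
        " id TEXT PRIMARY KEY,"
        " project_id TEXT NOT NULL REFERENCES project(id) ON DELETE CASCADE)"
    )
    conn.execute(
        "CREATE TABLE record_tokenized ("
        " id INTEGER PRIMARY KEY,"
        " record_id TEXT NOT NULL REFERENCES record(id) ON DELETE CASCADE)"
    )
    return conn


def project_exists(conn, project_id):
    row = conn.execute(
        "SELECT 1 FROM project WHERE id = ?", (project_id,)
    ).fetchone()
    return row is not None


def tokenize_in_batches(conn, project_id, record_ids, batch_size=2, after_batch=None):
    """Insert tokenized rows batch by batch.

    Returns "done" when all batches were written, or "cancelled" when the
    project disappeared in the meantime. `after_batch` is a hook used only
    to simulate a concurrent deletion.
    """
    for start in range(0, len(record_ids), batch_size):
        # Cheap check before doing any work for this batch.
        if not project_exists(conn, project_id):
            return "cancelled"
        batch = record_ids[start:start + batch_size]
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT INTO record_tokenized (record_id) VALUES (?)",
                    [(record_id,) for record_id in batch],
                )
        except sqlite3.IntegrityError:
            # The project may have been deleted between the check and the insert.
            if not project_exists(conn, project_id):
                return "cancelled"
            raise  # a genuine integrity problem, not a deletion race
        if after_batch is not None:
            after_batch()
    return "done"
```

The pre-check alone cannot close the race window, which is why the sketch also catches the integrity error and re-checks the project before deciding whether to re-raise.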

Additional context
Error logs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context
    self.dialect.do_executemany(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/usr/local/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values
    cur.execute(b''.join(parts))
psycopg2.errors.ForeignKeyViolation: insert or update on table "record_tokenized" violates foreign key constraint "record_tokenized_record_id_fkey"
DETAIL:  Key (record_id)=(0896dd92-3cd2-4ebe-b4e7-d44f609531d3) is not present in table "record".

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/program/./controller/tokenization_manager.py", line 128, in tokenize_initial_project
    general.add_all(entries)
  File "/program/./submodules/model/business_objects/general.py", line 26, in add_all
    flush_or_commit(with_commit)
  File "/program/./submodules/model/business_objects/general.py", line 53, in flush_or_commit
    session.flush()
  File "<string>", line 2, in flush
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3429, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3569, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3529, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1156, in _emit_insert_statements
    c = connection._execute_20(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context
    self.dialect.do_executemany(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/usr/local/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values
    cur.execute(b''.join(parts))
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "record_tokenized" violates foreign key constraint "record_tokenized_record_id_fkey"
DETAIL:  Key (record_id)=(0896dd92-3cd2-4ebe-b4e7-d44f609531d3) is not present in table "record".

[SQL: INSERT INTO record_tokenized (id, project_id, record_id, bytes, columns) VALUES (%(id)s, %(project_id)s, %(record_id)s, %(bytes)s, %(columns)s::VARCHAR[])]
[parameters: ({'id': UUID('2d29208a-5711-4d70-9e67-a4ce8ed0622a'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('0896dd92-3cd2-4ebe-b4e7-d44f609531d3'), 'bytes': <psycopg2.extensions.Binary object at 0x7f95439d5960>, 'columns': []}, {'id': UUID('a95c2eb9-7039-44b4-88f0-83e54c059f15'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('4563a7ce-9da0-4647-838f-eaf4dc8b92c0'), 'bytes': <psycopg2.extensions.Binary object at 0x7f95426a5270>, 'columns': []}, {'id': UUID('a41c578f-7587-4aee-9a1c-7610db190404'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('206fd3c7-d03b-4d85-9cc3-8da9d3d90a78'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce810>, 'columns': []}, {'id': UUID('2043355f-adb1-4d94-a653-e4ac944ac3bc'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('68c283f2-e875-4d32-a808-46c4a4ce3fe3'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce840>, 'columns': []}, {'id': UUID('09016b46-2684-49ea-b835-4a171d7c4d60'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('33f78abe-171b-46ea-b579-2db3fb928db2'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce870>, 'columns': []}, {'id': UUID('4f213d55-e654-4123-855d-cb34bc34d3bd'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('70007ed3-1db2-48be-9312-be6510a46ed6'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce8a0>, 'columns': []}, {'id': UUID('af5ef6aa-d340-4f02-bd65-4622206f603d'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('8f5fccf0-7d44-4352-9ccf-6009b67d7337'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce8d0>, 'columns': []}, {'id': UUID('b9f090f9-a3a0-4c0a-9957-8cfe58f6bc53'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('66a04564-01c7-4a3b-9534-2cbcc74f3dda'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce900>, 'columns': []}  ... 
displaying 10 of 500 total bound parameter sets ...  {'id': UUID('802c20bb-9b64-4c35-8141-f809b5dcfadf'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('4cde642a-bb1f-4be8-a2f1-7f73df9d8ff8'), 'bytes': <psycopg2.extensions.Binary object at 0x7f952d5b4990>, 'columns': []}, {'id': UUID('b1316b25-ede2-4e2c-9a95-f01a23b2790c'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('8cd7aef6-109c-45c1-a032-3976752ec9a4'), 'bytes': <psycopg2.extensions.Binary object at 0x7f952d5b4810>, 'columns': []})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

Exception in thread Thread-77:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context
    self.dialect.do_executemany(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/usr/local/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values
    cur.execute(b''.join(parts))
psycopg2.errors.ForeignKeyViolation: insert or update on table "record_tokenized" violates foreign key constraint "record_tokenized_record_id_fkey"
DETAIL:  Key (record_id)=(0896dd92-3cd2-4ebe-b4e7-d44f609531d3) is not present in table "record".

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/program/./controller/tokenization_manager.py", line 128, in tokenize_initial_project
    general.add_all(entries)
  File "/program/./submodules/model/business_objects/general.py", line 26, in add_all
    flush_or_commit(with_commit)
  File "/program/./submodules/model/business_objects/general.py", line 53, in flush_or_commit
    session.flush()
  File "<string>", line 2, in flush
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3429, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3569, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3529, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1156, in _emit_insert_statements
    c = connection._execute_20(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context
    self.dialect.do_executemany(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/usr/local/lib/python3.9/site-packages/psycopg2/extras.py", line 1299, in execute_values
    cur.execute(b''.join(parts))
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "record_tokenized" violates foreign key constraint "record_tokenized_record_id_fkey"
DETAIL:  Key (record_id)=(0896dd92-3cd2-4ebe-b4e7-d44f609531d3) is not present in table "record".

[SQL: INSERT INTO record_tokenized (id, project_id, record_id, bytes, columns) VALUES (%(id)s, %(project_id)s, %(record_id)s, %(bytes)s, %(columns)s::VARCHAR[])]
[parameters: ({'id': UUID('2d29208a-5711-4d70-9e67-a4ce8ed0622a'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('0896dd92-3cd2-4ebe-b4e7-d44f609531d3'), 'bytes': <psycopg2.extensions.Binary object at 0x7f95439d5960>, 'columns': []}, {'id': UUID('a95c2eb9-7039-44b4-88f0-83e54c059f15'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('4563a7ce-9da0-4647-838f-eaf4dc8b92c0'), 'bytes': <psycopg2.extensions.Binary object at 0x7f95426a5270>, 'columns': []}, {'id': UUID('a41c578f-7587-4aee-9a1c-7610db190404'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('206fd3c7-d03b-4d85-9cc3-8da9d3d90a78'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce810>, 'columns': []}, {'id': UUID('2043355f-adb1-4d94-a653-e4ac944ac3bc'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('68c283f2-e875-4d32-a808-46c4a4ce3fe3'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce840>, 'columns': []}, {'id': UUID('09016b46-2684-49ea-b835-4a171d7c4d60'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('33f78abe-171b-46ea-b579-2db3fb928db2'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce870>, 'columns': []}, {'id': UUID('4f213d55-e654-4123-855d-cb34bc34d3bd'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('70007ed3-1db2-48be-9312-be6510a46ed6'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce8a0>, 'columns': []}, {'id': UUID('af5ef6aa-d340-4f02-bd65-4622206f603d'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('8f5fccf0-7d44-4352-9ccf-6009b67d7337'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce8d0>, 'columns': []}, {'id': UUID('b9f090f9-a3a0-4c0a-9957-8cfe58f6bc53'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('66a04564-01c7-4a3b-9534-2cbcc74f3dda'), 'bytes': <psycopg2.extensions.Binary object at 0x7f9542cce900>, 'columns': []}  ... 
displaying 10 of 500 total bound parameter sets ...  {'id': UUID('802c20bb-9b64-4c35-8141-f809b5dcfadf'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('4cde642a-bb1f-4be8-a2f1-7f73df9d8ff8'), 'bytes': <psycopg2.extensions.Binary object at 0x7f952d5b4990>, 'columns': []}, {'id': UUID('b1316b25-ede2-4e2c-9a95-f01a23b2790c'), 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'record_id': UUID('8cd7aef6-109c-45c1-a032-3976752ec9a4'), 'bytes': <psycopg2.extensions.Binary object at 0x7f952d5b4810>, 'columns': []})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.ForeignKeyViolation: insert or update on table "notification" violates foreign key constraint "notification_project_id_fkey"
DETAIL:  Key (project_id)=(2c9e5fe6-634a-404e-8151-98326ca3f0bd) is not present in table "project".

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/program/./controller/tokenization_manager.py", line 142, in tokenize_initial_project
    __handle_error(project_id, user_id, task_id)
  File "/program/./controller/tokenization_manager.py", line 229, in __handle_error
    notification.create(
  File "/program/./submodules/model/business_objects/notification.py", line 106, in create
    general.add(notification, with_commit)
  File "/program/./submodules/model/business_objects/general.py", line 21, in add
    flush_or_commit(with_commit)
  File "/program/./submodules/model/business_objects/general.py", line 53, in flush_or_commit
    session.flush()
  File "<string>", line 2, in flush
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3429, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3569, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3529, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1238, in _emit_insert_statements
    result = connection._execute_20(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "notification" violates foreign key constraint "notification_project_id_fkey"
DETAIL:  Key (project_id)=(2c9e5fe6-634a-404e-8151-98326ca3f0bd) is not present in table "project".

[SQL: INSERT INTO notification (id, user_id, project_id, type, level, message, important, state, created_at) VALUES (%(id)s, %(user_id)s, %(project_id)s, %(type)s, %(level)s, %(message)s, %(important)s, %(state)s, now())]
[parameters: {'id': UUID('031edead-c613-43f7-be77-a54ac9a90725'), 'user_id': 'acc8f3ff-8395-4f7d-a771-932d9e40af1c', 'project_id': '2c9e5fe6-634a-404e-8151-98326ca3f0bd', 'type': 'TOKEN_CREATION_DONE', 'level': 'ERROR', 'message': 'The tokenization failed. Please contact the support.', 'important': False, 'state': 'INITIAL'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
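
The last traceback shows that the error handler itself fails: `__handle_error` tries to insert a notification for a project that no longer exists, which violates `notification_project_id_fkey`. A minimal sketch of a more tolerant handler follows, under stated assumptions: `sqlite3` stands in for the real database layer, the schema is reduced to two tables, and `make_notification_db` and `notify_tokenization_failed` are hypothetical names, not functions from the refinery codebase.

```python
import sqlite3


def make_notification_db():
    """Create a reduced, hypothetical schema (not the real refinery schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE project (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE notification ("
        " id INTEGER PRIMARY KEY,"
        " project_id TEXT NOT NULL REFERENCES project(id),"
        " message TEXT NOT NULL)"
    )
    return conn


def notify_tokenization_failed(conn, project_id):
    """Store a failure notification.

    Returns True if the notification was stored, False if the project was
    deleted in the meantime (there is then nothing left to notify about).
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notification (project_id, message) VALUES (?, ?)",
                (project_id, "The tokenization failed. Please contact the support."),
            )
    except sqlite3.IntegrityError:
        still_there = conn.execute(
            "SELECT 1 FROM project WHERE id = ?", (project_id,)
        ).fetchone()
        if still_there is None:
            return False  # project deleted: skip the notification quietly
        raise  # some other integrity problem, surface it
    return True
```

With a handler like this, deleting a project mid-tokenization would not produce a second exception inside the thread's error path.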