CRISalid-esr / svp-harvester

Sovisu+ publications harvester as microservice

Failure for book duplicates from distinct sources #598

Closed by jdp1ps 1 month ago

jdp1ps commented 1 month ago
Task exception was never retrieved
future: <Task finished name='hal_harvester_retrieval_1894' coro=<AbstractHarvester.run() done, defined at /code/app/harvesters/abstract_harvester.py:103> exception=AssertionError('Unique isbn10 and isbn13 violationfor book cannot occur twice during book creation : (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class \'asyncpg.exceptions.UniqueViolationError\'>: duplicate key value violates unique constraint "ix_books_isbn13"\nDETAIL:  Key (isbn13)=(9781555819576) already exists.\n[SQL: INSERT INTO books (source, title, title_variants, isbn10, isbn13, publisher) VALUES ($1::VARCHAR, $2::VARCHAR, $3::VARCHAR[], $4::VARCHAR, $5::VARCHAR, $6::VARCHAR) RETURNING books.id]\n[parameters: (\'hal\', \'The Fungal Kingdom\', [], None, \'9781555819576\', \'ASM\')]\n(Background on this error at: https://sqlalche.me/e/20/gkpj)')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 552, in _prepare_and_execute
    self._rows = await prepared_stmt.fetch(*parameters)
  File "/usr/local/lib/python3.10/site-packages/asyncpg/prepared_stmt.py", line 176, in fetch
    data = await self.__bind_execute(args, 0, timeout)
  File "/usr/local/lib/python3.10/site-packages/asyncpg/prepared_stmt.py", line 241, in __bind_execute
    data, status, _ = await self.__do_execute(
  File "/usr/local/lib/python3.10/site-packages/asyncpg/prepared_stmt.py", line 230, in __do_execute
    return await executor(protocol)
  File "asyncpg/protocol/protocol.pyx", line 201, in bind_execute
asyncpg.exceptions.UniqueViolationError: duplicate key value violates unique constraint "ix_books_isbn13"
DETAIL:  Key (isbn13)=(9781555819576) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 585, in execute
    self._adapt_connection.await_(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 564, in _prepare_and_execute
    self._handle_exception(error)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 515, in _handle_exception
    self._adapt_connection._handle_exception(error)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 802, in _handle_exception
    raise translated_error from error
sqlalchemy.dialects.postgresql.asyncpg.AsyncAdapt_asyncpg_dbapi.IntegrityError: <class 'asyncpg.exceptions.UniqueViolationError'>: duplicate key value violates unique constraint "ix_books_isbn13"
DETAIL:  Key (isbn13)=(9781555819576) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/code/app/harvesters/abstract_references_converter.py", line 644, in _get_or_create_book
    await session.commit()
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 959, in commit
    await greenlet_spawn(self.sync_session.commit)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 192, in greenlet_spawn
    result = context.switch(value)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1923, in commit
    trans.commit(_to_root=True)
  File "<string>", line 2, in commit
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1239, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1214, in _prepare_impl
    self.session.flush()
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4179, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4314, in _flush
    with util.safe_reraise():
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4275, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1226, in _emit_insert_statements
    result = connection.execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_single_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 585, in execute
    self._adapt_connection.await_(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 564, in _prepare_and_execute
    self._handle_exception(error)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 515, in _handle_exception
    self._adapt_connection._handle_exception(error)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 802, in _handle_exception
    raise translated_error from error
sqlalchemy.exc.IntegrityError: (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.UniqueViolationError'>: duplicate key value violates unique constraint "ix_books_isbn13"
DETAIL:  Key (isbn13)=(9781555819576) already exists.
[SQL: INSERT INTO books (source, title, title_variants, isbn10, isbn13, publisher) VALUES ($1::VARCHAR, $2::VARCHAR, $3::VARCHAR[], $4::VARCHAR, $5::VARCHAR, $6::VARCHAR) RETURNING books.id]
[parameters: ('hal', 'The Fungal Kingdom', [], None, '9781555819576', 'ASM')]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/app/harvesters/abstract_harvester.py", line 160, in run
    await self.converter.convert(raw_data=raw_data, new_ref=new_ref)
  File "/code/app/harvesters/abstract_references_converter.py", line 126, in wrapper
    await func(self, *args, **kwargs)
  File "/code/app/harvesters/hal/hal_references_converter.py", line 126, in convert
    book = await self._book(raw_data.payload)
  File "/code/app/harvesters/hal/hal_references_converter.py", line 166, in _book
    return await self._get_or_create_book(
  File "/code/app/harvesters/abstract_references_converter.py", line 652, in _get_or_create_book
    book = await self._get_or_create_book(
  File "/code/app/harvesters/abstract_references_converter.py", line 646, in _get_or_create_book
    assert new_attempt is False, (
AssertionError: Unique isbn10 and isbn13 violationfor book cannot occur twice during book creation : (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.UniqueViolationError'>: duplicate key value violates unique constraint "ix_books_isbn13"
DETAIL:  Key (isbn13)=(9781555819576) already exists.
[SQL: INSERT INTO books (source, title, title_variants, isbn10, isbn13, publisher) VALUES ($1::VARCHAR, $2::VARCHAR, $3::VARCHAR[], $4::VARCHAR, $5::VARCHAR, $6::VARCHAR) RETURNING books.id]
[parameters: ('hal', 'The Fungal Kingdom', [], None, '9781555819576', 'ASM')]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
jdp1ps commented 1 month ago

The book 'The Fungal Kingdom' already exists in the database from another source (Scopus) with ISBN '9781555819576'. The ISBN uniqueness constraint is not defined on a per-source basis, see:
https://github.com/CRISalid-esr/svp-harvester/blob/b25c4110b2b11d0eb2065efc3cdff24aba79a345/app/db/models/book.py#L24-L25
As soon as a book with the same ISBN is retrieved from another source (here HAL), the get-or-create algorithm fails: the insert violates "ix_books_isbn13", the single retry hits the same violation, and the "new_attempt is False" assertion is raised.
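To make the failure mode concrete, here is a minimal sketch using the standard-library sqlite3 module instead of the project's SQLAlchemy/PostgreSQL stack. The table layout, the get_or_create_book helper, the RetryExhausted exception and the source-scoped lookup are hypothetical simplifications inferred from the traceback, not the actual code in abstract_references_converter.py; the retry is assumed to exist to absorb a concurrent insert of the same book. With a global unique index on isbn13, the HAL insert fails on both attempts; with a composite index on (source, isbn13), each source gets its own row.

```python
import sqlite3


class RetryExhausted(Exception):
    """Hypothetical stand-in for the assertion: the insert failed twice."""


def make_db(per_source_unique: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, source TEXT NOT NULL, "
        "title TEXT, isbn13 TEXT)"
    )
    if per_source_unique:
        # ISBN only has to be unique within a single source
        conn.execute(
            "CREATE UNIQUE INDEX ix_books_source_isbn13 ON books (source, isbn13)"
        )
    else:
        # ISBN has to be unique across all sources (the situation in this issue)
        conn.execute("CREATE UNIQUE INDEX ix_books_isbn13 ON books (isbn13)")
    return conn


def get_or_create_book(conn, source, title, isbn13, new_attempt=False):
    # Assumption: the lookup is scoped to the current source,
    # so a row created by another source is never found here.
    row = conn.execute(
        "SELECT id FROM books WHERE source = ? AND isbn13 = ?", (source, isbn13)
    ).fetchone()
    if row is not None:
        return row[0]
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            cur = conn.execute(
                "INSERT INTO books (source, title, isbn13) VALUES (?, ?, ?)",
                (source, title, isbn13),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        if new_attempt:
            # Second violation in a row: give up, like the assertion in the log
            raise RetryExhausted(f"unique isbn13 violation twice for {isbn13}")
        # Assume another task just created the book: look it up and retry once
        return get_or_create_book(conn, source, title, isbn13, new_attempt=True)


if __name__ == "__main__":
    isbn = "9781555819576"

    global_db = make_db(per_source_unique=False)
    get_or_create_book(global_db, "scopus", "The Fungal Kingdom", isbn)
    try:
        get_or_create_book(global_db, "hal", "The Fungal Kingdom", isbn)
    except RetryExhausted as exc:
        print("global index:", exc)

    scoped_db = make_db(per_source_unique=True)
    scopus_id = get_or_create_book(scoped_db, "scopus", "The Fungal Kingdom", isbn)
    hal_id = get_or_create_book(scoped_db, "hal", "The Fungal Kingdom", isbn)
    print("per-source index:", scopus_id, hal_id)
```

Scoping the index per source is only one way out; keeping the global index and making the lookup ignore source (reusing the existing Scopus record) would also avoid the violation, at the cost of sharing one book row between sources.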