Open-MSS / MSS

A Qt application, an OGC web map server, and a collaboration server for planning atmospheric research flights.
https://open-mss.github.io
Apache License 2.0

develop: migration of an existing psql database does not keep the current index #2465

Closed ReimarBauer closed 3 months ago

ReimarBauer commented 3 months ago

related to https://github.com/Open-MSS/MSS/pull/2432

I noticed that I can't add users, operations, and permissions to the migrated data.

After the migration, a dump of the SQL now contains:

--
-- Name: changes_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.changes_id_seq', 1, false);

--
-- Name: messages_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.messages_id_seq', 1, false);

--
-- Name: operations_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.operations_id_seq', 2, true);

--
-- Name: permissions_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.permissions_id_seq', 3, true);

--
-- Name: users_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.users_id_seq', 2, true);

Instead, I should have:


--
-- Name: changes_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.changes_id_seq', 9987, true);

--
-- Name: messages_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.messages_id_seq', 9887, true);

--
-- Name: operations_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.operations_id_seq', 278, true);

--
-- Name: permissions_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.permissions_id_seq', 5241, true);

--
-- Name: users_id_seq; Type: SEQUENCE SET; Schema: public; Owner: mscolab
--

SELECT pg_catalog.setval('public.users_id_seq', 93, true);
ReimarBauer commented 3 months ago

To document the older local migrations:

27a026b0daec_to_version_8_0_0.py


"""To version 8.0.0

Revision ID: 27a026b0daec
Revises: 
Create Date: 2023-04-25 17:32:03.529415

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision = '27a026b0daec'
down_revision = None
branch_labels = None
depends_on = None

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('operations', schema=None) as batch_op:
        batch_op.add_column(sa.Column('active', sa.Boolean(), nullable=True))
        batch_op.add_column(sa.Column('last_used', sa.DateTime(), nullable=True))

    # ### end Alembic commands ###

def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('operations', schema=None) as batch_op:
        batch_op.drop_column('last_used')
        batch_op.drop_column('active')

    # ### end Alembic commands ###

e62d08ce88a4_to_version_9_0_0.py

"""To version 9.0.0

Revision ID: e62d08ce88a4
Revises: 27a026b0daec
Create Date: 2024-06-07 16:53:43.314338

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision = 'e62d08ce88a4'
down_revision = '27a026b0daec'
branch_labels = None
depends_on = None

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('users', schema=None) as batch_op:
        batch_op.add_column(sa.Column('authentication_backend', sa.String(length=255), nullable=True))
    op.execute('UPDATE users SET authentication_backend = \'local\' WHERE authentication_backend IS NULL')

    with op.batch_alter_table('users', schema=None) as batch_op:
        batch_op.alter_column('authentication_backend', existing_type=sa.String(length=255), nullable=False)
        batch_op.drop_constraint('users_password_key', type_='unique')

    # ### end Alembic commands ###

def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('users', schema=None) as batch_op:
        batch_op.create_unique_constraint('users_password_key', ['password'])
        batch_op.drop_column('authentication_backend')

    # ### end Alembic commands ###
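
For reference, these two revisions chain as 27a026b0daec → e62d08ce88a4. Assuming a standard Alembic setup with an alembic.ini next to the migrations (a sketch, not project code), they could be applied programmatically like this:

from alembic import command
from alembic.config import Config

cfg = Config("alembic.ini")  # placeholder path
command.upgrade(cfg, "head")      # runs 27a026b0daec, then e62d08ce88a4
# command.downgrade(cfg, "base")  # would undo both in reverse order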
matrss commented 3 months ago

To me it looks like postgres didn't increment its internal counters for the id columns and, because of that, now tries to insert new rows with existing ids. The error was something like this:

File "/home/mss-mscolab/Miniforge/envs/mssenv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pk_users"
DETAIL: Key (id)=(2) already exists.

So it is getting stuck on the primary key's unique constraint.

I have no clue yet as to why this is happening; when migrating the data we simply take rows from the old database and insert them into the new one. This should increment the counters just like any other insertion, I'd think. I haven't been able to reproduce the issue with SQLite yet.
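
A minimal sketch of the suspected mechanism (not MSS code; a scratch PostgreSQL database behind the placeholder DSN is assumed):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://mscolab@localhost/scratch")  # placeholder DSN

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE demo (id SERIAL PRIMARY KEY, name TEXT)"))
    # Migration-style insert: ids are given explicitly, so the sequence
    # backing the SERIAL column is never consulted and still points at 1.
    conn.execute(text("INSERT INTO demo (id, name) VALUES (1, 'a'), (2, 'b')"))
    # Ordinary insert: the sequence hands out id 1, which is already taken,
    # so this raises psycopg2.errors.UniqueViolation as in the traceback above.
    conn.execute(text("INSERT INTO demo (name) VALUES ('c')"))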

ReimarBauer commented 3 months ago

I fixed the numbers manually and they increment now.
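
The manual fix amounts to something like the following sketch, one statement per table, assuming the default <table>_id_seq naming shown in the dump and non-empty tables:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://mscolab@localhost/mscolab")  # placeholder DSN

with engine.begin() as conn:
    for table in ("users", "operations", "permissions", "messages", "changes"):
        # Two-argument setval marks the value as already used, so the next
        # nextval() call returns MAX(id) + 1. This fails on an empty table.
        conn.execute(text(
            f"SELECT setval('public.{table}_id_seq', (SELECT MAX(id) FROM {table}))"
        ))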

matrss commented 3 months ago

While trying to run the migration tests against postgres locally I also get this:

E   sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateObject) type "messagetype" already exists
E
E   [SQL: CREATE TYPE messagetype AS ENUM ('TEXT', 'SYSTEM_MESSAGE', 'IMAGE', 'DOCUMENT')]
E   (Background on this error at: https://sqlalche.me/e/20/f405)

The migrations aren't yet working for postgres because alembic didn't generate code to drop the created enum types on downgrade. We really need to get #2311 done before we can somewhat confidently say that we support postgres at all...
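
A common pattern for this (a sketch, not yet in the migrations; the enum values are taken from the CREATE TYPE statement in the error above) is to drop the type explicitly in downgrade():

import sqlalchemy as sa
from alembic import op

def downgrade():
    # ... first drop the tables/columns that use the type ...
    sa.Enum('TEXT', 'SYSTEM_MESSAGE', 'IMAGE', 'DOCUMENT',
            name='messagetype').drop(op.get_bind(), checkfirst=True)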

matrss commented 3 months ago

OK, I was able to reproduce the issue locally with the migration tests running against a postgres database, with some tweaks to make it work at all.

This is an excerpt of what it does when migrating the data:

Copying table permissions
Copying row (1, 1, 8, 'creator')
Statement to execute: INSERT INTO permissions (id, op_id, u_id, access_level) VALUES (:id, :op_id, :u_id, :access_level)

So it inserts the rows taken from the old database into the new one, and explicitly sets the primary key to be used. That is correct, since we would want to keep the IDs the same, but it seems like this doesn't also update postgres' own internal counters for the autoincrementing ID columns, leading to it trying to use an already taken ID on the next insert. This is obviously bad.

If we don't explicitly include the id in the insert then we could run into a situation where the id after the data migration is not the same as before. Again, that would be bad.

There must be some way to tell postgres to update its counters so that it inserts after the largest existing id in the table, somehow... Ideally this would be a database-agnostic instruction, so that we wouldn't have to figure out the internals for each database type supported by SQLAlchemy.

The issue doesn't exist with SQLite, presumably because it doesn't keep track of the autoincrement counters separately anywhere, but simply inserts with id_of_last_row+1 or something.

matrss commented 3 months ago

I experimentally confirmed that SQLite will always insert with id_of_last_row+1 as the next autoincrement ID.

For MariaDB it is documented here: https://mariadb.com/kb/en/auto_increment/#setting-explicit-values that the autoincrement counter is updated automatically when a row with an explicit value for the column is inserted and that value is larger than or equal to the current next autoincrement ID, so it should also be unaffected by this issue.

If we consider SQLite, MariaDB, and PostgreSQL as the only three databases that we intend to support (which would be my opinion), then we only need to update these counters after a migration to postgres. What do you think?
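
A sketch of what that post-migration step could look like (hypothetical helper; the table list is taken from this thread, and pg_get_serial_sequence resolves the sequence backing each id column):

from sqlalchemy import text

def resync_postgres_sequences(engine, tables=("users", "operations", "permissions",
                                              "messages", "changes")):
    if engine.dialect.name != "postgresql":
        return  # SQLite and MariaDB keep their counters in sync by themselves
    with engine.begin() as conn:
        for table in tables:
            # Three-argument setval with is_called=false makes the next
            # nextval() return exactly MAX(id) + 1 (or 1 for an empty table).
            conn.execute(text(
                f"SELECT setval(pg_get_serial_sequence('{table}', 'id'), "
                f"COALESCE(MAX(id) + 1, 1), false) FROM {table}"
            ))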

ReimarBauer commented 3 months ago

If we consider SQLite, MariaDB, and PostgreSQL as the only three databases that we intend to support (which would be my opinion), then we only need to update these counters after a migration to postgres. What do you think?

Yes. At some point in the future we could additionally look at DuckDB, for example.

ReimarBauer commented 3 months ago

I experimentally confirmed that SQLite will always insert with id_of_last_row+1 as the next autoincrement ID.

We are not alone with that:

and it also gave me the idea that I need to do an additional round of the migration, because it can happen when the database is recreated on the staging system.
(If it happens at that point, that would also explain why I haven't seen it on the production system.)

We could think about adding a further check to mscolab that reports on the integrity of the database.
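
Such a check could look roughly like this sketch (hypothetical helper, PostgreSQL-only): it flags any sequence whose next value is not past the largest id already in its table.

from sqlalchemy import text

def find_lagging_sequences(engine, tables=("users", "operations", "permissions",
                                           "messages", "changes")):
    problems = []
    with engine.connect() as conn:
        for table in tables:
            # Selecting from a sequence relation exposes its current state.
            seq = conn.execute(text(
                f"SELECT last_value, is_called FROM public.{table}_id_seq"
            )).one()
            max_id = conn.execute(text(f"SELECT MAX(id) FROM {table}")).scalar() or 0
            next_id = seq.last_value + 1 if seq.is_called else seq.last_value
            if next_id <= max_id:
                problems.append((table, next_id, max_id))
    return problems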

matrss commented 3 months ago

yes. In some future we maybe can look additional at DuckDB

I don't think supporting DuckDB would be a good idea. While it is a relational database supporting SQL, something like MSS is just not the use case it was built for. We don't have expensive and long-running analytical queries; what we have are more traditional transactions on the database. It's basically OLAP vs. OLTP.

matrss commented 3 months ago

because it can happen when the database is recreated on the staging system. (If it happens at that point, that would also explain why I haven't seen it on the production system.)

Yes, that is exactly where the issue originates. When the migrations are applied to the new, fresh database, the sequences for the autoincrement columns are initialized on the assumption that the tables are empty, and then the data is inserted without updating these sequences.