getpatchwork / patchwork

Patchwork is a web-based patch tracking system designed to facilitate the contribution and management of contributions to an open-source project.
http://jk.ozlabs.org/projects/patchwork/
GNU General Public License v2.0
273 stars 82 forks source link

[v3.1.2] Cover letters created before migrating to v3 show up separatly in patch list #556

Open alialnu opened 1 year ago

alialnu commented 1 year ago

Example: This cover letter was part of this series before the migration, but it no longer is. Moreover, its state now says None.

There are many other covers with the same issue.

EDIT: The series above actually has a cover letter as part of it, but the cover letter is listed again in the patch list: This one is part of the series. This one isn't.

This isn't consistent with patchsets that were created after the migration.

The issue appeared when I migrated from v2.2.6 to v3.1.2.

alialnu commented 1 year ago

New replies to cover letters created prior to the migration from v2 to v3 will not appear in the cover letter that is part of the series and will not trigger a cover-comment-created event.

alialnu commented 1 year ago

@stephenfin, do you think this issue could be resolved in a new migration? Rolling back could be problematic because the recent migrations aren't reversible. Any suggestion?

stephenfin commented 1 year ago

In theory, yes, migrations can fix anything in the database. I need to find time to look into this though. Perhaps this weekend.

Are new cover letters to new series (created after the migration) getting linked correctly?

stephenfin commented 1 year ago

First pass suggests I've missed something in the migration. That migration got rid of the Submission model, which Patch and Cover inherited from (read: JOINed to) in favour of wholly separate models. My guess is that we've ended up with a row in both tables.

alialnu commented 1 year ago

Are new cover letters to new series (created after the migration) getting linked correctly?

Yes new ones are linked correctly. New cover letters (created after the migration) only have a /project/<Project>/cover/<Message-Id> web url. Old cover letters (created before the migration) have /project/<Project>/cover/<Message-Id> and /project/<Project>/patch/<Message-Id> web urls.

When someone replies to old cover letters, the reply appears in the /project/<Project>/patch/<Message-Id> url, which isn't part of a series.

stephenfin commented 1 year ago

First pass suggests I've missed something in the migration. That migration got rid of the Submission model, which Patch and Cover inherited from (read: JOINed to) in favour of wholly separate models. My guess is that we've ended up with a row in both tables.

Yeah, looks like it's this: I forgot to clean up the old cover letter-related entries in the patchwork_submission table before it got renamed to patchwork_patch (migration 0043).

@alialnu Can you check something for me: do the following commands returns the same count?

SELECT COUNT(patch.id) FROM patchwork_patch patch WHERE ISNULL(patch.state_id);
SELECT COUNT(cover.id) FROM patchwork_cover cover;

And maybe for safety, check the exact IDs also.

SELECT patch.id FROM patchwork_patch patch WHERE ISNULL(patch.state_id);
SELECT cover.id FROM patchwork_cover cover;

Cover letters don't have a state but patches always do, so in theory any entry in patchwork_patch with a null state_id column should be left-over cover letters. I'd like to sanity check this against a real deployment though.

If this works, our apparent fix would be to identify any rows in patchwork_patch_comment that have foreign keys pointing to one of these rows in patchwork_patch, copy them to patchwork_covercomment, then delete the offending rows in patchwork_patch ~(which should cause the patchwork_patchcomment rows to be deleted since we have a CASCADE in place)~ after we've deleted the rows in patchwork_patchcomment since Django doesn't do database-level cascading. Sound reasonable?

EDIT: Effectively this query:

SELECT patch_comment.id
FROM patchwork_patchcomment patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE ISNULL(patch.state_id)
);
stephenfin commented 1 year ago

I think the following script should do the trick.

# copy any recent patch comments that should have been assigned to a cover letter instead
INSERT INTO patchwork_covercomment (msgid, date, headers, content, cover_id, submitter_id)
SELECT msgid, date, headers, content, patch_id, submitter_id
FROM patchwork_patchcomment
WHERE patchwork_patchcomment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE ISNULL(patch.state_id)
);

# delete the duplicate patch comments
DELETE patch_comment
FROM patchwork_patchcomment patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE ISNULL(patch.state_id)
);

# delete the duplicate patches
DELETE patch FROM patchwork_patch patch WHERE ISNULL(patch.state_id);

Assuming so, I can codify this in a migration script and push a new migration. The only thing is that Django doesn't appear to provide a mechanism to insert a migration between two older migrations (unlike Alembic) so you're either going to need to update to current HEAD, wait for a new release, or run it manually (it should be idempotent so there'll be no issues if you did it manually and later upgraded to a release with this migration in it)

stephenfin commented 1 year ago

I've opened a bug against Django since this isn't the behavior I expected. I expected the deletion of the CoverLetter model to drop all rows in the parent Submission table. This clearly didn't happen.

https://code.djangoproject.com/ticket/34796

alialnu commented 1 year ago

Thanks @stephenfin for helping with this,

@alialnu Can you check something for me: do the following commands returns the same count?

I forgot to mention that I'm using postgresql. I made some changes to the queries:

SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL;
   id
--------
  61292
  77881
  [..]
  129709
  129712
(8138 rows)
SELECT cover.id FROM patchwork_cover cover;
   id
--------
 1
 2
 [..]
 129709
 129712
(8222 rows)

They aren't returning the same count.

EDIT: Effectively this query:

SELECT patch_comment.id
FROM patchwork_patchcomment patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
   id
--------
 162805
 162984
 [..]
 163347
 163359
(26 rows)

I think the following script should do the trick.

Removing duplicate patch comments is failing for me:

# copy any recent patch comments that should have been assigned to a cover letter instead
INSERT INTO patchwork_covercomment (msgid, date, headers, content, cover_id, submitter_id)
SELECT msgid, date, headers, content, patch_id, submitter_id
FROM patchwork_patchcomment
WHERE patchwork_patchcomment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

Output:

INSERT 0 26
# delete the duplicate patch comments
DELETE FROM patchwork_patchcomment AS patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

Output:

ERROR:  update or delete on table "patchwork_patchcomment" violates foreign key constraint "patchwork_event_patch_comment_id_4123011e_fk_patchwork" on table "patchwork_event"
DETAIL:  Key (id)=(162805) is still referenced from table "patchwork_event".
stephenfin commented 1 year ago

Whoops, I forgot about the events. How about if you add the following after the first step?

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET cover_id = patch_id, category = 'cover-comment-created', patch_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
stephenfin commented 1 year ago

Thanks @stephenfin for helping with this,

@alialnu Can you check something for me: do the following commands returns the same count?

I forgot to mention that I'm using postgresql. I made some changes to the queries:

SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL;
   id
--------
  61292
  77881
  [..]
  129709
  129712
(8138 rows)
SELECT cover.id FROM patchwork_cover cover;
   id
--------
 1
 2
 [..]
 129709
 129712
(8222 rows)

They aren't returning the same count.

Ah, right, they will be different since you have new cover letters. You probably need to add a limit to this. We don't track our own created_at/updated_at fields so we need to use date as a stand-in. This should do the trick (or at least the PostgreSQL equivalent will):

SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL;

SELECT cover.id
FROM patchwork_cover cover
WHERE cover.date <= (
    SELECT MAX(patch.date) FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

I'm pretty confident that filtering on null state_id will do the trick now though, with what you've already provided.

alialnu commented 1 year ago

Ah, right, they will be different since you have new cover letters. You probably need to add a limit to this. We don't track our own created_at/updated_at fields so we need to use date as a stand-in. This should do the trick (or at least the PostgreSQL equivalent will):

Both queries now return the same number of rows:

SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL ORDER BY patch.id;
   id
--------
  40691
  40702
  [..]
  129709
  129712
(8138 rows)
SELECT cover.id
FROM patchwork_cover cover
WHERE cover.date <= (
    SELECT MAX(patch.date) FROM patchwork_patch patch WHERE patch.state_id IS NULL
) ORDER BY cover.id;

   id
--------
  40691
  40702
  [..]
  129709
  129712
(8138 rows)
alialnu commented 1 year ago

Whoops, I forgot about the events. How about if you add the following after the first step?

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET cover_id = patch_id, category = 'cover-comment-created', patch_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

Deleting duplicate patch comments still fails:

INSERT INTO patchwork_covercomment (msgid, date, headers, content, cover_id, submitter_id)
SELECT msgid, date, headers, content, patch_id, submitter_id
FROM patchwork_patchcomment
WHERE patchwork_patchcomment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
INSERT 0 26
UPDATE patchwork_event
SET cover_id = patch_id, category = 'cover-comment-created', patch_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
UPDATE 26
DELETE FROM patchwork_patchcomment AS patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
ERROR:  update or delete on table "patchwork_patchcomment" violates foreign key constraint "patchwork_event_patch_comment_id_4123011e_fk_patchwork" on table "patchwork_event"
DETAIL:  Key (id)=(162805) is still referenced from table "patchwork_event".
stephenfin commented 1 year ago

Deleting duplicate patch comments still fails:

[snip]

ERROR:  update or delete on table "patchwork_patchcomment" violates foreign key constraint "patchwork_event_patch_comment_id_4123011e_fk_patchwork" on table "patchwork_event"
DETAIL:  Key (id)=(162805) is still referenced from table "patchwork_event".

That field should have been unset in the previous statement. Perhaps it needs to be split into two statements. Can you check if it was unset?

SELECT cover_id, patch_id FROM patchwork_event WHERE id=162805;
alialnu commented 1 year ago

That field should have been unset in the previous statement. Perhaps it needs to be split into two statements. Can you check if it was unset?

SELECT cover_id, patch_id FROM patchwork_event WHERE id=162805;

That id belongs to a check-created event, so I guess it shouldn't have been affected by the previous query:

SELECT cover_id, patch_id, created_check_id, category FROM patchwork_event WHERE id=162805;
 cover_id | patch_id | created_check_id |   category
----------+----------+------------------+---------------
          |    63842 |           102713 | check-created
(1 row)
alialnu commented 1 year ago

By the way, I noticed that new cover IDs started counting from 1 after the migration: https://patches.dpdk.org/api/covers/1/ The last cover ID was 40691 before the migration: https://patches.dpdk.org/api/covers/40691/ I guess it's an expected side effect of removing the old CoverLetter model?

stephenfin commented 1 year ago

Apologies: 162805 refers to a row in the patchwork_patchcomment table (I see https://patches.dpdk.org/comment/162805/ resolves to https://patches.dpdk.org/project/dpdk/patch/cover.1640838702.git.songyl@ramaxel.com/#162805). We need to migrate the patch_comment_id field to cover_comment_id field, in addition to migrating patch_id to cover_id.

Let's replace:

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET cover_id = patch_id, category = 'cover-comment-created', patch_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

with:

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET
    cover_id = patch_id,
    cover_comment_id = patch_comment_id,
    category = 'cover-comment-created',
    patch_id = NULL,
    patch_comment_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
stephenfin commented 1 year ago

By the way, I noticed that new cover IDs started counting from 1 after the migration: patches.dpdk.org/api/covers/1 The last cover ID was 40691 before the migration: patches.dpdk.org/api/covers/40691 I guess it's an expected side effect of removing the old CoverLetter model?

Yes, that's expected behavior. I'm going to guess PostgreSQL was restarted at some point? I know MySQL restarts the autoincrement counter at the lowest available value when it's restarted. It shouldn't matter since the ID has no significance and we've intentionally started hiding it in recent releases (at least in the web UI: the REST API and deprecated XML-RPC API still need work)

alialnu commented 1 year ago

Apologies: 162805 refers to a row in the patchwork_patchcomment table (I see https://patches.dpdk.org/comment/162805/ resolves to https://patches.dpdk.org/project/dpdk/patch/cover.1640838702.git.songyl@ramaxel.com/#162805). We need to migrate the patch_comment_id field to cover_comment_id field, in addition to migrating patch_id to cover_id.

Let's replace:

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET cover_id = patch_id, category = 'cover-comment-created', patch_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

with:

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET
    cover_id = patch_id,
    cover_comment_id = patch_comment_id,
    category = 'cover-comment-created',
    patch_id = NULL,
    patch_comment_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

I get a different error right now:

UPDATE patchwork_event
SET
        cover_id = patch_id,
        cover_comment_id = patch_comment_id,
        category = 'cover-comment-created',
        patch_id = NULL,
        patch_comment_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
ERROR:  insert or update on table "patchwork_event" violates foreign key constraint "patchwork_event_cover_comment_id_861b3ec2_fk_patchwork"
DETAIL:  Key (cover_comment_id)=(162801) is not present in table "patchwork_covercomment".
alialnu commented 1 year ago

By the way, I noticed that new cover IDs started counting from 1 after the migration: patches.dpdk.org/api/covers/1 The last cover ID was 40691 before the migration: patches.dpdk.org/api/covers/40691 I guess it's an expected side effect of removing the old CoverLetter model?

Yes, that's expected behavior. I'm going to guess PostgreSQL was restarted at some point? I know MySQL restarts the autoincrement counter at the lowest available value when it's restarted. It shouldn't matter since the ID has no significance and we've intentionally started hiding it in recent releases (at least in the web UI: the REST API and deprecated XML-RPC API still need work)

Thanks for the confirmation. I don't recall restarting the database server. The counter was reset right after the migration.

stephenfin commented 1 year ago

Lovely. So when we did the first step in the migration

# copy any recent patch comments that should have been assigned to a cover letter instead
INSERT INTO patchwork_covercomment (msgid, date, headers, content, cover_id, submitter_id)
SELECT msgid, date, headers, content, patch_id, submitter_id
FROM patchwork_patchcomment
WHERE patchwork_patchcomment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE ISNULL(patch.state_id)
);

we will have got a new, likely different ID for each new entry in patchwork_covercomment. As such, we need to set cover_comment_id to the ID of the row in patchwork_covercomment which corresponds to the old patchwork_patchcomment row. I'm not actually entirely sure how to do that. The best I could come up with is this horribly inefficient query.

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET
    cover_id = patch_id,
    cover_comment_id = (
        SELECT patchwork_covercomment.id
        FROM patchwork_covercomment
        INNER JOIN patchwork_patchcomment
        ON patchwork_covercomment.msgid = patchwork_patchcomment.msgid
        WHERE patchwork_patchcomment.id = patchwork_event.patch_comment_id
    ),
    category = 'cover-comment-created',
    patch_id = NULL,
    patch_comment_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

I've no idea if that will work or not as I still haven't got a full reproducer locally for these follow-up issues.

stephenfin commented 1 year ago

Also, just for my own sanity, I think we're now at the following migration script.

# copy any recent patch comments that should have been assigned to a cover letter instead
INSERT INTO patchwork_covercomment (msgid, date, headers, content, cover_id, submitter_id)
SELECT msgid, date, headers, content, patch_id, submitter_id
FROM patchwork_patchcomment
WHERE patchwork_patchcomment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

# migrate any erroneous patch-comment-created events to cover-comment-created events
UPDATE patchwork_event
SET
    cover_id = patch_id,
    cover_comment_id = (
        SELECT patchwork_covercomment.id
        FROM patchwork_covercomment
        INNER JOIN patchwork_patchcomment
        ON patchwork_covercomment.msgid = patchwork_patchcomment.msgid
        WHERE patchwork_patchcomment.id = patchwork_event.patch_comment_id
    ),
    category = 'cover-comment-created',
    patch_id = NULL,
    patch_comment_id = NULL
WHERE patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

# delete the duplicate patch comments
DELETE patch_comment
FROM patchwork_patchcomment patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

# delete the duplicate patches
DELETE patch FROM patchwork_patch patch WHERE patch.state_id IS NULL;
alialnu commented 1 year ago

we will have got a new, likely different ID for each new entry in patchwork_covercomment. As such, we need to set cover_comment_id to the ID of the row in patchwork_covercomment which corresponds to the old patchwork_patchcomment row. I'm not actually entirely sure how to do that. The best I could come up with is this horribly inefficient query.

The UPDATE statement fails with:

ERROR:  more than one row returned by a subquery used as an expression

I'm having a difficult time constructing a query to find the problematic IDs though.

stephenfin commented 1 year ago

How about if we drop the subquery and do joins?

UPDATE patchwork_event AS event
    INNER JOIN patchwork_patchcomment AS patch_comment
    ON patch_comment.id = event.patch_comment_id
    INNER JOIN patchwork_covercomment AS cover_comment
    ON cover_comment.msgid = patch_comment.msgid
SET
    event.cover_id = event.patch_id,
    event.cover_comment_id = cover_comment.id,
    event.category = 'cover-comment-created',
    event.patch_id = NULL,
    event.patch_comment_id = NULL
WHERE event.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

Again, disclaimer that this is not beyond validating that it's valid (MySQL) syntax.

alialnu commented 1 year ago

Quick update while I try to rewrite the query:

How about if we drop the subquery and do joins?

UPDATE patchwork_event AS event
    INNER JOIN patchwork_patchcomment AS patch_comment
    ON patch_comment.id = event.patch_comment_id
    INNER JOIN patchwork_covercomment AS cover_comment
    ON cover_comment.msgid = patch_comment.msgid
SET
    event.cover_id = event.patch_id,
    event.cover_comment_id = cover_comment.id,
    event.category = 'cover-comment-created',
    event.patch_id = NULL,
    event.patch_comment_id = NULL
WHERE event.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);

Doesn't seem like it's valid syntax: https://www.postgresql.org/docs/current/sql-update.html

ERROR:  syntax error at or near "INNER"
LINE 2:     INNER JOIN patchwork_patchcomment AS patch_comment
stephenfin commented 1 year ago

Guess that is MySQL only then. I was thinking as much but didn't want to go read the ANSI SQL docs (wherever they live) :sweat_smile:

EDIT: StackOverflow suggests this might be the equivalent syntax?

UPDATE patchwork_event event
SET
    event.cover_id = event.patch_id,
    event.cover_comment_id = cover_comment.id,
    event.category = 'cover-comment-created',
    event.patch_id = NULL,
FROM patchwork_patchcomment patch_comment
    INNER JOIN patchwork_covercomment cover_comment
    ON patch_comment.msgid = cover_comment.msgid
WHERE
    patch_comment.id = event.patch_comment_id
    AND event.patch_id IN (
        SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
    );

?

alialnu commented 11 months ago

Sorry for the late response @stephenfin, I missed your last suggestion.

The suggested query failed with:

ERROR:  column "event" of relation "patchwork_event" does not exist
LINE 3:     event.cover_id = event.patch_id,

I got it to work with these changes:

UPDATE patchwork_event AS event
SET
    cover_id = event.patch_id,
    cover_comment_id = cover_comment.id,
    category = 'cover-comment-created',
    patch_id = NULL
FROM patchwork_patchcomment patch_comment
INNER JOIN patchwork_covercomment cover_comment
    ON patch_comment.msgid = cover_comment.msgid
WHERE
    patch_comment.id = event.patch_comment_id
    AND event.patch_id IN (
        SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
UPDATE 26

Removing duplicate patch comments still fails though:

DELETE FROM patchwork_patchcomment AS patch_comment
WHERE patch_comment.patch_id IN (
    SELECT patch.id FROM patchwork_patch patch WHERE patch.state_id IS NULL
);
ERROR:  update or delete on table "patchwork_patchcomment" violates foreign key constraint "patchwork_event_patch_comment_id_4123011e_fk_patchwork" on table "patchwork_event"
DETAIL:  Key (id)=(162805) is still referenced from table "patchwork_event".