Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0
12.95k stars 1.49k forks source link

SequelizeUniqueConstraintError: (tag) name must be unique (Failed to import channel) #5488

Closed drzraf closed 1 year ago

drzraf commented 1 year ago

Describe the current behavior

I'm experiencing a problem similar to #5072 (although the error is slighly disting) using v5.0.0 when retring the synchronization of a Youtube channel (multiple videos failed during previous attempts)

During import I get the below SequelizeUniqueConstraintError

In a first attempt to workaround the issue I did

> DELETE FROM tag where id NOT IN (SELECT tagId FROM videoTag);
DELETE 527

But next import attempt triggered an identical on another video/tag tuple (tag = asia) what stopped the sync again: This time it shows symptoms of a collation issue (at first glance at least):

Using docker run -it --rm --network peertube_default postgres psql -U peertube -h postgres peertube_prod

The existing tag is associated with a successfully imported video, but resuming the import leads to the SequelizeUniqueConstraintError

(I may be missing something (I'm not used with pgsql) but even after UPDATEing the value to plain ascii, I still can't SELECT it using a simple WHERE name = 'asia' clause)

Steps to reproduce

  1. Retry a (previously failed) youtube channel synchronization
  2. Look at the log
  3. Import is stopped with
    2022-12-28 13:42:54.819 error: Failed to import channel it {
    "err": {
    "stack": "SequelizeUniqueConstraintError: Validation error\n    at Query.formatError (/app/node_modules/sequelize/lib/dialects/postgres/query.js:272:18)\n    at Query.run (/app/node_modules/sequelize/lib/dialects/postgres/query.js:84:20)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /app/node_modules/sequelize/lib/sequelize.js:314:16\n    at async PostgresQueryInterface.insert (/app/node_modules/sequelize/lib/dialects/abstract/query-interface.js:308:21)\n    at async TagModel.save (/app/node_modules/sequelize/lib/model.js:2432:35)\n    at async Function.create (/app/node_modules/sequelize/lib/model.js:1344:12)\n    at async Function.findOrCreate (/app/node_modules/sequelize/lib/model.js:1404:25)\n    at async Promise.all (index 2)",
    "message": "Validation error",
    "name": "SequelizeUniqueConstraintError",
    "errors": [
      {
        "message": "name must be unique",
        "type": "unique violation",
        "path": "name",
        "value": "sacrifici animali",
        "origin": "DB",
        "instance": {
          "id": null,
          "name": "sacrifici animali"
        },
        "validatorKey": "not_unique",
        "validatorName": null,
        "validatorArgs": []
      }
    ],
    "parent": {
      "sql": "CREATE OR REPLACE FUNCTION pg_temp.testfunc(OUT response \"tag\", OUT sequelize_caught_exception text) RETURNS RECORD AS $func_c0aa0b2c86024a8b884fa4cb70ca8d87$ BEGIN INSERT INTO \"tag\" (\"id\",\"name\") VALUES (DEFAULT,'sacrifici animali') RETURNING * INTO response; EXCEPTION WHEN unique_violation THEN GET STACKED DIAGNOSTICS sequelize_caught_exception = PG_EXCEPTION_DETAIL; END $func_c0aa0b2c86024a8b884fa4cb70ca8d87$ LANGUAGE plpgsql; SELECT (testfunc.response).\"id\", (testfunc.response).\"name\", testfunc.sequelize_caught_exception FROM pg_temp.testfunc(); DROP FUNCTION IF EXISTS pg_temp.testfunc();",
      "code": "23505",
      "detail": "Key (name)=(sacrifici animali) already exists."
    },
    "fields": {
      "name": "sacrifici animali"
    },
    "sql": "CREATE OR REPLACE FUNCTION pg_temp.testfunc(OUT response \"tag\", OUT sequelize_caught_exception text) RETURNS RECORD AS $func_c0aa0b2c86024a8b884fa4cb70ca8d87$ BEGIN INSERT INTO \"tag\" (\"id\",\"name\") VALUES (DEFAULT,'sacrifici animali') RETURNING * INTO response; EXCEPTION WHEN unique_violation THEN GET STACKED DIAGNOSTICS sequelize_caught_exception = PG_EXCEPTION_DETAIL; END $func_c0aa0b2c86024a8b884fa4cb70ca8d87$ LANGUAGE plpgsql; SELECT (testfunc.response).\"id\", (testfunc.response).\"name\", testfunc.sequelize_caught_exception FROM pg_temp.testfunc(); DROP FUNCTION IF EXISTS pg_temp.testfunc();"
    }
    }

(Please note that logs gives no clue about which video is at cause)

Describe the expected behavior

Additional information

SHOW SERVER_ENCODING; UTF8 \encodingUTF8 \dc empty \dO empty

 \l peertube_prod
                                                 List of databases
     Name      |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider | Access privileges 
---------------+----------+----------+-------------+-------------+------------+-----------------+-------------------
 peertube_prod | peertube | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
\d tag
                                    Table "public.tag"
 Column |          Type          | Collation | Nullable |             Default             
--------+------------------------+-----------+----------+---------------------------------
 id     | integer                |           | not null | nextval('tag_id_seq'::regclass)
 name   | character varying(255) |           | not null | 
Chocobozzz commented 1 year ago

Hello,

I think the index (tag_name, and maybe tag_lower_name) is broken. Can you try to:

You can check the index collation using:

WITH defcoll AS (
    SELECT datcollate AS coll
    FROM pg_database
    WHERE datname = current_database()
 )
 SELECT icol.pos,
        CASE WHEN c.collname = 'default'
             THEN defcoll.coll
             ELSE c.collname
        END AS collation
 FROM pg_index AS i
    CROSS JOIN unnest(i.indcollation) WITH ORDINALITY AS icol(coll, pos)
    CROSS JOIN defcoll
    LEFT JOIN pg_collation AS c ON c.oid = icol.coll
 WHERE i.indexrelid = 'tag_name'::regclass
 ORDER BY icol.pos;
drzraf commented 1 year ago

Regarding index collation:

 pos |  collation  
-----+-------------
   1 | en_US.UTF-8

Regarding REINDEX:

REINDEX  (VERBOSE) INDEX tag_lower_name;
INFO:  index "tag_lower_name" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.04 s
REINDEX
REINDEX  (VERBOSE) INDEX tag_name;
ERROR:  could not create unique index "tag_name"
DETAIL:  Key (name)=(sacrifici animali) is duplicated.
SELECT * from tag where name like '%sacrifici animal%';
  id  |       name        
------+-------------------
 1334 | sacrifici animali
 2036 | sacrifici animali
(2 rows)

SELECT * from tag where name = 'sacrifici animali';
  id  |       name        
------+-------------------
 1334 | sacrifici animali
(1 row)

I've tons of dup's... wtf?

SELECT name, count(1) as c from tag group by name HAVING count(1) > 1 ORDER BY c DESC;
           name           | c 
--------------------------+---
 animali                  | 8
 Animali                  | 6
 galline                  | 5
 Vegan                    | 5
 carne                    | 5
 cina                     | 5
 pollo                    | 4
 crudeltà                 | 4
 polli                    | 4
 animal                   | 3
 ...
Chocobozzz commented 1 year ago

I think your index was broken. I suggest to remove duplicates and recreate the index

drzraf commented 1 year ago
Chocobozzz commented 1 year ago

Yes, you can try

DELETE FROM "tag" v1 USING (SELECT MIN(id) as id, "name" FROM "tag" GROUP BY "name" HAVING COUNT(*) > 1) v2 WHERE v1."name" = v2."name" AND v1.id <> v2.id
drzraf commented 1 year ago

Thank you.

I ended up using a combo of

SELECT CONCAT(id, ',', name) FROM tag WHERE name IN (SELECT name  FROM tag GROUP by name HAVING count(name) > 1) ORDER BY name, id  ASC

piped to

pname= pid=; while IFS=, read id name; do [[ $pname = $name ]] && echo "UPDATE \"videoTag\" SET \"tagId\" = $pid WHERE \"tagId\" = $id;" && continue; pname=$name; pid=$id; done

to generate UPDATE SQL statements and do the videoTag deduplication. I then rebuilt the INDEX and the next import attempts succeeded.