This pull request uses SQLite3 ON CONFLICT syntax to ignore rows that already exist in the words table. The cause of the duplicates is due to an upstream issue with the Signbank export process, probably due to a OUTER JOIN somewhere yielding duplicate rows (examples?)
This change defines a PRIMARY KEY on the words table - this is a schema change, but should not upset any downstream users of the database, since (for Ackama-managed apps anyway), we already treat the ID as a primary key. This is required for ON CONFLICT to work.
ON CONFLICT requires SQLite version 3.24.0 (~2018). There is an older version of ON CONFLICT from 2004 that can raise errors since the syntax following is different. Just noting that in case we run into any versions of SQLite old enough to buy themselves a drink 🍻
With the de-duplication, we process 4954 signs
Without the de-duplication, we process 4964 signs
This pull request uses SQLite3 ON CONFLICT syntax to ignore rows that already exist in the
words
table. The cause of the duplicates is due to an upstream issue with the Signbank export process, probably due to a OUTER JOIN somewhere yielding duplicate rows (examples?)This change defines a PRIMARY KEY on the words table - this is a schema change, but should not upset any downstream users of the database, since (for Ackama-managed apps anyway), we already treat the ID as a primary key. This is required for ON CONFLICT to work.
ON CONFLICT requires SQLite version 3.24.0 (~2018). There is an older version of ON CONFLICT from 2004 that can raise errors since the syntax following is different. Just noting that in case we run into any versions of SQLite old enough to buy themselves a drink 🍻
With the de-duplication, we process 4954 signs Without the de-duplication, we process 4964 signs