ODNZSL / nzsl-dictionary-scripts

Scripts needed to support the NZSL mobile applications
MIT License
3 stars 2 forks source link

Define a primary key on the words table, and use ON CONFLICT to ignore duplicate rows #15

Closed joshmcarthur closed 1 year ago

joshmcarthur commented 1 year ago

This pull request uses SQLite3 ON CONFLICT syntax to ignore rows that already exist in the words table. The cause of the duplicates is due to an upstream issue with the Signbank export process, probably due to a OUTER JOIN somewhere yielding duplicate rows (examples?)

This change defines a PRIMARY KEY on the words table - this is a schema change, but should not upset any downstream users of the database, since (for Ackama-managed apps anyway), we already treat the ID as a primary key. This is required for ON CONFLICT to work.

ON CONFLICT requires SQLite version 3.24.0 (~2018). There is an older version of ON CONFLICT from 2004 that can raise errors since the syntax following is different. Just noting that in case we run into any versions of SQLite old enough to buy themselves a drink 🍻

With the de-duplication, we process 4954 signs Without the de-duplication, we process 4964 signs