LibriVox / librivox-catalog

LibriVox catalog and reader workflow application
https://librivox.org
MIT License
36 stars 17 forks source link

Table "keywords" lacks UNIQUE constraint on field "value" #235

Open peterjdann opened 6 days ago

peterjdann commented 6 days ago

It should not be possible, even accidentally or by the result of a programming error, to add an entry to table "keywords" that duplicates a value already present.

You have check the number of duplicate values in that table by running the following SQL:

SELECT value, COUNT(value) FROM keywords GROUP BY value HAVING COUNT(value) > 1

In the scrubbed development database, this currently shows three duplicate values.

If/when administrators become more conscious of what keywords have been applied to audiobooks as a result of mooted changes that are currently the subject of two pull requests, it is possible they may seek to edit the contents of some keywords and, in the process (depending on what safeguards the Librivox software implements short of a DB field constraint) they may be able, in the process, to create further duplicates in this table, thus rendering any search-by-keywords process inaccurate.

redrun45 commented 6 days ago

Hmm... our database contains a number of things that were put in, before the code was updated to disallow them in future. If the current code fails to detect existing keywords, so that it creates duplicates, I would expect to see more than 3.

Are you able to recreate whatever bug this was, and trick the code into making duplicates now?

redrun45 commented 6 days ago

*reproduce, I mean, not recreate. Are you able to reproduce whatever conditions caused duplicate keywords, so that the current code will make more?

peterjdann commented 5 days ago

The current code for managing potential duplicate keywords when editing a project works very well indeed at preventing duplicates. Nonetheless, given the number of obvious typos in our keywords values, it might at some stage in the future be tempting to "fix" this problem by directly (not programmatically) editing the database entries. I was surprised to find that "mere" software was being used to protect the integrity of this database column, without any second line of defence, as it were, in the table definition itself. As well as acting as a second layer of defence, setting a UNIQUE constraint on the keywords->value column could serve a useful documentary function.

I can see this is not an urgent matter, however, if it warrants attention at all.

redrun45 commented 5 days ago

That being the case, it's at least worth having as a warning note, for next time we touch the relevant code. Perhaps at some point, a manual schema change like this will be convenient to do along with some other update process. Now that you've pointed it out, it's at least on the radar. 😉

Remembering that this code and schema was created by a paid professional, you may well find the number of irregularities surprising. That all of those irregularities have not been fixed by the very few volunteers who have stopped by over the years, and usually without having this particular professional domain knowledge, should not be surprising. A lot of work goes into managing even the smallest change. Myself, I would struggle to motivate myself for much of what I do, if I didn't get to see people benefit from it directly!