fake-name / xA-Scraper

69 stars 8 forks source link

Enforce lowercase on adding new artist #73

Closed God-damnit-all closed 4 years ago

God-damnit-all commented 4 years ago

As I learn more about pgsql, the more I'm annoyed with its sometimes-case-sensitive-sometimes-not design. Further, since SQLite seems to have some slightly different rules on this too. It's such a pain in the ass. So why not just make all the artist names lowercase?

One could set the column in SQLite to be defined as case insensitive, and in pgsql, add a trigger that makes all the data in the column lowercase, but I think the more uniform approach would be to simply enforce this on the watched-named-editor.html file, and rely on the dbFixer to convert already existing entries. In fact, I see that it already has a lowerCaseNames function defined.

To that end, it would be good if the fixer could get automatically triggered by certain commits. I have some ideas, but I bet you have better ones.

fake-name commented 4 years ago

The correct way to solve this would be to make the relevant column citext/NOCASE.

Really, I think I initially preserved case sensitivity because I wasn't sure if there were hosting sites that were case sensitive, but in retrospect that's a sufficiently stupid thing that it's not a case I think I should bother supporting.

God-damnit-all commented 4 years ago

I saw CITEXT before but all the stuff written on that page of the Limitations made me think it was a better idea just to do a one-time conversion, with the HTML change preventing such a thing from ever being required again.

God-damnit-all commented 4 years ago

Plus, given SQLite seems to have a real case-insensitive column option (if Google results are to be believed), even though I haven't used SQLite in a while, the lack of uniformity bothers me.

fake-name commented 4 years ago

Literally all the issues with citext in postgresql are also going to be present with sqlite.

Effectively, they all boil down to "you have to make a lowercase copy to do the search", where what lowercase means is fiddly depending on a bunch of locale-aware bits and bobs.

Really, even a one-time lowercase operation has all those problems too. And doing it in the browser means you have the fun potential for mismatch between the browser locale and the server locale, which can produce even moar weird issues.

They fundamentally come down "what 'lower case' means depends on your environment" Switching to a different database or tool doesn't fix that, it's a function of how unicode works.

At least postgres tells you what it does. With sqlite, ¯\(ツ)\\/¯. It'll probably use your shell's locale, but who knows?

TL;DR Unicode is fun.

God-damnit-all commented 4 years ago

Well, this just requires pressing the merge button. The correct way to do this has to actually be done first, and I can't imagine it's a very high priority especially now with Twitter starting to make moves to run artists off their platform.