TryGhost / Ghost

Independent technology for modern publishing, memberships, subscriptions and newsletters.
https://ghost.org
MIT License
47.65k stars 10.39k forks source link

Bug: Alphabetical ordering is not consistent between DBs #6104

Closed ErisDS closed 3 years ago

ErisDS commented 9 years ago

Take the following strings: 'photo', 'Audio', 'Video'. Note that the first one is not capitalised, but the other two are.

When ordering by names in SQL this can be done in either a case sensitive or case insensitive way:

Case sensitive order is:

This is used by MySQL and Postgres by default

Case insensitive order is:

This is used by Sqlite3 by default

I believe that the latter is the desired output in most cases, and that case sensitive ordering is not the expected outcome. I don't have much to base this on, other than common sense - the case of a letter doesn't really change what letter that is, so if I'm searching for something in an alphabetical list, I look for A and a in the same place.

This is slightly problematic for two reasons:

  1. sqlite3 (e.g. the development environment) does the right thing - or at least does what I was expecting, whereas MySQL & pg do not. It is likely other developers using the API will fall into the same trap I did - thinking it does what you expect in one env and not realising the result will be different in other databases. They really should be consistent 😊
  2. To force MySQL/PG to do an insensitive order by, we need to pass order by LOWER(name) ASC. We could do this by default for all orders, or figure out which orders are strings and need to be made case insensitive. However, if we do this by default, there's no obvious/well understood way to turn it back off.

    Solutions

I don't have a clear idea what the best way to solve this is, but I have a couple of ideas:

Idea 1: make sqlite3 match mysql & postgres

By default force 'COLLATE NOCASE' for SQLite so that all 3 dbs behave the same and use case sensitive ordering. All resources have a slug as well as a name, which could be used for ordering by something case insensitive. Not ideal, but simple-ish.

Long term, we can support adding lower and upper functions to our order syntax.

Idea 2: sensible defaults are best

By default, force case insensitive matches because this is what makes sense to anyone who isn't a computer scientist :grin:. If almost noone wants case sensitive alphabetical order, then it makes more sense to make it harder to do that, rather than making the sensible thing hard.

So, if all of our queries go off with LOWER automatically wrapped around attributes which are known to be test/strings, we need to create a way for a user to say 'don't do case insensitive matches'.

A couple of different ideas for that are:

Assuming we add support for all the mysql functions that could be used in an order statement, and that specifying one would override our default application of LOWER() then we can provide for using a mysql function which doesn't change the case.

Unfortunately there is no MySQL function which means 'original case', but using a no-op function like TRIM() would work because it would get used instead of LOWER() and would not change the string. That's a bit unintuitive though, so we could also just make up our own MySQL function NOCASE() or whatever we want to call it, and use that to mean "don't apply lower but don't apply this either cos it's nonsense :)".

Something like order="^name ASC" or order="!name ASC" could be used to indicate that case sensitivity should be used.

order="name ASC" - does a case insensitive order using the LOWER() function order="NAME ASC" - does a case insensitive order using the UPPER() function order="Name ASC" - does a case sensitive order without applying a function

I'd love to hear more ideas

Conclusion

I am very much leaning towards 'sensible defaults are best' I think doing a lower case ordering is the only thing that actually makes any sense, which means figuring out how to turn it off is all we're left with.

I'm genuinely not sure what the best mechanism for specifying turning it off would be. Using the case of the property seems elegant but a bit side-effect-y.

kevinansfield commented 9 years ago

May not help but at least for postgres it sounds like the default collation is OS-dependent? http://dba.stackexchange.com/questions/106964/why-is-my-postgresql-order-by-case-insensitive

kevinansfield commented 9 years ago

One workaround I've seen before is to add a virtual column to the select (e.g. SELECT posts.*, LOWER(name) AS ci_name) so that it's always possible to order by name or ci_name.

I've seen the ci_x columns implemented both through the select as above and as an actual column that's populated either through database triggers or by the applications data layer.

One downside is that postgres from what I can tell will still use the OS' collation so case-sensitive ordering will still be ABab on OSX, AaBb on other systems.

hwhelchel commented 9 years ago

Postgres has an extension called citext which provides a case insensitive string type.

Benefits:

Limitations:

If you have data in different languages stored in your database, users of one language may find their query results are not as expected if the collation is for another language.

http://www.postgresql.org/docs/9.4/static/citext.html

I have less experience with MySQL but it appears that nonbinary strings are case insensitive and binary strings are case sensitive. So ensuring the nonbinary string types are used would give the desired behavior.

http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html

ErisDS commented 9 years ago

One workaround I've seen before is to add a virtual column to the select (e.g. SELECT posts.*, LOWER(name) AS ci_name) so that it's always possible to order by name or ci_name.

This is an interesting idea, Bookshelf has some support for calculated columns. The trick would be only adding the column when an order by was requested (else you need duplicates for all alpha columns). Should be doable.

@hwhelchel Do you have any idea how to turn on that postgres extension within our bookshelf/knex/pg layers? Converting collations in MySQL is something we need to do in order to fix #5945, however this is a pretty hefty task. Your suggestions sound like the 'right' way to do things but are possibly going to be hard to manage because Ghost is distributed amongst so many different environments, databases and database versions.

hwhelchel commented 9 years ago

@ErisDS this would be done at the knex layer using the raw api

https://github.com/tgriesser/knex/issues/208

Hmm yes postgres citext would works in 9.1+. I'll have to take a look at the Ghost db layer to understand more what the challenges are.

You might also find this issue on knex with multiple databases relevant: https://github.com/tgriesser/knex/issues/778

ErisDS commented 8 years ago

I'm closing all OAuth and most API issues temporarily with the later label.

RE: OAuth, for the next 2-3 months we'll be implementing an official Ghost OAuth login system, providing global access to all Ghost blogs with a single login. We'll be opening issues around this system soon, and I don't want to cause confusion with OAuth for the API.

JSON API Overhaul & OAuth access are currently scheduled next on the roadmap

naz commented 5 years ago

As discussed in https://github.com/TryGhost/Ghost/pull/10371#issuecomment-454358290, we will try the approach with overriding standard Bookshelf/knex order options with our own orderRaw which will use order by lower(FIELD) DIRECTION clause to ensure ordering consistency between dbs.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

naz commented 4 years ago

This problem still exists and should not have been closed by stale-bot.

ErisDS commented 3 years ago

Closing as this hasn't really impacted anyone.