Closed ErisDS closed 3 years ago
May not help but at least for postgres it sounds like the default collation is OS-dependent? http://dba.stackexchange.com/questions/106964/why-is-my-postgresql-order-by-case-insensitive
One workaround I've seen before is to add a virtual column to the select (e.g. SELECT posts.*, LOWER(name) AS ci_name
) so that it's always possible to order by name
or ci_name
.
I've seen the ci_x
columns implemented both through the select as above and as an actual column that's populated either through database triggers or by the applications data layer.
One downside is that postgres from what I can tell will still use the OS' collation so case-sensitive ordering will still be ABab
on OSX, AaBb
on other systems.
Postgres has an extension called citext
which provides a case insensitive string type.
Benefits:
LOWER()
LOWER()
Limitations:
text
but more efficient than LOWER()
citext
behavior is a function of the LC_CTYPE
of the database. From postgres docs:If you have data in different languages stored in your database, users of one language may find their query results are not as expected if the collation is for another language.
http://www.postgresql.org/docs/9.4/static/citext.html
I have less experience with MySQL but it appears that nonbinary strings are case insensitive and binary strings are case sensitive. So ensuring the nonbinary string types are used would give the desired behavior.
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
One workaround I've seen before is to add a virtual column to the select (e.g. SELECT posts.*, LOWER(name) AS ci_name) so that it's always possible to order by name or ci_name.
This is an interesting idea, Bookshelf has some support for calculated columns. The trick would be only adding the column when an order by
was requested (else you need duplicates for all alpha columns). Should be doable.
@hwhelchel Do you have any idea how to turn on that postgres extension within our bookshelf/knex/pg layers? Converting collations in MySQL is something we need to do in order to fix #5945, however this is a pretty hefty task. Your suggestions sound like the 'right' way to do things but are possibly going to be hard to manage because Ghost is distributed amongst so many different environments, databases and database versions.
@ErisDS this would be done at the knex layer using the raw
api
https://github.com/tgriesser/knex/issues/208
Hmm yes postgres citext would works in 9.1+. I'll have to take a look at the Ghost db layer to understand more what the challenges are.
You might also find this issue on knex with multiple databases relevant: https://github.com/tgriesser/knex/issues/778
I'm closing all OAuth and most API issues temporarily with the later
label.
RE: OAuth, for the next 2-3 months we'll be implementing an official Ghost OAuth login system, providing global access to all Ghost blogs with a single login. We'll be opening issues around this system soon, and I don't want to cause confusion with OAuth for the API.
JSON API Overhaul & OAuth access are currently scheduled next on the roadmap
As discussed in https://github.com/TryGhost/Ghost/pull/10371#issuecomment-454358290, we will try the approach with overriding standard Bookshelf/knex order
options with our own orderRaw
which will use order by lower(FIELD) DIRECTION
clause to ensure ordering consistency between dbs.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This problem still exists and should not have been closed by stale-bot.
Closing as this hasn't really impacted anyone.
Take the following strings: 'photo', 'Audio', 'Video'. Note that the first one is not capitalised, but the other two are.
When ordering by names in SQL this can be done in either a case sensitive or case insensitive way:
Case sensitive order is:
This is used by MySQL and Postgres by default
Case insensitive order is:
This is used by Sqlite3 by default
I believe that the latter is the desired output in most cases, and that case sensitive ordering is not the expected outcome. I don't have much to base this on, other than common sense - the case of a letter doesn't really change what letter that is, so if I'm searching for something in an alphabetical list, I look for A and a in the same place.
This is slightly problematic for two reasons:
order by LOWER(name) ASC
. We could do this by default for all orders, or figure out which orders are strings and need to be made case insensitive. However, if we do this by default, there's no obvious/well understood way to turn it back off.Solutions
I don't have a clear idea what the best way to solve this is, but I have a couple of ideas:
Idea 1: make sqlite3 match mysql & postgres
By default force 'COLLATE NOCASE' for SQLite so that all 3 dbs behave the same and use case sensitive ordering. All resources have a
slug
as well as a name, which could be used for ordering by something case insensitive. Not ideal, but simple-ish.Long term, we can support adding
lower
andupper
functions to our order syntax.Idea 2: sensible defaults are best
By default, force case insensitive matches because this is what makes sense to anyone who isn't a computer scientist :grin:. If almost noone wants case sensitive alphabetical order, then it makes more sense to make it harder to do that, rather than making the sensible thing hard.
So, if all of our queries go off with
LOWER
automatically wrapped around attributes which are known to be test/strings, we need to create a way for a user to say 'don't do case insensitive matches'.A couple of different ideas for that are:
Assuming we add support for all the mysql functions that could be used in an order statement, and that specifying one would override our default application of
LOWER()
then we can provide for using a mysql function which doesn't change the case.Unfortunately there is no MySQL function which means 'original case', but using a no-op function like
TRIM()
would work because it would get used instead ofLOWER()
and would not change the string. That's a bit unintuitive though, so we could also just make up our own MySQL functionNOCASE()
or whatever we want to call it, and use that to mean "don't apply lower but don't apply this either cos it's nonsense :)".Something like
order="^name ASC"
ororder="!name ASC"
could be used to indicate that case sensitivity should be used.order="name ASC"
- does a case insensitive order using the LOWER() functionorder="NAME ASC"
- does a case insensitive order using the UPPER() functionorder="Name ASC"
- does a case sensitive order without applying a functionI'd love to hear more ideas
Conclusion
I am very much leaning towards 'sensible defaults are best' I think doing a lower case ordering is the only thing that actually makes any sense, which means figuring out how to turn it off is all we're left with.
I'm genuinely not sure what the best mechanism for specifying turning it off would be. Using the case of the property seems elegant but a bit side-effect-y.