OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License
7.4k stars 2.39k forks source link

Postgre: Culture is case sensitive #14810

Open MikeKry opened 10 months ago

MikeKry commented 10 months ago

After migration from SQLite to PostgreSQL database, Culture became case sensitive.

I have found (so far) 2 issues: 1/ GraphQL now requires lowercased language - "en-gb" instead of "en-GB" 2/ culture filter in content items uses "en-GB" for search, but nothing is found unless you type in "en-gb"

not sure what is correct solution, but maybe culture should be stored in original format, not lowercase

hishamco commented 10 months ago

Please steps to reproduce

MikeKry commented 10 months ago

@hishamco Culture picker: 0/ run project on postgre 1/ Click on Culture picker in Content Items, select language 2/ after reload, list is empty image 3/ manually type in filter "cs-cz" 4/ results are visible image

GraphQL: 1/ same as for culture picker. Difference is only that case sensitivity is inside where condition. e.g. (where: {localization: {culture: "en-GB"}}) vs (where: {localization: {culture: "en-gb"}})

hishamco commented 10 months ago

How is this related to GraphQL?

MikeKry commented 10 months ago

I think it is not related to graphql at all, most likely it will be a problem either with case sensitivity in postgre or filter implementation for localization part

hishamco commented 10 months ago

I will test this today and push a PR for the fix

MikeKry commented 10 months ago

btw. this seems to apply to all searches - also text/title search is case sensitive

hishamco commented 10 months ago

Could try to create a COLLATION to avoid case sensitive checks

https://dba.stackexchange.com/questions/101294/how-to-create-postgres-db-with-case-insensitive-collation

MikeKry commented 10 months ago

@hishamco

Okay, I tried following:

CREATE COLLATION public.case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
ALTER TABLE "ContentItemIndex" ALTER COLUMN "DisplayText" SET DATA TYPE character varying(255) COLLATE "case_insensitive";
ALTER TABLE "LocalizedContentItemIndex" ALTER COLUMN "Culture" SET DATA TYPE character varying(16) COLLATE "case_insensitive";

1/ culture picker (LocalizedContentItemIndex) behaves OK. 2/ displayText (ContentItemIndex) filtration breaks - PostgresException: 0A000: nondeterministic collations are not supported for LIKE.

I guess for search, we need to use "ILIKE" instead.

Select * from "ContentItemIndex" where "DisplayText" ILIKE '%footer%'
instead of
Select * from "ContentItemIndex" where "DisplayText" LIKE '%footer%'

There are some other options.. https://aws.amazon.com/blogs/database/manage-case-insensitive-data-in-postgresql/

@sebastienros Is this issue even related to OC, or should I move it to yessql repo?

MikeKry commented 10 months ago

@hishamco

I tried this CITEXT data type as mentioned in link to amazon above. https://www.postgresql.org/docs/current/citext.html#CITEXT-LIMITATIONS

CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE "ContentItemIndex" ALTER COLUMN "DisplayText" SET DATA TYPE CITEXT COLLATE pg_catalog."default";

This seems to work also with LIKE, but you cannot use case sensitive searches when you use this datatype. I think it is not an issue as we use only case insensitive search in other database providers. Another problem might be no length limit and slower execution?

There might be another option using indexes... https://coderwall.com/p/6yhsuq/improve-case-insensitive-queries-in-postgres-using-smarter-indexes

sebastienros commented 9 months ago

I think we should handle any collation you decide to use, and this property should be stored normalized (lower-case for instance).

But if this is impacting too many features we might expect the collation to be case insensitive.

MikeKry commented 6 months ago

@sebastienros

So you suggest to normalize values to lowercase in application? I think that will have impact on all indexes - it has to be handled on every place where data is stored into index, so I guess mostly in index handlers + on every user input (search bar etc.) + might have impact on existing custom integrations e.g. via graphql.

I am okay with that, but it seems complicated at this point.

I would probably prefer to update yessql logic for postgre, as there are some other problems anyway. I have found out that also modifying column size breaks migrations in postgre, so it maybe should be revisited completely..

MikeKry commented 6 months ago

@sebastienros

Can you please provide a little guidance so I could implement it and create a PR?

sebastienros commented 5 months ago

For the culture code we should store it normalized, so everywhere we set it we should ToLowerInvariant() it. And the queries will have to do so too. We should also do a migration to convert all existing records.

For the DisplayText search it's a collation issue. If we use the field directly then there is nothing we can do in OC, just configure the DB with that in mind. Or use a default case-insensitive collation when creating the DB. Or if we want orchard to behave in a consistent way whatever the collation then we should do it in the ContentItemIndex too and the search queries.

If we need more than that then it's borderline a search engine issue and we should not use SQL queries for these kind of searches. (talking about DisplayText).

MikeKry commented 5 months ago

Okay, I will try to make PR on this