bill-ramos-rmoswi / babelfish_postgresql_towel

Bill Ramos's knowledge base for tips, tricks, and best practices for migrating SQL Server solutions to Babelfish for PostgreSQL and the open-source WiltonDB for Windows
Apache License 2.0
1 stars 1 forks source link

nondeterministic collations are not supported for LIKE #15

Open karthika-thirumoorthy opened 2 months ago

karthika-thirumoorthy commented 2 months ago

Hi @bill-ramos-rmoswi ,

we are trying to insert a value to employee table from entity framework but it throws an exception "nondeterministic collations are not supported for LIKE"

when we are trying to insert the value in SSMS it is inserted successfully.

INSERT INTO [dbo].[Employee] ([EmployeeGuid] ,[EmployeeId] ,[GivenName] ,[Surname] ,[MURAccredited] ,[NMSAccredited] ,[Deleted] ,[ServerSide] ,[DateAdded] ,[DateModified] ,[UserInRoleId]) VALUES (<EmployeeGuid, uniqueidentifier,> ,<EmployeeId, int,> ,<GivenName, varchar(35),> ,<Surname, varchar(100),> ,<MURAccredited, bit,> ,<NMSAccredited, bit,> ,<Deleted, bit,> ,<ServerSide, bit,> ,<DateAdded, datetime,> ,<DateModified, datetime,> ,<UserInRoleId, int,>) GO

staticlibs commented 2 months ago

@karthika-thirumoorthy

Collations are used in DB to define the rules to use when unicode strings comparison is done. Deterministic collation will consider two characters as equal if, and only if, they have the same byte sequence. So all deterministic collations must be case-sensitive (cs) and accent-sensitive (as). Postgres doesn’t support the LIKE clause on non-deterministic collations at all. Babelfish supports it for case-insensitive ci_as collations (doesn’t support on accent-insensitive ai collations).

By default Babelfish uses bbf_unicode_general_ci_as collation - case-insensitive UTF-8. I suggest to use this collation for all columns unless there are specific collation requirements for a particular column (need to be handled case-by-case).

If an ai collation is used in some column with EF like this:

create table tab1 (
    id int,
    col1 nvarchar(max) collate bbf_unicode_cp1_ci_ai
)
public class Tab1
{
    public int Id { get; set; }
    public string Col1 { get; set; }
}

Then this column cannot be used with EF queries that use LIKE under the hood, for example:

 var rec1 = db.Tab1
            .Where(r => r.Col1.StartsWith("foo"))
            .OrderBy(r => r.Id)
            .First();

is translated to:

SELECT TOP(1) [t].[Id], [t].[Col1]
FROM [Tab1] AS [t]
WHERE [t].[Col1] LIKE N'foo%'
ORDER BY [t].[Id]

and will fail on Babelfish with:

An unhandled exception of type 'Microsoft.Data.SqlClient.SqlException' occurred in Microsoft.EntityFrameworkCore.Relational.dll: 'nondeterministic collations are not supported for LIKE'
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
[...]

To avoid this problem you need to either change the column collation to as one (bbf_unicode_general_ci_as in this example) or to specify the collation name explicitly in LINQ query like this:

var rec1 = db.Tab1
            .Where(r => EF.Functions.Collate(r.Col1, "bbf_unicode_cp1_ci_as")
                .StartsWith("foo"))
            .OrderBy(r => r.Id)
            .First();

Collations used for the particular table can be viewed with this query:

select name, collation_name
from sys.columns
where [object_id] = object_id('tab1')

All collation names supported by Babelfish can be listed with this query:

select name, description from sys.fn_helpcollations()
karthika-thirumoorthy commented 2 months ago

@staticlibs , We are getting this error while inserting the data in a table. Is there any chance to get this error on insert?

we are getting error on the below line,

dbContext.SaveChanges();

when I checked in DB for table Collation, it has a valid collation supported by Babelfish,

image

staticlibs commented 2 months ago

@karthika-thirumoorthy

Please show the problematic SQL query that is sent by EF to DB. You can get it from the DB server log after setting the config option log_statement = all:

01

karthika-thirumoorthy commented 2 months ago

@staticlibs , sorry for the confusion.

We found the cause for this issue. Below constraint cause the issue.

ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_EmployeeIdentifier_RegistrationNumber] CHECK (([EmployeeIdentifierTypeId]<>(1) OR NOT [Value] like '% %'))

could you please provide alternative solution for this constraint?

we done some workaround like updated the collation in the table as "Default" , Then it inserted the value without any error. Is that cause any other issue?

staticlibs commented 2 months ago

@karthika-thirumoorthy

ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_EmployeeIdentifier_RegistrationNumber] CHECK (([EmployeeIdentifierTypeId]<>(1) OR NOT [Value] like '% %'))

could you please provide alternative solution for this constraint?

In case of problems I can suggest to change

[Value] like '% %'

to

[Value] like '% %' collate bbf_unicode_general_ci_as

It should not be necessary, but in your case apparently there is some kind of mismatch.

we done some workaround like updated the collation in the table as "Default" , Then it inserted the value without any error. Is that cause any other issue?

I've re-checked this and there is some unclear logic in server and database default collation handling in Babelfish.

First, this point from myself:

By default Babelfish uses bbf_unicode_general_ci_as collation

appears to be incorrect, sql_latin1_general_cp1_ci_as is set by default instead in babelfishpg_tsql.server_collation_name server parameter (Aurora ref):

02

Then there are inconsistent collation names reported with the following queries:

create database test1 collate bbf_unicode_general_ci_as

select name, collation_name from sys.databases

use test1
create table tab1(col1 varchar(100) collate database_default)

select name, collation_name
from sys.columns
where [object_id] = object_id('tab1')

I've filed #61 to investigate this in WiltonDB and will follow-up there.

karthika-thirumoorthy commented 2 months ago

we tried the below collation

ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_EmployeeIdentifier_RegistrationNumber] CHECK (([EmployeeIdentifierTypeId]<>(1) OR NOT [Value] like '% %' collate bbf_unicode_general_ci_as)) GO

but got same error again. "nondeterministic collations are not supported for LIKE".

when we go through Aurora ref) article image

They Mentioned "Like" clause non-deterministic collation supports from babelfish 4.2.0 but I believe we currently using Babelfish version 3.3.0 in wilton DB. Is that cause any issue?

staticlibs commented 2 months ago

@karthika-thirumoorthy

From your previous message:

we done some workaround like updated the collation in the table as "Default" , Then it inserted the value without any error.

Does this workaround work for you, or is it causing any problems?

we tried the below collation

ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_EmployeeIdentifier_RegistrationNumber] CHECK (([EmployeeIdentifierTypeId]<>(1) OR NOT [Value] like '% %' collate bbf_unicode_general_ci_as)) GO

but got same error again. "nondeterministic collations are not supported for LIKE".

What I suppose is happening, is that the version of EF you use is including some collation into the generated SQL (perhaps, a server or a database default ones). Would it be possible for you to capture all the SQL (not only the check clause) that is being run when problem happens?

Also I can suggest to try to set the collation explicitly for EF in its initialization routines like this, because if this setting is being initialized with a server default (or DB default not clear now) - this can cause the mismatch you are getting.

It may be possible that this can be solved by setting the default server collation (babelfishpg_tsql.server_collation_name option) and the DBs default collation (specified when DB is created) to the bbf_unicode_general_ci_as, that supports LIKE and should work for all UTF-8 data. But I don't know yet how to make such settings effective (so it apply to all objects) and whether there are any unexpected site-effects from this.

They Mentioned "Like" clause non-deterministic collation supports from babelfish 4.2.0 but I believe we currently using Babelfish version 3.3.0 in wilton DB. Is that cause any issue?

Yes, LIKE support for ai collations was implemented recently in the very latest Babelfish. WiltonDB is based on Babelfish 3.3, it includes many fixes and enhancements backported from newer versions, but major features like this one are not feasible to backport.

psrinivasa2 commented 1 month ago

Hi Alex @staticlibs , Yes, LIKE support for ai collations was implemented recently in the very latest Babelfish - Does it mean it the issue will work in AWS Aurora? Recently we have reported few of babelfish issues and what is your guess on recent fixes on very latest babelfish?

JayakumarEMIS commented 1 month ago

Hi @staticlibs, as you mentioned we changed default server collation (babelfishpg_tsql.server_collation_name option) and the DBs default collation (specified when DB is created) to the bbf_unicode_general_ci_as, image (2)

eventhough we getting the same issue, we added sample script below for your reference, `SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[EmployeeIdentifier]( [PharmacistId] [int] NOT NULL, [PharmacistIdentifierTypeId] [tinyint] NOT NULL, [Value] varchar NOT NULL, CONSTRAINT [PK_PharmacistIdentifier_PharmacistId_PharmacistIdentifierTypeId] PRIMARY KEY CLUSTERED ( [PharmacistId] ASC, [PharmacistIdentifierTypeId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO

ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_PharmacistIdentifier_RegistrationNumber] CHECK (([PharmacistIdentifierTypeId]<>(1) OR NOT [Value] like '% %')) GO`

INSERT [dbo].[EmployeeIdentifier]([PharmacistId], [PharmacistIdentifierTypeId], [Value]) VALUES ('28', '2', '')

we getting the issue while execute in sql itself, not in entity framework. image

And there is one question why can't we move our wilton db with latest version of babelfish(4.2), is there any difficulties to do this features, I can see in below documentation from Aurora PostgreSQL most of the inconsitency issue fix https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQLReleaseNotes/AuroraBabelfish.Updates.html#AuroraBabelfish.Updates.43X ?

staticlibs commented 1 month ago

@psrinivasa2

Does it mean it the issue will work in AWS Aurora?

According to Aurora docs the LIKE can be used on the subset of non-deterministic collations, so the problem won't happen if supported collation is used on Babelfish 4.2+.

staticlibs commented 1 month ago

@JayakumarEMIS

I will look deeper into this and will follow-up in #61.

psrinivasa2 commented 1 month ago

Hi @staticlibs , As mentioned by @JayakumarEMIS , I could see a statement in the document reference shared "Babelfish for Aurora PostgreSQL 4.2 adds several new features, enhancements, and fixes". Worth checking on taking babelfish version of WiltonDB to 4.2. Also you mentioned that you are frequently backporting fixes from babelfish recent version. If that is the case, all the recent fixes and features already in Wiltondb? If not what are the fixes and features from recent babelfish which are not yet backported into WiltonDB?

staticlibs commented 1 month ago

@psrinivasa2

WiltonDB-latest version is planned, but there are a number of open questions regarding its long-term maintenance. Thus WiltonDB 3.3 remains the main supported version.

Only security fixes and most critical bugfixes are backported to WiltonDB 3.3. Large changes are expensive to backport and frequently have a tail of follow-up regression fixes. For the full list of backports you need to check the commit history of wiltondb/babelfish_extensions repo.

staticlibs commented 1 month ago

I think I know what is the reason for "nondeterministic collations are not supported" error. In some scenarios when strings are passed to Postgres functions they retain the non-deterministic collation that is used in T-SQL. Because such collations are not supported in Postgres context with LIKE - it is necessary to specify the deterministic collation before calling Postgres function. I suggest to try to add something like collate C to strings used in constraint check in your case to verify that this is actually the root cause.

psrinivasa2 commented 1 month ago

Hi Alex, @JayakumarEMIS is able to arrive the workaround with different query to satisfy our business need. @JayakumarEMIS can you share the workaround details to Alex?

JayakumarEMIS commented 1 month ago

HI @staticlibs , for workaround removed like operator in that constraint and replaced CHARINDEX(' ', [Value]) = 0)) condition which satisfy our business need

Removed: ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_EmployeeIdentifier_RegistrationNumber] CHECK (([EmployeeIdentifierTypeId]<>(1) OR NOT [Value] like '% %'))

Replaced: ALTER TABLE [dbo].[EmployeeIdentifier] WITH CHECK ADD CONSTRAINT [CK_EmployeeIdentifier_RegistrationNumber] CHECK (([EmployeeIdentifierTypeId]<>(1) OR CHARINDEX(' ', [Value]) = 0))