denisenkom / go-mssqldb

Microsoft SQL server driver written in go language
BSD 3-Clause "New" or "Revised" License
1.82k stars 495 forks source link

Different result when insert into a nvarchar column between string and bytes #684

Closed lunny closed 3 years ago

lunny commented 3 years ago

The result is different when inserting into a nvarchar column. When giving a string parameter, it's right. But when giving a bytes parameter, the result is wrong.

sql.Exec("INSERT INTO table (nvarchar_col) VALUES (?)", stringParam) // this is right
sql.Exec("INSERT INTO table (nvarchar_col) VALUES (?)", bytesParam) // this is wrong
tc-hib commented 3 years ago

I think that is expected behavior. What is in bytesParam ? If it contains a LE UCS-2 string, it should work. If it contains a UTF-8 string, it should not.

You can try that in SSMS:

INSERT INTO table
VALUES
(N'é'),   -- OK (correctly typed parameter)
(0xC3A9), -- Not OK (UTF-8)
(0xE900)  -- OK (UCS-2)
lunny commented 3 years ago

I think that is expected behavior. What is in bytesParam ? If it contains a LE UCS-2 string, it should work. If it contains a UTF-8 string, it should not.

You can try that in SSMS:

INSERT INTO table
VALUES
(N'é'),   -- OK (correctly typed parameter)
(0xC3A9), -- Not OK (UTF-8)
(0xE900)  -- OK (UCS-2)

It's a utf8 string.

var stringParam = "data"
var bytesParam = []byte(stringParam)
tc-hib commented 3 years ago

The readme is a bit lacking. But I think this is good behavior: a []byte parameter should be sent as a varbinary.

lunny commented 3 years ago

The readme is a bit lacking. But I think this is good behavior: a []byte parameter should be sent as a varbinary.

But the column type is nvarchar, of course it's difficult to know which is the column type when execute a SQL.

tc-hib commented 3 years ago

Yes, but I'm afraid it's your responsibility to ensure your parameter has the correct type. The driver cannot arbitrarily convert your []byte from utf-8 to ucs-2, that would clearly be a bug. Same goes for other libraries, such as ADO.

In this case, I suppose sp_describe_undeclared_parameters (which cannot be implemented in the driver anyway, as it requires your schema) would guess the type, but in some queries it would be impossible, and so the behavior would be unpredictable.

lunny commented 3 years ago

It seems a workaround is always to use string and never use bytes even if it's a binary column.

tc-hib commented 3 years ago

No :) If you actually want to store 0xC3A9 in a varbinary, and you pass it as a string, it will become 0xE900. Say you want to store a png file in a column: don't pass it as a string, that would corrupt it.

lunny commented 3 years ago

No :) If you actually want to store 0xC3A9 in a varbinary, and you pass it as a string, it will become 0xE900. Say you want to store a png file in a column: don't pass it as a string, that would corrupt it.

Wow, thanks! I think this issue could be closed.