dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
842 stars 280 forks source link

Provide a reliable way to get SQL Server Collation for a column in SqlDataReader #1211

Open aersam opened 3 years ago

aersam commented 3 years ago

Is your feature request related to a problem? Please describe.

It is not currently possible to get the collation of a column out of a SqlDataReader. This was helpful as a followup of #1208 which allows getting raw bytes of a column and a collation would help interpret it. Also see https://stackoverflow.com/questions/35190858/how-to-get-the-collation-of-string-fields-in-a-result-set-via-sqldatareader

Describe the solution you'd like

I'd like some method to get this metadata. I am currently not sure which is the best way to do it, maybe extend GetColumnSchema or introduce a GetCollation method on SqlDataReader.

Describe alternatives you've considered

  1. Use database metadata Does not work with stored procedures. Though there is sp_describe_first_result_set, this is quite limited; eg Temporary tables do not work.

  2. Use some config from the user This is an option, but wrong config would lead to bugs that are very hard to find (as it may work if using characters from 0-127 only). Also, it's duplicate information as SqlClient need the Collation anyway.

cheenamalhotra commented 3 years ago

Hi @aersamkull

SQL Server's support for Collation is limited to only these types and doesn't apply to all datatypes: BIGCHARTYPE, BIGVARCHARTYPE, TEXTTYPE, NTEXTTYPE, NCHARTYPE, or NVARCHARTYPE as documented here: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/cbe9c510-eae6-4b1f-9893-a098944d430a

Due to this, the driver needs to make decisions internally to backfill applicable LCID, from locale or collation information available from server. It's possible because of this limited support, collation information is not exposed, as it's also not exposed by any other client driver for SQL Server.

Also would like to add, if not explored already, you can gather collation information using T-SQL and design your own application logic to handle specific cases: https://docs.microsoft.com/en-us/sql/relational-databases/collations/view-collation-information

aersam commented 3 years ago

Yes, it's available only to text columns and I do not expect it to return anything other then null or an exception for non-text columns.

I am not surprised that no other SQL Server client exposes this yet, as I expect this one to be the best driver ;) and also, it certainly is a low-level and rarely used feature. But in combination with #1208 has a lot of performance potential and as you can see in the Stackoverflow post is important enough for people to use reflection.

Db Metadata are really not an option for Stored Procedures. Just a short sample for a worst-cast proc:

CREATE PROC dbo.tester
@Param varchar(100)
aS
BEGIN
   IF @Param='1'
   BEGIN 
      cREATE TABLE #testTable(t varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS , #t2 varchar(100) collate Latin1_General_CI_AS)
     SELECT * FROM #testTable 
    END
ELSE IF @PAram='2'
bEGIN
 EXEC ('SELECT "sf" from "tester2"') AT [SOMESERVER};
END
 ELSE IF @Param='1'
   BEGIN 
      cREATE TABLE #testTable2(t varchar(100) /* Whats the tempdb collation? */ , #t2 varchar(100) collate  Latin1_General_CI_AS)
     SELECT * FROM #testTable2 
    END

END

You can't know the collation here until you execute it. The only way to get collation is within SqlDataReader itself

roji commented 3 years ago

@aersamkull it's important to note that this feature would only be useful in the following scenario:

A dynamic database programming where perf would be important enough to avoid an extra encoding conversion seems like a very unlikely scenario. So IMHO this proposal would be a micro-optimization for an edge case.

srutzky commented 3 years ago

@roji : It's really late so I don't have time to comment in as much detail as I would like, so I will add more tomorrow. But for the moment I will say that your first point (re: #1208) might be valid, but I don't believe your second point (re: column collation is usually known) is:

  1. Applications don't always dictate the collation of all columns and/or the DB itself. This has nothing to do with generic/dynamic DB programming.
  2. Speaking in terms of schema completely ignores:
    1. the ability to override column default collation via COLLATE
    2. expressions that might not even reference any columns
  3. Even if the desire for this info were limited to generic/dynamic DB programming, that doesn't make it any less valid than the thousands of other properties and methods available in framework libraries that many people never use. This is more accurately an infrequent case more than an edge case.

Also, the LCID and ComparisonStyle info should already be there and just need to be marked public. At least they are in the original SqlClient (I haven't checked to see if that code was duplicated here).

I hope this info helps, and again, I will add more details here and to #1208 later today / tomorrow.

roji commented 3 years ago

Applications don't always dictate the collation of all columns and/or the DB itself. This has nothing to do with generic/dynamic DB programming.

Apart from that, applications don't always dictate the DB schema/setup, but in my experience they do in most cases.

Even if the desire for this info were limited to generic/dynamic DB programming, that doesn't make it any less valid than the thousands of other properties and methods available in framework libraries that many people never use.

I never said it isn't valid, but prioritization of engineering resources time (and even conversation time!) must take into account the feature's value.

This is more accurately an infrequent case more than an edge case.

I'd personally estimate it at extremely infrequent, given that the need of reading strings as raw bytes has only been requested now, by one user, in the history of SqlClient, and the actual gains from avoiding the decoding is very unlikely to matter in all but the most high-perf applications. #1208 is a micro-optimization, and this issue is about making that micro-optimization usable in a wider range of scenarios.

Wraith2 commented 3 years ago

The use case for the information is very narrow. The information is already present in the query data for any recordset so we have it internally we just don't expose a way to get to it, adding a readonly property to get the LCID or equivalent form doesn't seem like a big problem.

At least they are in the original SqlClient (I haven't checked to see if that code was duplicated here).

I couldn't see them in the public api surface but I might have missed them. Logically they should be on SqlDataReader directly or on the GetColumnSchema result somewhere.

aersam commented 3 years ago

Just had a quick look at the source and it seems the Collation is an object with flags in the protocol, not really a string. If this is done I think one had to expose a new more user friendly immutable object, basically just this: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/3d29e8dc-218a-42c6-9ba4-947ebca9fd7e . With Encoding and SqlCompare Options added for simplicity

See also src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserHelperClasses.cs#L83