dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License

SqlDataReader.GetSqlDateTime does not support DATETIME2 #846

Open bjornbouetsmith opened 3 years ago

bjornbouetsmith commented 3 years ago

Describe the bug

It seems the DATETIME2 SQL Server data type is not fully or correctly implemented in SqlDataReader.

Using the GetDateTime method, DATETIME2 is handled correctly.

Using the GetSqlDateTime method throws a System.InvalidCastException:

Exception message: Specified cast is not valid 
Stack trace:  at Microsoft.Data.SqlClient.SqlBuffer.get_SqlDateTime()

To reproduce

[TestMethod]
public void DateTime2Mapping()
{
    string statement = @"SELECT CAST('2020-12-11 09:17:12.123456' AS DATETIME2) AS [MyDate]";

    var builder = new SqlConnectionStringBuilder()
    {
        InitialCatalog = "master",
        DataSource = @"any sql connection",
        IntegratedSecurity = true,
    };

    using (var sqlConnection = new SqlConnection(builder.ConnectionString))
    using (SqlCommand command = new SqlCommand(statement, sqlConnection))
    {
        sqlConnection.Open();
        command.CommandType = CommandType.Text;
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                var dtColDate = reader.GetSqlDateTime(0);
                Assert.IsFalse(dtColDate.IsNull);
                Console.WriteLine(dtColDate.Value);
            }
        }
    }
}

Expected behavior

I expect that GetSqlDateTime handles DATETIME2 and returns a SqlDateTime struct

Further technical details

Microsoft.Data.SqlClient version: 2.1.0
.NET target: Framework 4.7.2
SQL Server version: 15.0.4079.3 (2019)
Operating system: Windows 10

Additional context: The reason I found this issue is that I was profiling our code and saw that SqlDataReader.IsDBNull took up a lot of time. We need to check every column for null, since the GetXX methods do not handle DBNull, and that is how I found the SqlDataReader.GetSqlXX methods, which do handle DBNull.

Basically, I am trying to get rid of this overhead in SqlDataReader:

override public bool IsDBNull(int i)
{
    CheckHeaderIsReady(columnIndex: i);

    SetTimeout(_defaultTimeoutMilliseconds);

    ReadColumnHeader(i);    // header data only

    return _data[i].IsNull;
}

by switching to the IsNull property on the SqlTypes,

i.e.

  public bool IsNull => !this.m_fNotNull;
JRahnama commented 3 years ago

@bjornbouetsmith DateTime2 cannot be cast to SqlDateTime. The source of the issue is the difference between DATETIME2 and SqlDateTime. I tested some changes in the driver to convert the dates, but since the DATETIME2 range is 0001-01-01 through 9999-12-31 with an accuracy of 100 nanoseconds, while the SqlDateTime range is January 1, 1753 through December 31, 9999 with an accuracy of 3.33 milliseconds, there is no exact conversion in C# and I got wrong results. The SqlDateTime property in SqlBuffer inside the driver casts the value to SqlDateTime, and that is what throws the exception. However, you can get the value using GetDateTime with no problem. In addition, the cast can happen server-side or in your command, as explained here.
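The range and precision mismatch described above can be demonstrated outside the driver with System.Data.SqlTypes alone. This is a minimal standalone sketch (not part of the driver code) showing both the overflow and the precision loss:

```csharp
using System;
using System.Data.SqlTypes;

class SqlDateTimeLimitsDemo
{
    static void Main()
    {
        // DATETIME2 supports dates from year 1, but SqlDateTime's minimum
        // is 1753-01-01, so converting an earlier date overflows.
        try
        {
            var _ = new SqlDateTime(new DateTime(1600, 1, 1));
        }
        catch (SqlTypeException)
        {
            Console.WriteLine("overflow: value precedes SqlDateTime.MinValue");
        }

        // SqlDateTime is accurate to 1/300 of a second, so the 100 ns
        // resolution of a DATETIME2 value does not survive the round trip.
        DateTime precise = new DateTime(2020, 12, 11, 9, 17, 12).AddTicks(1234567);
        DateTime roundTripped = new SqlDateTime(precise).Value;
        Console.WriteLine(precise.Ticks == roundTripped.Ticks); // False
    }
}
```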

bjornbouetsmith commented 3 years ago

Hi, that might be true, but then you need a corresponding .NET struct that supports the extended range, i.e. a SqlDateTime2. Right now the support for DATETIME2 is, in my opinion, only partially implemented, since there are a lot of advantages to using the GetSqlXX methods instead of the GetXX methods.

The former do not throw exceptions if the column is NULL; the GetXX methods do. So I would still suggest that you either "fix" SqlDateTime to handle the extended range of DATETIME2, or implement a SqlDateTime2 to handle DATETIME2 and then naturally add a GetSqlDateTime2 method.

When reading millions of rows from the database where every single column can be null, it is a major performance degradation to use the GetXX methods instead of the GetSqlXX methods, since you need to check every single column on every single row, first with IsDBNull and then doing something depending on the return value. The GetSqlXX methods never fail because the field is null, and you can call the IsNull property directly on the return value, which is more than 10 times faster because of the overhead in IsDBNull.

Most of the code to handle DATETIME2 is already implemented in GetDateTime, so it should only be a matter of either implementing a GetSqlDateTime2 or fixing GetSqlDateTime to handle the extra range by extending SqlDateTime.

I would prefer you implemented a GetSqlDateTime2, since that would complete the API: there is a DATETIME2 data type in the database without a corresponding Sql type in .NET. That would probably require adding a SqlDateTime2 property to the SqlBuffer struct.

I will be happy to submit a pull request to this effect.

And casting to DATETIME is not really an option on the SQL Server side, since that would truncate the precision.

ErikEJ commented 3 years ago

There are APIs to get the nullability from a reader before calling the get methods, so you can avoid calling IsDbNull

bjornbouetsmith commented 3 years ago

@ErikEJ What API is that?

SqlDataReader.IsDBNull is what I am using, and it is not performant, since it does a lot of checks.

And I need to call it for each field on every row. So I have found no API except SqlDataReader.IsDBNull, Convert.IsDBNull, and the IsNull property of the SqlXX types.

Convert.IsDBNull is useless, since it will BOX any value type, which creates garbage. SqlDataReader.IsDBNull is a performance HOG because it needs to check that the reader is in the correct state before accessing the field. Which, unless I have missed something, leaves me with the IsNull property of the SqlXX types.

I am not interested in an API that can tell me whether or not a column "can" be null; I already know ALL columns can be null. What I need is a performant way to check whether a given field is null.

ErikEJ commented 3 years ago

GetColumnSchema or GetSchemaTable

bjornbouetsmith commented 3 years ago

@ErikEJ Yes, but that will only tell me the schema of the table, which I don't need and cannot use; I already know all columns can be null. I need to handle null values because SqlClient will throw an exception if the field is null, whereas the GetSqlXX methods will not throw but instead return a struct with an IsNull property. So that is the only real way to manage null values if you care about performance and garbage collection.

Wraith2 commented 3 years ago

Can you provide an isolated test case that exhibits the behaviour you see, where null checks plus direct type access cause more garbage than using the Sql types? I've always assumed, when working on the internals, that the Sql types would be more costly to use, but if they're not I can investigate whether there are ways to equalize the performance of the two paths.

bjornbouetsmith commented 3 years ago

@Wraith2 I never said my issue was about garbage, only about performance, since SqlClient is missing a SqlDateTime2 or equivalent type to represent DATETIME2.

If you use DATETIME2, you are forced to use SqlDataReader.GetDateTime, and since that method does not gracefully handle null, you are then forced to first call SqlDataReader.IsDBNull.

The code path for IsDBNull does a lot of work, which is required because it needs to check that the state is correct before it can check for null. And then it ends up accessing a property on SqlBuffer that is basically just a field accessor telling it whether or not the data is null. If there were a SqlDateTime2 struct and a GetSqlDateTime2 method, I would not have to call SqlDataReader.IsDBNull and then GetSqlDateTime; I could just retrieve the SqlDateTime2 struct, check its IsNull property, and if false, use the data contained in the struct.
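As a thought experiment, such a SqlDateTime2 could be little more than a null-tracking wrapper over DateTime, whose 100 ns tick resolution already matches DATETIME2. The type name and members below are hypothetical; nothing like this exists in the driver:

```csharp
using System;

// Hypothetical sketch only: Microsoft.Data.SqlClient has no SqlDateTime2 type.
public readonly struct SqlDateTime2
{
    private readonly bool _notNull;   // mirrors SqlDateTime's m_fNotNull convention
    private readonly DateTime _value; // DateTime's range and precision cover DATETIME2

    public SqlDateTime2(DateTime value)
    {
        _notNull = true;
        _value = value;
    }

    public static SqlDateTime2 Null => default;

    public bool IsNull => !_notNull;

    public DateTime Value => _notNull
        ? _value
        : throw new InvalidOperationException("Data is Null.");
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(SqlDateTime2.Null.IsNull);                       // True
        Console.WriteLine(new SqlDateTime2(DateTime.MaxValue).Value.Year); // 9999
    }
}
```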

Using my "test case" from above, I am sure you will be able to see the issue by running the code below. Of course the code below is silly, but imagine a 200-column row where every column can be null, across 50k rows.

for (int x = 0; x < 10000000; x++)
{
    DateTime dateTime;
    if (!reader.IsDBNull(0))
    {
        // Do something
        dateTime = reader.GetDateTime(0);
    }
    else
    {
        dateTime = default;
    }

}
//vs below code which fails with an InvalidCastException
// so you have to imagine calling a GetSqlDateTime2 which returned a SqlDateTime2
// or imagine that SqlDateTime would be able to handle the extended range of DATETIME2 and not throw exceptions

for (int x = 0; x < 10000000; x++)
{
    DateTime dateTime;
    SqlDateTime sqlDateTime = reader.GetSqlDateTime(0);
    if (!sqlDateTime.IsNull)
    {
        dateTime = sqlDateTime.Value;
    }
    else
    {
        dateTime = default;
    }

}
roji commented 3 years ago

@bjornbouetsmith

When reading millions of rows from the database where every single column can be null, it is a major performance degredation to use GetXX methods instead of GetSqlXX methods, since you need to check every single column on ever single row with first IsDBNull and then do something depending on the return value.

Is this something you're actually seeing in production, or have benchmarked? Because a quick benchmark seems to show that the perf of GetSqlDateTime is the same as IsDBNull + GetDateTime (see below). After all, I'd expect GetSqlDateTime to have to do more or less the same null checking internally as you're doing externally when calling GetDateTime.

Benchmark results:


BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.101
  [Host]     : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
  DefaultJob : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
| Method         | IsNull | Mean     | Error   | StdDev  |
| -------------- | ------ | -------- | ------- | ------- |
| GetDateTime    | False  | 139.7 us | 2.68 us | 4.48 us |
| GetSqlDateTime | False  | 137.2 us | 2.73 us | 5.58 us |
| GetDateTime    | True   | 124.4 us | 2.38 us | 3.25 us |
| GetSqlDateTime | True   | 124.2 us | 2.45 us | 3.35 us |
Benchmark code:

```c#
public class Program
{
    const string ConnectionString = @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";

    [Params(true, false)]
    public bool IsNull { get; set; }

    SqlConnection _connection;
    SqlCommand _command;

    [GlobalSetup]
    public void Setup()
    {
        _connection = new SqlConnection(ConnectionString);
        _connection.Open();
        _command = new SqlCommand(IsNull
            ? "SELECT CAST(NULL AS datetime)"
            : "SELECT CAST('2020-01-01' AS datetime)", _connection);
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _connection.Dispose();
        _command.Dispose();
    }

    [Benchmark]
    public DateTime GetDateTime()
    {
        using var reader = _command.ExecuteReader();
        reader.Read();
        return reader.IsDBNull(0) ? default : reader.GetDateTime(0);
    }

    [Benchmark]
    public SqlDateTime GetSqlDateTime()
    {
        using var reader = _command.ExecuteReader();
        reader.Read();
        return reader.GetSqlDateTime(0);
    }

    static void Main(string[] args) => BenchmarkRunner.Run<Program>();
}
```
ErikEJ commented 3 years ago

Should it not be:

      _command = new SqlCommand(IsNull
            ? "SELECT CAST(NULL AS datetime2)"
            : "SELECT CAST('2020-01-01' AS datetime2)",
bjornbouetsmith commented 3 years ago

@ErikEJ Yes, that would have been great, except GetSqlDateTime does not support DATETIME2, which is my whole problem.

bjornbouetsmith commented 3 years ago

@roji Yes, I have profiled this and could see that IsDBNull was the method using the most time.

And I have no idea how that benchmark of yours runs, but if you call ExecuteReader and only read one field from the reader, the isolated cost of IsDBNull will drown in the connection overhead etc.

So I would suggest, if you want to test like that, that you open the reader once, then call reader.IsDBNull/GetDateTime ten million times inside a loop, and do the same with GetSqlDateTime.

That would measure only the performance in question, reader.IsDBNull/GetDateTime, and not the time it takes to open the connection and execute the query.

Using this simple test shows a huge performance difference on my machine:

IsDbNull: 00:00:04.1302114
GetSqlDateTime: 00:00:02.6796905
IsDbNull: 00:00:04.1251925
GetSqlDateTime: 00:00:02.7111852

And when reading 10 million+ very wide rows where all fields can be null for performance reasons, yes, it makes a big difference.

I considered "fixing" the issue by making my stored procedure use ISNULL, but that would defeat the purpose of having null for performance reasons: the SQL server would then have to transfer a value instead of just a bit stating that the field was null, so we would use more bandwidth/time just reading the data. So I am still convinced that either a GetSqlDateTime method that handles DATETIME2, or a GetSqlDateTime2 method, is the correct solution.

string Statement = @"SELECT CAST('2020-12-11 09:17:12.123' AS DATETIME) AS [MyDate]";

var builder = new SqlConnectionStringBuilder()
{
    InitialCatalog = "master",
    DataSource = @"(LocalDb)\AggService",
    IntegratedSecurity = true,
};
int count = 100000000;
var sqlConnection = new SqlConnection(builder.ConnectionString);
sqlConnection.Open();
using (SqlCommand command = new SqlCommand(Statement))
{
    command.CommandType = CommandType.Text;
    command.Connection = sqlConnection;
    command.CommandText = Statement;
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            var dummy=reader.GetDateTime(0);
            var dummy2 = reader.GetSqlDateTime(0);
            Stopwatch sw = Stopwatch.StartNew();

            for (int x = 0; x < count; x++)
            {
                DateTime dt;
                if (!reader.IsDBNull(0))
                {
                    dt = reader.GetDateTime(0);
                }
                else
                {
                    dt = default;
                }
            }
            sw.Stop();
            Console.WriteLine("IsDbNull:{0}",sw.Elapsed);
            sw = Stopwatch.StartNew();
            for (int x = 0; x < count; x++)
            {

                var sqlDt = reader.GetSqlDateTime(0);
                DateTime dt;
                if (!sqlDt.IsNull)
                {
                    dt = sqlDt.Value;
                }
                else
                {
                    dt = default;
                }
            }
            sw.Stop();
            Console.WriteLine("GetSqlDateTime:{0}",sw.Elapsed);
            sw = Stopwatch.StartNew();

            for (int x = 0; x < count; x++)
            {
                DateTime dt;
                if (!reader.IsDBNull(0))
                {
                    dt = reader.GetDateTime(0);
                }
                else
                {
                    dt = default;
                }
            }
            sw.Stop();
            Console.WriteLine("IsDbNull:{0}",sw.Elapsed);
            sw = Stopwatch.StartNew();
            for (int x = 0; x < count; x++)
            {

                var sqlDt = reader.GetSqlDateTime(0);
                DateTime dt;
                if (!sqlDt.IsNull)
                {
                    dt = sqlDt.Value;
                }
                else
                {
                    dt = default;
                }
            }
            sw.Stop();
            Console.WriteLine("GetSqlDateTime:{0}",sw.Elapsed);
        }
    }
}
Wraith2 commented 3 years ago

I'd expect GetSqlDateTime to have to do more or less the same null checking internally as you're doing externally when calling GetDateTime.

That's my expectation based on the code as well. The first field access on the row will create the SqlBuffer, which is then used by subsequent accesses. The Sql versions should actually do more work, because they create an additional object rather than passing back data. This means profiles show most of the time spent in IsDBNull, because that's where the value is resolved, turning the Get call into a simple lookup; however, the overall time shared between them should differ only in the extra time taken to set up call stacks and do duplicated argument checking.

The Sql* types are currently considered legacy, as far as I'm aware. They're from .NET 1.0, when there were no generics and thus no way to express struct nullability; mostly they existed for DataTable, as far as I know. Because the types are defined in terms of SQL Server's capabilities, other providers have to ignore them, provide their own conversions, or provide replacements. It seems unlikely that a new type will be added to the runtime (dotnet/runtime) at this point, but it might be possible to add it directly to SqlClient. I'd rather make the check-null-then-maybe-get pattern faster if it's causing a perf problem in comparison to the Sql type version.

Ideologically, I think the separation of null checking and field getting is a purer mapping of the distinction between nulls and values in SQL than a nullable CLR type; I consider SQL nulls "stronger" than language nulls because of their untyped and incomparable nature.

bjornbouetsmith commented 3 years ago

@Wraith2 I was not aware that the SqlTypes were considered legacy. If that is the case, I would suggest they be marked as obsolete; and if they are not obsolete, they should really be extended with support for the new data types in SQL Server.

If you can fix the performance issue of IsDBNull, or handle null better in the GetXX methods, I am fine with that.

But it seems "strange" that an API was chosen that cannot handle null and requires the caller to call another method to check for null.

The GetSqlXX methods are much easier to use and, in my opinion, closer to what you would expect from something that reads from a database.

Imagine SQL Server not being able to return NULL from a column, so that you always had to wrap every column in an ISNULL check before accessing it.

Getting a value back without having to check for null first will always perform better than having to check for null and then go through almost the same work again just to get the value.

Compare the two methods:

virtual public SqlDateTime GetSqlDateTime(int i)
{
    ReadColumn(i);
    return _data[i].SqlDateTime;
}

/// <include file='../../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlDataReader.xml' path='docs/members[@name="SqlDataReader"]/GetDateTime/*' />
override public DateTime GetDateTime(int i)
{
    ReadColumn(i);

    DateTime dt = _data[i].DateTime;
    // This accessor can be called for regular DateTime column. In this case we should not throw
    if (_typeSystem <= SqlConnectionString.TypeSystem.SQLServer2005 && _metaData[i].IsNewKatmaiDateTimeType)
    {
        // TypeSystem.SQLServer2005 or less

        // If the above succeeds, then we received a valid DateTime instance, now we need to force
        // an InvalidCastException since DateTime is not exposed with the version knob in this setting.
        // To do so, we simply force the exception by casting the string representation of the value
        // To DateTime.
        object temp = (object)_data[i].String;
        dt = (DateTime)temp;
    }

    return dt;
}

Almost the same work in both methods: both call ReadColumn and both return data from the SqlBuffer. The difference is that the DateTime property throws an exception if the field is null, whereas SqlDateTime returns a struct that can tell you whether the field was null, similar to a nullable DateTime.

A much better API, in my opinion.

SqlDateTime takes up:

        private bool m_fNotNull;    // false if null
        private int m_day;      // Day from 1900/1/1, could be negative. Range: Jan 1 1753 - Dec 31 9999.
        private int m_time;     // Time in the day in term of ticks

whereas DateTime takes up:

       private UInt64 dateData;

So the storage requirements in memory are almost the same.

A solution could be to return nullable types from a range of new methods, since a Nullable<T> should perform similarly to SqlDateTime, given that it has:

 private bool hasValue; 
  internal T value;

i.e. a full range of GetNullableXX methods that would simply do the same as the normal methods, but instead of blindly accessing the "Value" field of SqlBuffer would first call IsNull, e.g.:

override public DateTime? GetNullableDateTime(int i)
{
    ReadColumn(i);
    if (_data[i].IsNull) 
    {
        return (DateTime?)null;
    }
    DateTime dt = _data[i].DateTime;
    // This accessor can be called for regular DateTime column. In this case we should not throw
    if (_typeSystem <= SqlConnectionString.TypeSystem.SQLServer2005 && _metaData[i].IsNewKatmaiDateTimeType)
    {
        // TypeSystem.SQLServer2005 or less

        // If the above succeeds, then we received a valid DateTime instance, now we need to force
        // an InvalidCastException since DateTime is not exposed with the version knob in this setting.
        // To do so, we simply force the exception by casting the string representation of the value
        // To DateTime.
        object temp = (object)_data[i].String;
        dt = (DateTime)temp;
    }

    return dt;
}
roji commented 3 years ago

But it seems "strange" that an API was chosen that cannot handle null and requires that caller has to call another method to check for null

I'd recommend not confusing the discussion around DATETIME2 support and SqlClient's SqlTypes with a general discussion on how ADO.NET behaves around nulls - this is simply how the .NET database API works. I wouldn't consider this behavior around nulls a good reason to introduce a GetSqlDateTime2.

The ADO.NET behavior around nulls was a design decision for the entire API (regardless of what we think about it today): database nulls are generally not represented by .NET nulls, but rather by DBNull (for generic methods this is a problem). There's some relevant discussion on this in https://github.com/dotnet/runtime/issues/26511, even if that's about a proposed generic ExecuteScalar rather than reader methods.

In any case, as there isn't a performance aspect here, it's trivial to write a simple GetNullable<T>() extension method yourself, which performs the IsDBNull internally and returns null accordingly.
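The suggested helper might look like the sketch below. GetNullable is an invented name for illustration, and a DataTableReader (which derives from DbDataReader) stands in for a live SqlDataReader so the example runs without a database:

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class DataReaderExtensions
{
    // Performs the IsDBNull check internally and maps a database NULL to a
    // CLR null instead of letting the typed getter throw.
    public static T? GetNullable<T>(this DbDataReader reader, int ordinal)
        where T : struct
        => reader.IsDBNull(ordinal) ? (T?)null : reader.GetFieldValue<T>(ordinal);
}

class Demo
{
    static void Main()
    {
        // An in-memory table with one NULL row and one value row.
        var table = new DataTable();
        table.Columns.Add("d", typeof(DateTime));
        table.Rows.Add(DBNull.Value);
        table.Rows.Add(new DateTime(2020, 1, 1));

        using (DataTableReader reader = table.CreateDataReader())
        {
            reader.Read();
            Console.WriteLine(reader.GetNullable<DateTime>(0).HasValue); // False
            reader.Read();
            Console.WriteLine(reader.GetNullable<DateTime>(0)?.Year);    // 2020
        }
    }
}
```

Note that this is only an ergonomic wrapper; it still pays the cost of IsDBNull on every call, which is exactly the overhead being debated in this thread.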

bjornbouetsmith commented 3 years ago

@roji What do you mean, there isn't a performance aspect?

Did you see my test code and the results I posted? Going from 4.1 seconds to 2.7 is a huge difference. The test you wrote is next to useless, since it includes the entire overhead of ExecuteReader and Read; if you moved those calls to the "setup" part of the test, then perhaps it would be equivalent to the test I made.

I cannot just write a GetNullable method, since that would not help me at all: I would still need to call IsDBNull, which is the entire reason for this post. IsDBNull is SLOW by design, since it has to check metadata etc. before accessing the field, and then you have to do a lot of the work again to read the data.

Rewrite your test to only call the methods being discussed, IsDBNull + GetDateTime vs. GetSqlDateTime, and I am sure you will see a similar performance difference to the one I did.

Wraith2 commented 3 years ago

IsDbNull is SLOW by design

No, it isn't. As I explained earlier, the first call to one of IsDBNull or GetField will cause the SqlBuffer entry to be created and will show the majority of the cost.

The benchmark @roji provided compares return reader.IsDBNull(0) ? default : reader.GetDateTime(0); with return reader.GetSqlDateTime(0);, which are the two patterns we're discussing, so the numbers are representative. Though they favour GetSqlDateTime, because they don't include the time you'll have to take to do the IsNull check yourself.

As I said, I'll try and find time to check your specific scenario.

There is a difference between legacy and obsolete. Obsolete means you shouldn't use it and there is usually a replacement. Legacy means it's old; it's not obsolete, but it's not likely to be expanded. DataTable and the associated items supporting it in System.Data fit into the legacy category as far as I know: they work, they're supported, but they're unlikely to change. Does that sound right, @roji?

bjornbouetsmith commented 3 years ago

@Wraith2 It does indeed compare the methods you describe, but it also includes the time it takes to call ExecuteReader and Read, and the majority of the time in those tests will probably come from ExecuteReader and Read. That is why I still think the test measures more than it should: the time taken by IsDBNull + GetDateTime drowns in the time taken by ExecuteReader and Read.

If you added a Thread.Sleep(10000) to the same test, it would also still test reader.IsDBNull, but the time taken would drown. That is why it should be rewritten to measure only the methods in question, not code that is part of the setup.

Which is what my test did, except without using a fancy test framework.

In regards to legacy: do you have any links that say they are legacy, or is it just guesses or hearsay?

And if they are legacy, I think it is a shame, since they provide a better API in my opinion.

Wraith2 commented 3 years ago

Ok, can you provide a benchmark.net example that shows what you're saying?

bjornbouetsmith commented 3 years ago

There you go.

Point proven: almost 100% faster to use GetSqlDateTime when the value is not null, and a bit slower when the value is null, which is surprising, but possibly just because DateTime is a smaller struct than SqlDateTime.

Edited with string and Int64 tests as well.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT

| Method         | NullValue | Mean     | Error    | StdDev   |
| -------------- | --------- | -------- | -------- | -------- |
| GetDateTime    | False     | 58.30 ns | 0.623 ns | 0.583 ns |
| GetSqlDateTime | False     | 30.53 ns | 0.022 ns | 0.019 ns |
| GetString      | False     | 51.80 ns | 1.086 ns | 1.486 ns |
| GetSqlString   | False     | 63.34 ns | 0.177 ns | 0.157 ns |
| GetInt64       | False     | 42.63 ns | 0.137 ns | 0.107 ns |
| GetSqlInt64    | False     | 24.63 ns | 0.142 ns | 0.133 ns |
| GetDateTime    | True      | 24.59 ns | 0.123 ns | 0.115 ns |
| GetSqlDateTime | True      | 27.34 ns | 0.130 ns | 0.122 ns |
| GetString      | True      | 26.30 ns | 0.291 ns | 0.258 ns |
| GetSqlString   | True      | 32.51 ns | 0.067 ns | 0.060 ns |
| GetInt64       | True      | 24.58 ns | 0.094 ns | 0.073 ns |
| GetSqlInt64    | True      | 26.80 ns | 0.088 ns | 0.078 ns |
  public class Program
    {
        const string ConnectionString = @"Server=(localdb)\MSSQLLocalDB;Database=master;Integrated Security=true;Connect Timeout=3;ConnectRetryCount=1";

        [Params(true, false)]
        public bool NullValue { get; set; }

        SqlConnection _connection;
        SqlCommand _command;
        SqlDataReader _reader;

        [GlobalSetup]
        public void Setup()
        {
            _connection = new SqlConnection(ConnectionString);
            _connection.Open();
            _command = new SqlCommand(@"
SELECT CAST(NULL AS datetime) as [MyDT], CAST(NULL AS NVARCHAR(10)) as [MyStr], CAST(NULL AS BIGINT) as [MyBigInt] 
UNION 
SELECT CAST('2020-01-01 01:01:01.123' AS datetime) as [MyDT], CAST('AbCdEfGhIj' AS NVARCHAR(10)) as [MyStr], CAST(4611686018427387903 AS BIGINT) as [MyBigInt]  ",
                _connection);
            _reader = _command.ExecuteReader();
            _reader.Read();
            if (!NullValue)
            {
                _reader.Read();
            }
        }

        [GlobalCleanup]
        public void Cleanup()
        {
            _connection.Dispose();
            _command.Dispose();
            _reader.Close();

        }

        [Benchmark]
        public DateTime GetDateTime()
        {
            return _reader.IsDBNull(0) ? default : _reader.GetDateTime(0);
        }

        [Benchmark]
        public SqlDateTime GetSqlDateTime()
        {
            return _reader.GetSqlDateTime(0);
        }

        [Benchmark]
        public string GetString()
        {
            return _reader.IsDBNull(1) ? default : _reader.GetString(1);
        }

        [Benchmark]
        public SqlString GetSqlString()
        {
            return _reader.GetSqlString(1);
        }

        [Benchmark]
        public long GetInt64()
        {
            return _reader.IsDBNull(2) ? default : _reader.GetInt64(2);
        }

        [Benchmark]
        public SqlInt64 GetSqlInt64()
        {
            return _reader.GetSqlInt64(2);
        }

        static void Main(string[] args)
            => BenchmarkRunner.Run<Program>();
    }
bjornbouetsmith commented 3 years ago

So it seems that when the value in a field is null, IsDBNull + default is the fastest. When there is data, it is mostly faster to use the GetSqlXX methods than the IsDBNull/GetXX pattern.

Structs/primitives seem faster as well, but I would be happy to write and run a full test of all the built-in .NET types so we can get the full picture.

But I am guessing the following pattern will emerge:

structs/primitives will be faster with the GetSqlXX methods, objects will be faster with the GetXX methods; though how much overhead there is probably depends on the implementation of the particular SqlTypes struct used.

This is probably why SqlString is so slow:

    private string m_value;
    private CompareInfo m_cmpInfo;
    private int m_lcid;
    private SqlCompareOptions m_flag;
    private bool m_fNotNull;

plus it converts a byte array to a string in the constructor, whereas GetString seems to do the equivalent of casting an object to a string.

bjornbouetsmith commented 3 years ago

With a few more data types tested:

| Method         | NullValue | Mean      | Error    | StdDev   |
| -------------- | --------- | --------- | -------- | -------- |
| GetDateTime    | False     | 59.33 ns  | 0.270 ns | 0.252 ns |
| GetSqlDateTime | False     | 30.73 ns  | 0.119 ns | 0.112 ns |
| GetString      | False     | 49.63 ns  | 0.198 ns | 0.176 ns |
| GetSqlString   | False     | 63.29 ns  | 0.206 ns | 0.182 ns |
| GetInt64       | False     | 44.11 ns  | 0.156 ns | 0.138 ns |
| GetSqlInt64    | False     | 24.57 ns  | 0.172 ns | 0.161 ns |
| GetDouble      | False     | 45.96 ns  | 0.487 ns | 0.456 ns |
| GetSqlDouble   | False     | 31.63 ns  | 0.119 ns | 0.100 ns |
| GetInt32       | False     | 46.11 ns  | 0.104 ns | 0.093 ns |
| GetSqlInt32    | False     | 24.79 ns  | 0.040 ns | 0.033 ns |
| GetInt16       | False     | 45.42 ns  | 0.199 ns | 0.187 ns |
| GetSqlInt16    | False     | 24.87 ns  | 0.081 ns | 0.072 ns |
| GetBoolean     | False     | 44.91 ns  | 0.111 ns | 0.092 ns |
| GetSqlBoolean  | False     | 25.57 ns  | 0.371 ns | 0.329 ns |
| GetGuid        | False     | 55.84 ns  | 0.392 ns | 0.367 ns |
| GetSqlGuid     | False     | 24.20 ns  | 0.120 ns | 0.106 ns |
| GetBytes       | False     | 136.00 ns | 0.837 ns | 0.699 ns |
| GetSqlBytes    | False     | 71.30 ns  | 0.293 ns | 0.274 ns |
| GetDateTime    | True      | 24.42 ns  | 0.063 ns | 0.059 ns |
| GetSqlDateTime | True      | 27.13 ns  | 0.046 ns | 0.043 ns |
| GetString      | True      | 26.57 ns  | 0.046 ns | 0.041 ns |
| GetSqlString   | True      | 32.24 ns  | 0.068 ns | 0.063 ns |
| GetInt64       | True      | 24.30 ns  | 0.070 ns | 0.062 ns |
| GetSqlInt64    | True      | 26.97 ns  | 0.043 ns | 0.038 ns |
| GetDouble      | True      | 22.74 ns  | 0.055 ns | 0.051 ns |
| GetSqlDouble   | True      | 27.81 ns  | 0.467 ns | 0.436 ns |
| GetInt32       | True      | 24.28 ns  | 0.101 ns | 0.095 ns |
| GetSqlInt32    | True      | 27.06 ns  | 0.041 ns | 0.034 ns |
| GetInt16       | True      | 24.39 ns  | 0.094 ns | 0.083 ns |
| GetSqlInt16    | True      | 26.73 ns  | 0.080 ns | 0.071 ns |
| GetBoolean     | True      | 24.26 ns  | 0.071 ns | 0.059 ns |
| GetSqlBoolean  | True      | 26.21 ns  | 0.032 ns | 0.027 ns |
| GetGuid        | True      | 24.65 ns  | 0.113 ns | 0.100 ns |
| GetSqlGuid     | True      | 24.11 ns  | 0.081 ns | 0.072 ns |
| GetBytes       | True      | 128.20 ns | 0.603 ns | 0.503 ns |
| GetSqlBytes    | True      | 65.56 ns  | 0.386 ns | 0.361 ns |

And code:

 public class Program
    {
        const string ConnectionString = @"Server=(localdb)\MSSQLLocalDB;Database=master;Integrated Security=true;Connect Timeout=3;ConnectRetryCount=1";

        [Params(true, false)]
        public bool NullValue { get; set; }

        SqlConnection _connection;
        SqlCommand _command;
        SqlDataReader _reader;
        private byte[] _buffer;

        [GlobalSetup]
        public void Setup()
        {
            _connection = new SqlConnection(ConnectionString);
            _connection.Open();
            _command = new SqlCommand(@"
SELECT CAST(NULL AS datetime) as [MyDT], CAST(NULL AS NVARCHAR(10)) as [MyStr], CAST(NULL AS BIGINT) as [MyBigInt],CAST(NULL as FLOAT) as [MyFL],CAST(NULL as INT) as [MyInt],CAST(NULL as SMALLINT) as [MySInt],CAST(NULL AS BIT) as [myBit],cast(NULL AS uniqueidentifier) as [MyId],CAST(NULL AS timestamp) AS [MyTS]
UNION 
SELECT CAST('2020-01-01 01:01:01.123' AS datetime) as [MyDT], CAST('AbCdEfGhIj' AS NVARCHAR(10)) as [MyStr], CAST(4611686018427387903 AS BIGINT) as [MyBigInt],CAST(46116860184.27387903 as FLOAT) as [MyFL],CAST(1234567 as INT) as [MyInt],CAST(1234 as SMALLINT) as [MySInt],CAST(0 AS BIT) as [myBit],cast(NEWID() AS uniqueidentifier) as [MyId],CAST(CURRENT_TIMESTAMP AS timestamp) AS [MyTS]",
                _connection);
            _reader = _command.ExecuteReader();
            _reader.Read();
            if (!NullValue)
            {
                _reader.Read();
            }
        }

        [GlobalCleanup]
        public void Cleanup()
        {
            // Dispose in reverse order of creation: reader, command, then connection.
            _reader.Close();
            _command.Dispose();
            _connection.Dispose();
        }

        [Benchmark]
        public DateTime GetDateTime()
        {
            return _reader.IsDBNull(0) ? default : _reader.GetDateTime(0);
        }

        [Benchmark]
        public SqlDateTime GetSqlDateTime()
        {
            return _reader.GetSqlDateTime(0);
        }

        [Benchmark]
        public string GetString()
        {
            return _reader.IsDBNull(1) ? default : _reader.GetString(1);
        }

        [Benchmark]
        public SqlString GetSqlString()
        {
            return _reader.GetSqlString(1);
        }

        [Benchmark]
        public long GetInt64()
        {
            return _reader.IsDBNull(2) ? default : _reader.GetInt64(2);
        }

        [Benchmark]
        public SqlInt64 GetSqlInt64()
        {
            return _reader.GetSqlInt64(2);
        }

        [Benchmark]
        public double GetDouble()
        {
            return _reader.IsDBNull(3) ? default : _reader.GetDouble(3);
        }

        [Benchmark]
        public SqlDouble GetSqlDouble()
        {
            return _reader.GetSqlDouble(3);
        }

        [Benchmark]
        public int GetInt32()
        {
            return _reader.IsDBNull(4) ? default : _reader.GetInt32(4);
        }

        [Benchmark]
        public SqlInt32 GetSqlInt32()
        {
            return _reader.GetSqlInt32(4);
        }

        [Benchmark]
        public short GetInt16()
        {
            return _reader.IsDBNull(5) ? default : _reader.GetInt16(5);
        }

        [Benchmark]
        public SqlInt16 GetSqlInt16()
        {
            return _reader.GetSqlInt16(5);
        }

        [Benchmark]
        public bool GetBoolean()
        {
            return _reader.IsDBNull(6) ? default : _reader.GetBoolean(6);
        }

        [Benchmark]
        public SqlBoolean GetSqlBoolean()
        {
            return _reader.GetSqlBoolean(6);
        }

        [Benchmark]
        public Guid GetGuid()
        {
            // Null-check the same ordinal that is read (the original checked ordinal 6).
            return _reader.IsDBNull(7) ? default : _reader.GetGuid(7);
        }

        [Benchmark]
        public SqlGuid GetSqlGuid()
        {
            return _reader.GetSqlGuid(7);
        }

        [Benchmark]
        public byte[] GetBytes()
        {
            byte[] buffer = _buffer??=new byte[8];
            if (_reader.IsDBNull(8)) 
            {
                return null;
            }

            _reader.GetBytes(8, 0, buffer, 0, 8);
            return buffer;
        }

        [Benchmark]
        public SqlBytes GetSqlBytes()
        {
            return _reader.GetSqlBytes(8);
        }

        static void Main(string[] args)
            => BenchmarkRunner.Run<Program>();
    }
bjornbouetsmith commented 3 years ago

So far my prediction holds: primitives/structs are faster using the GetSqlXX methods.

roji commented 3 years ago

@bjornbouetsmith there are many things above, so I'll try to answer them one by one.

And reading 10million+ very wide rows where all fields can be null for performance reasons, then yes it makes a big difference.

That is quite a problematic way to show that something has a performance problem in any real-world scenario; yes, method A may be negligibly slower than method B (e.g. by a few nanoseconds), but that doesn't mean it's significant. It's always possible to devise a benchmark which would amplify that difference, e.g. by simulating a 10 million-column row (something which isn't even supported by SQL Server AFAIK), but that doesn't necessarily have any real meaning to anyone in the real world.

This is why I asked above whether you've seen an actual perf problem in production, as opposed to in an artificially-crafted benchmark.

The test you wrote are next to useless [...]

This is a respectful discussion with differing opinions - I'd avoid calling the other person's benchmark "next to useless". My benchmark tests exactly what it tests: the difference for a single query retrieving a single value; ExecuteReader and Read are the same in both methods, so they should be constant. Regardless, what you consider real-world usage is definitely a matter for discussion, but 10 million+ wide rows definitely seems further from reasonable than a query fetching a single value.

Looking at your results, I'm seeing differences between almost 0 to around 30ns (the bytes case is an interesting outlier). It's also worth keeping in mind that our benchmarks are being executed against localhost, meaning that there is almost zero latency involved; I'd guess that in any reasonable scenario where a remote database is being accessed (which is most real-world cases), these differences are going to be drowned out. That should of course be confirmed via further benchmarking.

But stepping back for a second... The question here is whether the difference between IsDBNull + GetWhatever and GetSqlWhatever is something that justifies prioritizing optimization efforts - compared to other possible performance work on SqlClient. My personal opinion is that it isn't. But even if it were the case, rather than introducing a new GetSqlDateTime2, efforts could be concentrated on simply making IsDBNull + GetWhatever faster. First, there's no reason to believe that's not possible; second, GetDateTime is the standard ADO.NET API, as opposed to GetSqlDateTime, which is SqlClient-specific and therefore not available to a database-agnostic application.

IsDbNull is SLOW by design - since it has to check for metadata etc before accessing the field - and then you have to do a lot of the work again by reading the data.

To reiterate, there's nothing about IsDBNull that's slow by design. The ADO.NET API (which is what defines IsDBNull) says nothing about implementation, or about when any metadata gets checked. And again, at least conceptually there's nothing IsDBNull does which wouldn't need to be done by GetSqlDateTime internally.
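The two access patterns under discussion can be sketched against any DbDataReader. The sketch below uses a DataTableReader as an in-memory stand-in for SqlDataReader (no database required); the column name is made up for illustration, and the GetSqlDateTime side is only described in a comment because it exists on SqlDataReader, not on DbDataReader.

```csharp
using System;
using System.Data;

class IsDBNullPatternDemo
{
    static void Main()
    {
        // In-memory stand-in for a result set with one nullable datetime column.
        var table = new DataTable();
        table.Columns.Add("MyDate", typeof(DateTime));
        table.Rows.Add(new DateTime(2020, 1, 1));
        table.Rows.Add(DBNull.Value);

        using (var reader = table.CreateDataReader())
        {
            while (reader.Read())
            {
                // Standard ADO.NET pattern: two calls per value.
                // SqlDataReader.GetSqlDateTime would fold the null check into one
                // call by returning a SqlDateTime whose IsNull flag carries the state.
                DateTime? value = reader.IsDBNull(0) ? (DateTime?)null : reader.GetDateTime(0);
                Console.WriteLine(value?.ToString("yyyy-MM-dd") ?? "NULL");
            }
        }
    }
}
```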

roji commented 3 years ago

There is a difference between legacy and obsolete. Obsolete means you shouldn't use it and there is usually a replacement. Legacy means it's old and it's not obsolete but it's not likely to be expanded. DataTable and the associated items supporting it in System.Data fit into the legacy category as far as I know, they work they're supported but they're unlikely to change. Does that sound right @roji?

I'm not sure about the exact status of the SqlTypes. Re DataTable/DataSet specifically - which is part of ADO.NET rather than SqlClient - I'd say they're more "archived" than obsolete/legacy. That is, generally speaking no further improvements are being made to those types, and changes are likely to be restricted to bug/security fixes. But note that this isn't any official status or anything - just the way I see it.

bjornbouetsmith commented 3 years ago

That is quite a problematic way to show that something has a performance problem in any real-world scenario; yes, method A may be negligibly slower than method B (e.g. by a few nanoseconds), but that doesn't mean it's significant. It's always possible to devise a benchmark which would amplify that difference, e.g. by simulating a 10 million-column row (something which isn't even supported by SQL Server AFAIK), but that doesn't necessarily have any real meaning to anyone in the real world.

I am working on a system that, for unknown reasons, has a "contract" that defines 200+ fields, all populated from a stored procedure that returns those fields. Many of the fields come from outer joins, so they can be null. Changing this pattern is possible, but it is a huge undertaking that requires changing a large code base with systems older than 15 years. Right now we have 10 million rows, and this will grow to perhaps 20 million rows within 2 years. These rows take quite a long time to load and process, and any time we can cut off makes a difference - so I have profiled the code, not against a production database, but against a DEV database with only 50k rows of the same size - and I could measure a significant difference when I changed from the GetXX methods to the GetSqlXX methods, just because I did not have to call IsDbNull. So to answer your question - yes, I have seen performance problems. I have already changed the code, so all data types except DateTime have been converted to GetSqlXX - simply because it matters.

This is a respectful discussion with differing opinions - I'd avoid calling the other person's benchmark "next to useless". My benchmark tests exactly what it tests: the difference for a single query retrieving a single value; ExecuteReader and Read are the same in both methods, so they should be constant. Regardless, what you consider real-world usage is definitely a matter for discussion, but 10 million+ wide rows definitely seems further from reasonable than a query fetching a single value.

I am sorry, I could have worded it differently - my point is that you included code that is an order of magnitude slower than the methods we are discussing, which is why I wrote that you might as well have added a Thread.Sleep(10000) in the test - that is also a constant, but it would likewise drown out the time of the methods you are testing. The tests I have written test the methods themselves - and show a big difference between the two methods in question that was hidden by the way you tested. You even wrote yourself:

Because a quick benchmark seems to show that the perf of GetSqlDateTime is the same as IsDBNull + GetDateTime (see below).

Which could not be further from the truth.

Looking at your results, I'm seeing differences between almost 0 to around 30ns (the bytes case is an interesting outlier).

That is correct, but looking at it in absolute numbers is probably not the only relevant way - in percentage terms it's up to 100% faster for some of the methods, and only GetSqlString is slower than GetString, which is explained by the extra work GetSqlString needs to do.

If I can shave off 30 ns × 10 million rows × 200 columns - that is 60 seconds, and in two years it might be worth 120 seconds etc. - then it's very much measurable and worth doing for my company/work. The system I work on is a low-latency system; we don't use .NET tasks because they allocate memory, and we use structs widely to get rid of heap allocations. I agree that for most people it matters not one bit, but for those systems where you work your utmost to tweak and squeeze every bit of performance out of the system - it matters.
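The arithmetic behind that estimate can be sketched as follows (the 30 ns per value, 10 million rows, and 200 columns figures are taken from the discussion above):

```csharp
using System;

class SavingsEstimate
{
    // Seconds saved when shaving nsPerValue off every value read.
    public static double SavedSeconds(double nsPerValue, long rows, long columns)
        => nsPerValue * 1e-9 * rows * columns;

    static void Main()
    {
        // 30 ns × 10 million rows × 200 columns = 60 seconds.
        Console.WriteLine(SavedSeconds(30, 10_000_000, 200)); // prints 60
    }
}
```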

But stepping back for a second... The question here is whether the difference between IsDBNull + GetWhatever and GetSqlWhatever is something that justifies prioritizing optimization efforts - compared to other possible performance work on SqlClient. My personal opinion is that it isn't. But even if it were the case, rather than introducing a new GetSqlDateTime2, efforts could be concentrated on simply making IsDBNull + GetWhatever faster. First, there's no reason to believe that's not possible; second, GetDateTime is the standard ADO.NET API, as opposed to GetSqlDateTime, which is SqlClient-specific and therefore not available to a database-agnostic application.

I doubt you can make any changes to SqlClient that make IsDbNull+GetWhatever as fast as GetSqlWhatever - simply because you have to check for null and check metadata in both IsDbNull and GetWhatever. GetSqlWhatever can short-circuit a lot of that work, since it returns a struct that knows whether or not its value is null.

And again, at least conceptually there's nothing IsDBNull does which wouldn't need to be done by GetSqlDateTime internally.

That is probably true, but the advantage is that GetSqlWhatever returns the value at the same time - whereas IsDbNull only returns an indication of whether or not it's safe to call GetWhatever.
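The null-carrying behaviour being referred to is visible directly on the System.Data.SqlTypes structs, which is what lets a single GetSqlWhatever call return both the value and the null state:

```csharp
using System;
using System.Data.SqlTypes;

class SqlTypesNullDemo
{
    static void Main()
    {
        // Each Sql* struct carries its own null flag, so the caller never needs
        // a separate IsDBNull round trip.
        SqlInt64 missing = SqlInt64.Null;
        SqlInt64 present = new SqlInt64(42);

        Console.WriteLine(missing.IsNull);  // True
        Console.WriteLine(present.IsNull);  // False
        Console.WriteLine(present.Value);   // 42
    }
}
```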

The only real alternative that would work similarly, as I see it, would be a "GetIfNotNull" method that simply combined the null check with the read - similar to the "example" I made earlier in the thread:

public bool TryGetValue<T>(int i, out T value)
{
    ReadColumn(i);
    if (_data[i].IsNull)
    {
        value = default;
        return false;
    }

    // Do whatever is needed for the specific type, e.g.:
    value = (T)_data[i].Value;
    return true;
}

// or

public (bool IsNull, T Value) GetValue<T>(int i)
{
    ReadColumn(i);
    if (_data[i].IsNull)
    {
        return (true, default);
    }

    // Do whatever is needed for the specific type, e.g.:
    return (false, (T)_data[i].Value);
}

// or probably my favorite

public T? GetValue<T>(int i) where T : struct
{
    ReadColumn(i);
    if (_data[i].IsNull)
    {
        return null;
    }

    // Do whatever is needed for the specific type, e.g.:
    return (T)_data[i].Value;
}

The point is that the null check needs to be in the same method that returns the data; otherwise I cannot see how it would ever be as optimal as the GetSqlWhatever methods, which handle null internally.

Wraith2 commented 3 years ago

I can see your need and use case. Unfortunately the best advice available at the moment would be to use the Sql* types where you can and handle datetime2 the other way. I think you're slightly trading off memory use for cpu use but if that works for you then it's a sensible decision to make, I may decide to do the same in your position.

The time difference you're seeing is almost entirely unavoidable based on my review of the code, it's all function call setup and state validation, none of which can be skipped. It's simply that much slower to make two calls than one when you're running that fast.

I think the likelihood of getting an SqlDateTime2 struct added to runtime is almost 0. It's a lot more complicated than I originally thought it would be last time I investigated those types and as I've said it's old code that isn't seeing any attention beyond compatibility. Runtime are unlikely to have an interest.

I think the likelihood of adding something in this library is slightly higher than in runtime, but still quite small. Even if the owners approve of the idea, it isn't going to make the top of the priority list. On the plus side, apart from the type being in a different namespace (it'd end up in Microsoft.Data.SqlTypes, I think), it is pay-for-play and wouldn't disrupt any existing paths or perf.

If approved and it's worth the effort it might be something that would be acceptable as an external contribution. If you need help plumbing in the type once it's written with covering tests etc then I'll help out if I can. Don't do anything without knowing first if the idea is acceptable to the owners.

Now, there might be a case for T? GetNullableField<T>(int ordinal) where T : unmanaged but that's an entirely separate idea and would have compatibility concerns.
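As a rough illustration of what a `T? GetNullableField<T>(int ordinal)` could look like, here is a sketch layered purely on the public DbDataReader surface (the method name follows the suggestion above; everything else is an assumption). Note that, implemented this way as an extension method, it still pays the IsDBNull cost internally - a real implementation inside SqlDataReader could check the buffered column's null flag once instead.

```csharp
using System;
using System.Data;
using System.Data.Common;

static class DataReaderExtensions
{
    // Sketch of the proposed nullable accessor, built on the public API only.
    public static T? GetNullableField<T>(this DbDataReader reader, int ordinal)
        where T : struct
        => reader.IsDBNull(ordinal) ? (T?)null : reader.GetFieldValue<T>(ordinal);
}

class Demo
{
    static void Main()
    {
        // DataTableReader is a DbDataReader, so it works as an in-memory stand-in.
        var table = new DataTable();
        table.Columns.Add("MyInt", typeof(int));
        table.Rows.Add(123);
        table.Rows.Add(DBNull.Value);

        using (DataTableReader reader = table.CreateDataReader())
        {
            reader.Read();
            Console.WriteLine(reader.GetNullableField<int>(0));          // prints 123
            reader.Read();
            Console.WriteLine(reader.GetNullableField<int>(0) is null);  // prints True
        }
    }
}
```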

roji commented 3 years ago

I agree with a lot of what @Wraith2 wrote above.

Re adding to Microsoft.Data.SqlTypes, although those types live in the runtime, they're technically part of SqlClient/SQL Server, so I think the decisions would have to be made there. If there were a correctness issue here, i.e. that a SqlDateTime2 struct is needed to fully represent SQL Server's datetime2, that would be a stronger argument. However, I'd be against doing anything like this purely to save users an IsDBNull - regardless of whether there's an actual perf issue (because again, if there's a problem it should be addressed generally, and not in a SqlClient-specific API).

More generally, I've done yet another benchmark (code and results below) to specifically measure the overhead of calling IsDBNull; it's ~14ns for SqlClient and ~12ns for Npgsql. Granted, there may be applications out there where that overhead is significant; these would probably be applications running on the same machine as their DB (for zero latency), continuously pumping values out of the local DB and doing almost nothing with those values (otherwise, again, the cost of IsDBNull would be dominated by whatever is being done). However, I'd venture to guess that for the vast majority of SqlClient users, shaving off these 14ns would be very low on their priority list, compared to all the other possible improvements.

IsDBNull benchmark results:

| Method | NullValue | DatabaseType | Mean | Error | StdDev |
|--------- |---------- |------------- |---------:|---------:|---------:|
| **DateTime** | **False** | **SqlServer** | **14.90 ns** | **0.020 ns** | **0.017 ns** |
| String | False | SqlServer | 14.92 ns | 0.019 ns | 0.015 ns |
| Int64 | False | SqlServer | 14.97 ns | 0.018 ns | 0.016 ns |
| Double | False | SqlServer | 14.72 ns | 0.011 ns | 0.010 ns |
| Int32 | False | SqlServer | 15.01 ns | 0.040 ns | 0.037 ns |
| Int16 | False | SqlServer | 14.93 ns | 0.010 ns | 0.010 ns |
| Boolean | False | SqlServer | 15.84 ns | 0.019 ns | 0.017 ns |
| GetGuid | False | SqlServer | 15.26 ns | 0.007 ns | 0.007 ns |
| GetBytes | False | SqlServer | 15.16 ns | 0.033 ns | 0.029 ns |
| **DateTime** | **False** | **PostgreSQL** | **11.70 ns** | **0.009 ns** | **0.008 ns** |
| String | False | PostgreSQL | 11.72 ns | 0.022 ns | 0.017 ns |
| Int64 | False | PostgreSQL | 12.21 ns | 0.016 ns | 0.014 ns |
| Double | False | PostgreSQL | 11.78 ns | 0.007 ns | 0.006 ns |
| Int32 | False | PostgreSQL | 11.78 ns | 0.011 ns | 0.010 ns |
| Int16 | False | PostgreSQL | 11.81 ns | 0.018 ns | 0.016 ns |
| Boolean | False | PostgreSQL | 11.83 ns | 0.082 ns | 0.077 ns |
| GetGuid | False | PostgreSQL | 11.76 ns | 0.026 ns | 0.023 ns |
| GetBytes | False | PostgreSQL | 11.71 ns | 0.016 ns | 0.013 ns |
| **DateTime** | **True** | **SqlServer** | **13.85 ns** | **0.027 ns** | **0.024 ns** |
| String | True | SqlServer | 14.67 ns | 0.012 ns | 0.010 ns |
| Int64 | True | SqlServer | 14.78 ns | 0.026 ns | 0.023 ns |
| Double | True | SqlServer | 14.73 ns | 0.025 ns | 0.024 ns |
| Int32 | True | SqlServer | 14.97 ns | 0.031 ns | 0.029 ns |
| Int16 | True | SqlServer | 15.10 ns | 0.084 ns | 0.094 ns |
| Boolean | True | SqlServer | 14.99 ns | 0.043 ns | 0.038 ns |
| GetGuid | True | SqlServer | 14.23 ns | 0.037 ns | 0.034 ns |
| GetBytes | True | SqlServer | 14.10 ns | 0.015 ns | 0.013 ns |
| **DateTime** | **True** | **PostgreSQL** | **11.79 ns** | **0.031 ns** | **0.029 ns** |
| String | True | PostgreSQL | 11.74 ns | 0.005 ns | 0.004 ns |
| Int64 | True | PostgreSQL | 11.77 ns | 0.009 ns | 0.008 ns |
| Double | True | PostgreSQL | 11.73 ns | 0.014 ns | 0.013 ns |
| Int32 | True | PostgreSQL | 11.79 ns | 0.007 ns | 0.007 ns |
| Int16 | True | PostgreSQL | 12.19 ns | 0.018 ns | 0.017 ns |
| Boolean | True | PostgreSQL | 11.78 ns | 0.005 ns | 0.005 ns |
| GetGuid | True | PostgreSQL | 11.73 ns | 0.004 ns | 0.003 ns |
| GetBytes | True | PostgreSQL | 11.82 ns | 0.003 ns | 0.003 ns |

IsDBNull benchmark code:

```c#
public class Program
{
    const string SqlConnectionString = @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";
    const string NpgsqlConnectionString = @"Server=localhost;Username=test;Password=test";

    [Params(true, false)]
    public bool NullValue { get; set; }

    [Params(DatabaseType.SqlServer, DatabaseType.PostgreSQL)]
    public DatabaseType DatabaseType { get; set; }

    DbConnection _connection;
    DbCommand _command;
    DbDataReader _reader;

    [GlobalSetup]
    public void Setup()
    {
        switch (DatabaseType)
        {
            case DatabaseType.SqlServer:
                _connection = new SqlConnection(SqlConnectionString);
                _connection.Open();
                _command = _connection.CreateCommand();
                _command.CommandText = @"
SELECT CAST(NULL AS datetime), CAST(NULL AS NVARCHAR(10)), CAST(NULL AS BIGINT), CAST(NULL as FLOAT), CAST(NULL as INT), CAST(NULL as SMALLINT), CAST(NULL AS BIT), CAST(NULL AS uniqueidentifier), CAST(NULL AS timestamp)
UNION
SELECT CAST('2020-01-01 01:01:01.123' AS datetime), CAST('AbCdEfGhIj' AS NVARCHAR(10)), CAST(4611686018427387903 AS BIGINT), CAST(46116860184.27387903 as FLOAT), CAST(1234567 as INT), CAST(1234 as SMALLINT), CAST(0 AS BIT), CAST(NEWID() AS uniqueidentifier), CAST(CURRENT_TIMESTAMP AS timestamp)";
                break;
            case DatabaseType.PostgreSQL:
                _connection = new NpgsqlConnection(NpgsqlConnectionString);
                _connection.Open();
                _command = _connection.CreateCommand();
                _command.CommandText = @"
SELECT CAST(NULL AS timestamp), CAST(NULL AS TEXT), CAST(NULL AS BIGINT), CAST(NULL as FLOAT), CAST(NULL as INT), CAST(NULL as SMALLINT), CAST(NULL AS BIT), CAST(NULL AS uuid), CAST(NULL AS bytea)
UNION
SELECT CAST('2020-01-01 01:01:01.123' AS timestamp), CAST('AbCdEfGhIj' AS TEXT), CAST(4611686018427387903 AS BIGINT), CAST(46116860184.27387903 as FLOAT), CAST(1234567 as INT), CAST(1234 as SMALLINT), CAST(0 AS BIT), CAST(gen_random_uuid() AS uuid), CAST('hello' AS bytea)";
                break;
        }

        _reader = _command.ExecuteReader();
        _reader.Read();
        if (!NullValue)
        {
            _reader.Read();
        }
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _connection.Dispose();
        _command.Dispose();
        _reader.Close();
    }

    [Benchmark] public bool DateTime() => _reader.IsDBNull(0);
    [Benchmark] public bool String() => _reader.IsDBNull(1);
    [Benchmark] public bool Int64() => _reader.IsDBNull(2);
    [Benchmark] public bool Double() => _reader.IsDBNull(3);
    [Benchmark] public bool Int32() => _reader.IsDBNull(4);
    [Benchmark] public bool Int16() => _reader.IsDBNull(5);
    [Benchmark] public bool Boolean() => _reader.IsDBNull(6);
    [Benchmark] public bool GetGuid() => _reader.IsDBNull(7);
    [Benchmark] public bool GetBytes() => _reader.IsDBNull(8);

    static void Main(string[] args) => BenchmarkRunner.Run<Program>();
}

public enum DatabaseType { SqlServer, PostgreSQL }
```
bjornbouetsmith commented 3 years ago

I can see your need and use case. Unfortunately the best advice available at the moment would be to use the Sql* types where you can and handle datetime2 the other way. I think you're slightly trading off memory use for cpu use but if that works for you then it's a sensible decision to make, I may decide to do the same in your position.

Which is what I have done, but I would prefer to also handle DateTime this way - and since the API is not complete - it's missing DATETIME2, and possibly other types - I made this issue.

The time difference you're seeing is almost entirely unavoidable based on my review of the code, it's all function call setup and state validation, none of which can be skipped. It's simply that much slower to make two calls than one when you're running that fast.

Indeed, which is also more or less what I wrote - you probably cannot ever make two method calls perform as fast as the single GetSqlWhatever call.

I think the likelihood of getting an SqlDateTime2 struct added to runtime is almost 0. It's a lot more complicated than I originally thought it would be last time I investigated those types and as I've said it's old code that isn't seeing any attention beyond compatibility. Runtime are unlikely to have an interest.

So if I focus my attention to getting a SqlDateTime2 into runtime, and if I succeed - it would make sense in my opinion to also implement a GetSqlDateTime2 method in this library?

I think the likelihood of adding something in this library is slightly higher than in runtime that but still quite small. If and only if the owners approved of the idea it isn't going to make the top of the priority list.

"owners"? Who are you guys commenting - just random dudes on the internet? Who do I need to convince that it is needed?

If approved and it's worth the effort it might be something that would be acceptable as an external contribution. If you need help plumbing in the type once it's written with covering tests etc then I'll help out if I can. Don't do anything without knowing first if the idea is acceptable to the owners.

Thank you, but hopefully I should be able to manage on my own, unless there are some requirements that I cannot see. But if something is accepted, then I will contact you if needed

Now, there might be a case for T? GetNullableField<T>(int ordinal) where T : unmanaged but that's an entirely separate idea and would have compatibility concerns.

Yes - that would solve all issues (if you also added a version that handled strings/byte arrays etc., i.e. objects :-) ). What compatibility concerns would those be? Unless you are talking about adding it to the IDataReader interface and not just to SqlDataReader?

Surely SqlDataReader is not compatible with .NET 1, whose support ended in 2009 - and nullable came with .NET 2, as far as I remember.

bjornbouetsmith commented 3 years ago

Re adding to Microsoft.Data.SqlTypes, although those types live in the runtime, they're technically part of SqlClient/SQL Server, so I think the decisions would have to be made there. If there were a correctness issue here, i.e. that a SqlDateTime2 struct is needed to fully represent SQL Server's datetime2, that would be a stronger argument. However, I'd be against doing anything like this purely to save users an IsDBNull - regardless of whether there's an actual perf issue (because again, if there's a problem it should be addressed generally, and not in a SqlClient-specific API).

More generally, I've done yet another benchmark (code and results below) to specifically measure the overhead of calling IsDBNull; it's ~14ns for SqlClient and ~12ns for Npgsql. Granted, there may be applications out there where that overhead is significant; these would probably be applications running on the same machine as their DB (for zero-latency), continuously pumping values of LocalDB and doing almost nothing with those values (otherwise again, the cost of IsDBNull would be dominated by whatever is being done). However, I'd venture to guess that for the vast majority of SqlClient users, shaving off these 14ns would be very low on their priority list, compared to all the other possible improvements.

So if calling IsDbNull takes ~12-14ns, then the GetWhatever methods are on average slower than the GetSqlWhatever methods - my test clearly shows that GetSqlWhatever was mostly faster - and because of how NULL is handled in this library, you are forced to call IsDbNull unless you are 100% sure that NULL cannot be returned from the database.

In terms of the actual time taken - you are forgetting one crucial part of a system: the time it takes to start, i.e. the time from when you click "start" in Service Manager or run your systemctl start whatever until it is fully operational. This is where shaving off time can be crucial - if you are having a production issue and need to spin up a replacement server, and you can shave a minute off startup time simply by using a different method in SqlClient, it is definitely worth it - and not just for a tiny fraction of users. I agree - you have to have a lot of rows with a lot of columns that are potentially NULL before it makes any sense - and a lot of people are using SQL Server with almost no data present "just because", where a simple XML file would suffice. But SQL Server is also meant to be an enterprise solution, capable of handling millions of rows as efficiently as possible, and so should its corresponding .NET client code.

So - who do I need to convince that a change is needed, if it's not you two?

bjornbouetsmith commented 3 years ago

And it seems like the SqlTypes are not dead - they are still being optimized, and thus cannot be considered "legacy" or "archived" - because why would you fix old stuff if you want people to use other stuff? https://github.com/dotnet/runtime/issues/1034

Wraith2 commented 3 years ago

Surely SqlDataReader is not compatible with .NET 1, whose support ended in 2009 - and nullable came with .NET 2, as far as I remember.

The unmanaged constraint is stronger than struct and was introduced with C# 7.3, so it has runtime version requirements. This library multi-targets, with .NET Core 2.1 and .NET Framework 4.7.2 being the lowest versions, and if those didn't support the feature it would be problematic.

Optimization simply improves perf on the type; it doesn't change behaviour or provide new capabilities. It's also an external contribution, so as long as it meets the quality and test bar it'll be taken into runtime. Let's see where your runtime API request goes.

bjornbouetsmith commented 3 years ago

The unmanaged constraint is stronger than struct and was introduced with C# 7.3, so it has runtime version requirements. This library multi-targets, with .NET Core 2.1 and .NET Framework 4.7.2 being the lowest versions, and if those didn't support the feature it would be problematic.

Ok, but I guess it does not have to be unmanaged - since we don't need support for IntPtr etc., only primitives and the few structs needed, i.e. DateTime/DateTimeOffset & TimeSpan etc. - which are already handled by struct.

The methods would not even have to be generic, although that would require a full range of methods instead of just one - but then no type checking would have to be done inside the method, in case some person decided to try it with their own homemade structs.

I.e. it would require a full range of GetNullableWhatever methods to match the GetWhatever methods.

I hope the powers in charge can see the reason for adding a SqlDateTime2 struct - otherwise, lifting the accessibility of the required internal structs/methods could help people implement their own version of it; i.e. if SqlBuffer etc. were not internal, I could write my own SqlClient extensions that do what I want.

JRahnama commented 3 years ago

@bjornbouetsmith I'd recommend that you make the PR, then we can review it with other team members and with the strong support from @roji and @Wraith2 we can decide if we are going to proceed or not.

bjornbouetsmith commented 3 years ago

@JRahnama What PR is that - and for what? Several solutions have been discussed. Do you mean for a SqlDateTime2 and an equivalent GetSqlDateTime2? Or the methods that return a nullable? Or something entirely different?

David-Engel commented 3 years ago

@bjornbouetsmith First off, thanks for submitting the idea and showing where it has performance benefits. @roji and @Wraith2 are valued contributors here and I appreciate their analysis.

We all have jobs and existing priorities, so it's really up to the requestor to make the case for a change and get others to agree that the change should be prioritized over other people's needs. Your needs are obviously very performance oriented. Implementing the change yourself would be the quickest way forward. You just need to make sure your changes would be accepted, as @JRahnama said.

It's really up to you to consolidate input and suggestions into a coordinated feature request. As it is, this thread is getting quite long. I suggest adding a proposal summary in your original comment for this issue based on discussions. I'm not sure what the technical proposal is, either. (See CONTRIBUTING for more tips.)

My opinion: I think it would be difficult to push a new type into System.Data.SqlTypes. If you were successful, it would only go into future versions of .NET. Then you would still have to get SqlClient changed to use it. And that would require SqlClient to add a binary specific to that future target. If there is a solution that is limited to SqlClient that you yourself can implement and that fits into the project's goals of stability, reliability, backwards compatibility, maintainability, and performance, then that sets you up for a better chance of success.

Regards, David

Wraith2 commented 3 years ago

@bjornbouetsmith ^ that's official guidance. 😁

So, how about implementing a Microsoft.Data.SqlTypes.SqlDateTime2 struct in this library with test coverage and adding a new method SqlDateTime2 SqlDataReader.GetSqlDateTime2(int ordinal). I don't think you need all the operators apart from possibly conversions to DateTime and SqlDateTime. Since we can't change the System.Data.Common types there's no point having all the other conversions and operations on them at the moment.

Does that seem like the start of a plan?
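To make the proposal concrete, here is a rough sketch of what such a struct might look like. This is illustrative only, not a committed API: the member set and conversion operators simply follow the INullable pattern used by the existing System.Data.SqlTypes types and the conversions Wraith2 suggests above.

```csharp
using System;
using System.Data.SqlTypes;

namespace Microsoft.Data.SqlTypes
{
    // Hypothetical sketch: a nullable value type wrapping the full
    // precision of the SQL Server datetime2 type.
    public readonly struct SqlDateTime2 : INullable, IComparable<SqlDateTime2>
    {
        private readonly DateTime _value;
        private readonly bool _isNotNull;

        public SqlDateTime2(DateTime value)
        {
            _value = value;
            _isNotNull = true;
        }

        // default(SqlDateTime2) has _isNotNull == false, i.e. the null value.
        public static readonly SqlDateTime2 Null = default;

        public bool IsNull => !_isNotNull;

        public DateTime Value =>
            _isNotNull ? _value : throw new SqlNullValueException();

        // Only the conversions suggested above: to DateTime and SqlDateTime.
        public static explicit operator DateTime(SqlDateTime2 x) => x.Value;

        public static explicit operator SqlDateTime(SqlDateTime2 x) =>
            x.IsNull ? SqlDateTime.Null : new SqlDateTime(x.Value);

        public int CompareTo(SqlDateTime2 other) =>
            IsNull || other.IsNull
                ? (IsNull ? 0 : 1) - (other.IsNull ? 0 : 1) // nulls sort first
                : _value.CompareTo(other._value);

        public override string ToString() =>
            IsNull ? "Null" : _value.ToString("o");
    }
}
```

A matching `reader.GetSqlDateTime2(ordinal)` would then return this struct without the precision loss or InvalidCastException of the existing SqlDateTime path.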

bjornbouetsmith commented 3 years ago

@bjornbouetsmith ^ that's official guidance. 😁

Perfect

So, how about implementing a Microsoft.Data.SqlTypes.SqlDateTime2 struct in this library with test coverage and adding a new method SqlDateTime2 SqlDataReader.GetSqlDateTime2(int ordinal).

I will get right on that - I assume I just fork and then make a PR when I have something I think is finished and of sufficient quality? It has been a while since I have contributed to code via GitHub - last time was with NLog where I "funnily" enough also worked with performance and memory.

Does that seem like the start of a plan?

It does indeed.

ErikEJ commented 3 years ago

@bjornbouetsmith you can also have a look at the contributor guide...

bjornbouetsmith commented 3 years ago

Just a quick question - why does the source code have two implementations of everything? One for netcore/netstandard and one for netframework? - I can see some of the implementations are slightly different, but most are exactly the same.

For simplicity's sake?

ErikEJ commented 3 years ago

@bjornbouetsmith No - for complexity's sake 😉

Actually, for historical reasons, the .NET Core implementation was branched out some years ago (as System.Data.SqlClient), and was not kept in sync with the .NET Framework implementation, and then copied into this project (Microsoft.Data.SqlClient). Work is ongoing to make more code common, the shared files are located here: https://github.com/dotnet/SqlClient/tree/master/src/Microsoft.Data.SqlClient/src/Microsoft/Data

Wraith2 commented 3 years ago

I'm slowly working through merging the two codebases wherever I can, and eventually we'll get to the point where the MS team will be needed to make policy decisions about each of the differences. Until then I'd prefer it if you add any new code to the shared folder and add it identically in both projects.

Start with just one build and then replicate the changes into the other. This is one of the reasons I offered to help - I'm familiar with the project's peculiarities. If you can get SqlDateTime2 written with tests for all of its methods, I can help get it plumbed in.

ErikEJ commented 3 years ago

It is great that this project is available as open source, but the weight of 20 years of .NET code is present.

bjornbouetsmith commented 3 years ago

So I can dump my code into https://github.com/dotnet/SqlClient/tree/master/src/Microsoft.Data.SqlClient/src - and add as links into the two projects?

Of course the changes I have to make into existing classes need to go to both places.

ErikEJ commented 3 years ago

I think what @Wraith2 is suggesting is that you add the files there, and for example just link from the .NET Framework project, and make tests pass for that, then he will add to .NET Core and add docs links and other quirky stuff.

bjornbouetsmith commented 3 years ago

Ok, and how do you feel about me adding [assembly:InternalsVisibleTo] so I can test SqlBuffer separately with a unit test?

Wraith2 commented 3 years ago

Not favourably. We've avoided it in the past by including the file directly in the test assembly and writing tests that way. You'll want to add the type tests to the FunctionalTests project because they don't require an active SQL connection; anything that does require a database would go in the ManualTests. I've done a shared file to access internal classes in this PR https://github.com/dotnet/SqlClient/pull/827
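For readers unfamiliar with that approach, it amounts to compiling the library's internal source file directly into the test project rather than exposing internals. A minimal sketch of the csproj fragment (the path here is hypothetical and depends on the repository layout):

```xml
<!-- Illustrative only: pull the internal source file into the test
     assembly so its types are visible without InternalsVisibleTo. -->
<ItemGroup>
  <Compile Include="..\..\src\Microsoft\Data\SqlClient\SqlBuffer.cs"
           Link="SqlBuffer.cs" />
</ItemGroup>
```

The trade-off, as noted below, is that the linked file must compile on its own, so every type it references has to be linked in as well.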

bjornbouetsmith commented 3 years ago

@Wraith2 That makes it hard to test my new property, since there are two SqlBuffer versions with different implementations - I would need conditional includes based on the target framework, which is ugly. But if that is what is needed so I can write a couple of tests for the DateTime2 property, then so be it - though it would be nice if the two versions of SqlBuffer could be merged into one.

Edit: Just tried - and it escalates: then I also have to link SqlCachedBuffer, SqlCachedStream, SqlMetaDataPriv, SqlResource and so forth. It just explodes, and eventually I would probably have to include the entire SqlClient in the test project. So for now I will not include any tests for SqlBuffer - which is unfortunate, but it will implicitly be tested via SqlDataReader, so not a huge problem in my opinion. It would have been easier with [assembly: InternalsVisibleTo], though.

Wraith2 commented 3 years ago

You don't need to test SqlBuffer in the functional tests because it'll be covered in the manual tests through doing real queries. The new type should be the only one you need to add tests for and that file will be shared.

bjornbouetsmith commented 3 years ago

Pull request created for your initial comments. https://github.com/dotnet/SqlClient/pull/853