dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.96k stars 4.65k forks source link

Infinite loop on OdbcDataReader.internalGetString() for buggy ODBC drivers #86490

Open shessane opened 1 year ago

shessane commented 1 year ago

Description

To read data from an ODBC datasource a client app uses the class OdbcDataReader. The method to read a string value is GetString. This method calls the internalGetString. This will read a string value by chunks of 4kB by calling the SQLGetData from an underlying driver as many as it needed to complete the read. The first call to SQLGetData returns the total string length, and all the other calls returns the length of the actual chunk.

Current implementation of internalGetString assumes that the total string length will be equal to the sum of all chunk string length that the underlying driver will return (and the first chunk is 4094). This assumption is used in a while loop to decrement a variable called cbMissing with the chunk length returned by GetData, until cbMissing is equal to 0, or the GetData returns false.

But if a buggy driver don't respect this assumption, an infinite read loop is caused. My case using the IBM DB2 ODBC DRIVER, and reading an UTF8 string with a length greater than 4kB, and containing a mix of 1-byte and 2-bytes characters in all chunks (like é or â).

Reproduction Steps

Create a testjson proc stock that read a 5kB Utf8 string returned by some DB2 function (like json_object). The resulted string should contain a mix of 1-byte and 2-bytes characters in all chunks (like é or â).

Then we will read the result using this simple C# program

        public async Task TestJsonOdbc()
        {
            const string connectionString = "dsn=...;driver={IBM DB2 ODBC DRIVER};uid=..;pwd=..";
            await using var connection = new OdbcConnection(connectionString);

            await connection.OpenAsync();

            var command = connection.CreateCommand();
            command.CommandText = "call testjson()";
            command.CommandType = CommandType.StoredProcedure;

            await using var reader = await command.ExecuteReaderAsync();

            while (await reader.ReadAsync())
            {
                var str = reader.GetString(0); // This call never ends
            }
        }

The ODBC trace produced by the GetString :

powershell      54b8-237c   ENTER SQLGetData 
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

powershell      54b8-237c   EXIT  SQLGetData  with return code 1 (SQL_SUCCESS_WITH_INFO)
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 [    4094] "..."
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950 (4608)

        DIAG [01004] [IBM][System i Access ODBC Driver]Column 1: Data truncated. (30196) 

powershell      54b8-237c   ENTER SQLGetData 
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

powershell      54b8-237c   EXIT  SQLGetData  with return code 0 (SQL_SUCCESS)
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 [     502] "..."
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950 (502)

powershell      54b8-237c   ENTER SQLGetData 
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

powershell      54b8-237c   EXIT  SQLGetData  with return code 100 (SQL_NO_DATA_FOUND)
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

powershell      54b8-237c   ENTER SQLGetData 
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

powershell      54b8-237c   EXIT  SQLGetData  with return code 100 (SQL_NO_DATA_FOUND)
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

powershell      54b8-237c   ENTER SQLGetData 
        HSTMT               0x0000028E94CC1610
        UWORD                        1 
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x0000028EAD2400F0 
        SQLLEN                  4094
        SQLLEN *            0x000000D3E5F4D950

... (infinite SQLGetData  with return code 100 (SQL_NO_DATA_FOUND))

The Utf8 string that we try to read contains 2277 chars, with a length of 2304. The first SQLGetData returns a total size of 4608 (2304 2). The driver bug is here as it over estimate the size (it should returns only 2277 2=4554). So after the first SQLGetData, the driver fills the 4094 buffer with the first chunk (actually 4092 + 2 for the end char). The second call to SQLGetData fill the buffer with the remaining chunk, and returns 502 (the remaining chunk contains 251 chars with an Utf8 length of 258). So the cbMissing=4608-4092-502=14. The cbMissing is never 0, and all the next SQLGetData returns SQL_NO_DATA_FOUND with 0 that is considered a success. Hence the infinite loop.

For now I tested only the IBM DB2 ODBC Driver, reading from a DB2 server. I'll try to reproduce with SQL server or SQlite later.

Expected behavior

OdbcReader should not make assumptions based on the string length, but should read all the buffer returned by the driver until the end. Then using the StringBuilder it will convert the buffer to a C# string. This will make the OdbcReader solid to a bug in the length returned by the driver.

Actual behavior

Infinite read loop, causing high CPU usage by the read thread. As the infinite loop is on the reader side, this cause the Odbc query never ends. Query timeouts are thrown on a timeout communication between the driver and the server, and don't include the odbc adaptation. The SQLGetData keeps returning SQL_NO_DATA_FOUND, and the reader reads 0 byte chunk in an infinite loop.

In the case of an API that serves multiple request, each request causes a start of a thread with a while(true) that never ends. This causes CPU rise to 100%, and the only way to fix it is to restart the process.

Regression?

No, the same bug is present in .Net Framework 4.7, netcore 3.1 and net 6

Known Workarounds

Instead of reading a string we read bytes, OdbcDataReader.GetBytes(), and construct a c# string manually. The GetBytes uses internally the same GetData, but make no assumption based on remaining length calculated by the underlying driver. Another workaround in the case of IBM driver is to force the server to return Ebcdic value instead of UTF8. But this is a server side workaround that is out of the scope of Odbc.

Configuration

Tested on Windows 10 and Windows 2019 Server with netcore 3.1 and net 6 app. Confirmed with powershell script that uses NetFramework 4.7

I think it's global to any configuration.

Other information

A bug to IBM is reported to fix the driver SQLGetData returns. The current issue is just about the infinite loop. If you are open to contribution I can propose you a fix.

[Edit] Add explanation about the Odbc trace produced.

ghost commented 1 year ago

Tagging subscribers to this area: @roji, @ajcvickers See info in area-owners.md if you want to be subscribed.

Issue Details
### Description To read data from an ODBC datasource a client app uses the class OdbcDataReader. The method to read a string value is GetString. This method calls the [internalGetString](https://github.com/dotnet/runtime/blob/f346aac205644f237a43092e3f2952d143b521ad/src/libraries/System.Data.Odbc/src/System/Data/Odbc/OdbcDataReader.cs#L936). This will read a string value by chunks of 4kB by calling the SQLGetData from an underlying driver as many as it needed to complete the read. The first call to SQLGetData returns the total string length, and all the other calls returns the length of the actual chunk. Current implementation of internalGetString assumes that the total string length will be equal to the sum of all chunk string length that the underlying driver will return (and the first chunk is 4094). This assumption is used in a while loop to decrement a variable called [cbMissing](https://github.com/dotnet/runtime/blob/f346aac205644f237a43092e3f2952d143b521ad/src/libraries/System.Data.Odbc/src/System/Data/Odbc/OdbcDataReader.cs#L1016) with the chunk length returned by GetData, until cbMissing is equal to 0, or the GetData returns false. But if a buggy driver don't respect this assumption, an infinite read loop is caused. My case using the IBM DB2 ODBC DRIVER, and reading an UTF8 string with a length greater than 4kB, and containing a mix of 1-byte and 2-bytes characters in all chunks (like é or â). ### Reproduction Steps Create a testjson proc stock that read a 5kB Utf8 string returned by some DB2 function (like [json_object](https://www.ibm.com/docs/en/db2/11.5?topic=functions-json-object)). The resulted string should contain a mix of 1-byte and 2-bytes characters in all chunks (like é or â). Then we will read the result using this simple C# program ``` csharp public async Task TestJsonOdbc() { const string connectionString = "dsn=...;driver={IBM DB2 ODBC DRIVER};uid=..;pwd=.."; await using var connection = new OdbcConnection(connectionString); await connection.OpenAsync(); var command = connection.CreateCommand(); command.CommandText = "call testjson()"; command.CommandType = CommandType.StoredProcedure; await using var reader = await command.ExecuteReaderAsync(); while (await reader.ReadAsync()) { var str = reader.GetString(0); // This call never ends } } ``` The ODBC trace produced by the GetString : ``` powershell 54b8-237c ENTER SQLGetData HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 powershell 54b8-237c EXIT SQLGetData with return code 1 (SQL_SUCCESS_WITH_INFO) HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 [ 4094] "..." SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 (4608) DIAG [01004] [IBM][System i Access ODBC Driver]Column 1: Data truncated. (30196) powershell 54b8-237c ENTER SQLGetData HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 powershell 54b8-237c EXIT SQLGetData with return code 0 (SQL_SUCCESS) HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 [ 502] "..." SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 (502) powershell 54b8-237c ENTER SQLGetData HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 powershell 54b8-237c EXIT SQLGetData with return code 100 (SQL_NO_DATA_FOUND) HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 powershell 54b8-237c ENTER SQLGetData HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 powershell 54b8-237c EXIT SQLGetData with return code 100 (SQL_NO_DATA_FOUND) HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 powershell 54b8-237c ENTER SQLGetData HSTMT 0x0000028E94CC1610 UWORD 1 SWORD -8 PTR 0x0000028EAD2400F0 SQLLEN 4094 SQLLEN * 0x000000D3E5F4D950 ... (infinite SQLGetData with return code 100 (SQL_NO_DATA_FOUND)) ``` The Utf8 string that we try to read contains 2277 chars, with a length of 2304. The first SQLGetData returns a total size of 4608 (2304*2). It fill the 4094 buffer with the first chunk (actually 4092 + 2 for the end char). But the second call to SQLGetData fill the buffer with the remaining chunk, but returns only 502 (the remaining chunk contains 251 chars with an Utf8 length of 258). So the cbMissing=4608-4092-502=14. The cbMissing is never 0, and all the next SQLGetData returns SQL_NO_DATA_FOUND with 0 [that is considered a success](https://github.com/dotnet/runtime/blob/f346aac205644f237a43092e3f2952d143b521ad/src/libraries/System.Data.Odbc/src/System/Data/Odbc/OdbcDataReader.cs#L1756C17-L1763). Hence the infinite loop. For now I tested only the IBM DB2 ODBC Driver, reading from a DB2 server. I'll try to reproduce with SQL server or SQlite later. ### Expected behavior OdbcReader should not make assumptions based on the string length, but should read all the buffer returned by the driver until the end. Then using the StringBuilder it will convert the buffer to a C# string. This will make the OdbcReader solid to a bug in the length returned by the driver. ### Actual behavior Infinite read loop, causing high CPU usage by the read thread. As the infinite loop is on the reader side, this cause the Odbc query never ends. Query timeouts are thrown on a timeout communication between the driver and the server, and don't include the odbc adaptation. The SQLGetData keeps returning SQL_NO_DATA_FOUND, and the reader reads 0 byte chunk in an infinite loop. In the case of an API that serves multiple request, each request causes a start of a thread with a while(true) that never ends. This causes CPU rise to 100%, and the only way to fix it is to restart the process. ### Regression? No, the same bug is present in .Net Framework 4.7, netcore 3.1 and net 6 ### Known Workarounds Instead of reading a string we read bytes, OdbcDataReader.GetBytes(), and construct a c# string manually. The GetBytes uses internally the same GetData, but make no assumption based on remaining length calculated by the underlying driver. Another workaround in the case of IBM driver is to force the server to return Ebcdic value instead of UTF8. But this is a server side workaround that is out of the scope of Odbc. ### Configuration Tested on Windows 10 and Windows 2019 Server with netcore 3.1 and net 6 app. Confirmed with powershell script that uses NetFramework 4.7 I think it's global to any configuration. ### Other information A bug to IBM is reported to fix the driver SQLGetData returns. The current issue is just about the infinite loop. If you are open to contribution I can propose you a fix. [Edit] Add explanation about the Odbc trace produced.
Author: shessane
Assignees: -
Labels: `untriaged`, `area-System.Data.Odbc`
Milestone: -
roji commented 2 weeks ago

@MohamedChaawa any interest in picking up your PR here again (#102885)?