SOCI / soci

Official repository of SOCI - The C++ Database Access Library
http://soci.sourceforge.net/
Boost Software License 1.0

Support for Wide Strings in SOCI for Enhanced Unicode Handling #1133

Open bold84 opened 3 months ago

bold84 commented 3 months ago

This pull request adds support for wide strings (wchar_t and std::wstring) to the SOCI database library, so that Unicode column types such as SQL Server's NVARCHAR and NTEXT can be read and written directly. This is important for applications that store international or multi-language text.

Key Changes:

  1. Added exchange_type_traits and exchange_traits specializations:

    • These specializations let SOCI recognize wide strings during type exchange, so that they are converted and managed correctly.
  2. Updated ODBC Backend:

    • Added support for wide strings, specifically for wchar_t and std::wstring.
    • Adjusted the parameter binding and data retrieval mechanisms to correctly process wide characters.
  3. Enhanced Buffer Management:

    • Modified buffer allocation and management to accommodate the larger size of wide characters, which proper Unicode support requires.
    • Added checks for buffer-size overflow so that large text values are handled safely.
  4. Improved Unicode Support:

    • Added routines to convert between Unicode encodings so that wide strings behave the same on every platform: on Unix-like systems wchar_t is typically 32 bits wide (UTF-32) and must be converted to and from UTF-16, while on Windows wchar_t is already UTF-16.
  5. Extended Test Coverage:

    • Added tests focused on wide string handling, especially compatibility with SQL Server.
    • Included edge cases with large strings to exercise buffer management and overflow handling.
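As a rough illustration of items 3 and 4, here is a sketch that assumes wchar_t holds UTF-32 (as on Linux and macOS). It encodes UTF-32 text as the 16-bit code units used by SQLWCHAR on Windows and, by default, with unixODBC, then computes a NUL-terminated buffer size with an overflow check. The function names utf32_to_utf16 and utf16_buffer_bytes are hypothetical and are not the PR's code. std::u32string is used so the sketch compiles the same way on every platform; a std::wstring from Linux can be copied into one element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-32 text as UTF-16 code units.
std::u16string utf32_to_utf16(const std::u32string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in)
    {
        // Reject values that are not Unicode scalar values.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a valid Unicode scalar value");

        if (cp < 0x10000)
        {
            // Basic Multilingual Plane: one code unit.
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Above the BMP: split into a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: bytes needed for `units` UTF-16 code units plus a
// terminating NUL, throwing instead of letting the multiplication overflow.
std::size_t utf16_buffer_bytes(std::size_t units)
{
    const std::size_t max_units =
        std::numeric_limits<std::size_t>::max() / sizeof(char16_t) - 1;
    if (units > max_units)
        throw std::length_error("wide string too large for buffer");
    return (units + 1) * sizeof(char16_t);
}
```

Characters outside the BMP, such as emoji, occupy two UTF-16 code units. This is why a buffer sized by character count alone can be too small.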

Notes:

This update makes Unicode data much easier to work with in SOCI, which benefits database applications that handle text in multiple languages.

Example usage

Here are a few examples showing how the new wide string features can be used with the ODBC backend.

Example 1: Handling std::wstring in SQL Queries

Inserting and Selecting std::wstring Data

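#include <soci/soci.h>
#include <soci/odbc/soci-odbc.h>
#include <clocale>
#include <iostream>
#include <string>

int main()
{
    // Use the environment's locale so std::wcout can convert non-ASCII
    // characters such as "世界" (needed on most Unix-like systems)
    std::setlocale(LC_ALL, "");

    try
    {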
        soci::session sql(soci::odbc, "DSN=my_datasource;UID=user;PWD=password");

        // Create table with NVARCHAR column
        sql << "CREATE TABLE soci_test (id INT IDENTITY PRIMARY KEY, wide_text NVARCHAR(40) NULL)";

        // Define a wstring to insert
        std::wstring wide_str_in = L"Hello, 世界!";

        // Insert the wstring
        sql << "INSERT INTO soci_test(wide_text) VALUES (:wide_text)", soci::use(wide_str_in);

        // Retrieve the wstring
        std::wstring wide_str_out;
        sql << "SELECT wide_text FROM soci_test WHERE id = 1", soci::into(wide_str_out);

        // Output the retrieved wstring
        std::wcout << L"Retrieved wide string: " << wide_str_out << std::endl;
    }
    catch (const soci::soci_error& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

Example 2: Working with wchar_t Vectors

Inserting and Selecting Wide Characters

#include <soci/soci.h>
#include <soci/odbc/soci-odbc.h>
#include <iostream>
#include <vector>

int main()
{
    try
    {
        soci::session sql(soci::odbc, "DSN=my_datasource;UID=user;PWD=password");

        // Create table with NCHAR column
        sql << "CREATE TABLE soci_test (id INT IDENTITY PRIMARY KEY, wide_char NCHAR(2) NULL)";

        // Define a vector of wide characters to insert
        std::vector<wchar_t> wide_chars_in = {L'A', L'B', L'C', L'D'};

        // Insert the wide characters
        sql << "INSERT INTO soci_test(wide_char) VALUES (:wide_char)", soci::use(wide_chars_in);

        // Retrieve the wide characters (one per row, in insertion order)
        std::vector<wchar_t> wide_chars_out(4);
        sql << "SELECT wide_char FROM soci_test WHERE id IN (1, 2, 3, 4) ORDER BY id", soci::into(wide_chars_out);

        // Output the retrieved wide characters
        for (wchar_t ch : wide_chars_out)
        {
            std::wcout << L"Retrieved wide char: " << ch << std::endl;
        }
    }
    catch (const soci::soci_error& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

Example 3: Using std::wstring with the sql Stream Operator

Inserting and Selecting std::wstring Data with Stream Operator

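#include <soci/soci.h>
#include <soci/odbc/soci-odbc.h>
#include <clocale>
#include <iostream>
#include <string>

int main()
{
    // Use the environment's locale so std::wcout can convert non-ASCII
    // characters such as "世界" (needed on most Unix-like systems)
    std::setlocale(LC_ALL, "");

    try
    {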
        soci::session sql(soci::odbc, "DSN=my_datasource;UID=user;PWD=password");

        // Create table with NVARCHAR column
        sql << "CREATE TABLE soci_test (id INT IDENTITY PRIMARY KEY, wide_text NVARCHAR(40) NULL)";

        // Define a wstring to insert
        std::wstring wide_str_in = L"Hello, 世界!";

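        // Use the stream operator to insert the wstring; the N prefix marks a Unicode literal.
        // Note: this assumes the stream interface accepts std::wstring. Later comments in
        // this thread report that it is not supported yet.
        sql << "INSERT INTO soci_test(wide_text) VALUES (N'" << wide_str_in << "')";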

        // Retrieve the wstring using stream operator
        std::wstring wide_str_out;
        sql << "SELECT wide_text FROM soci_test WHERE id = 1", soci::into(wide_str_out);

        // Output the retrieved wstring
        std::wcout << L"Retrieved wide string: " << wide_str_out << std::endl;
    }
    catch (const soci::soci_error& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

In this example:

  1. A soci::session object is created to connect to the database.
  2. A table is created with an NVARCHAR column.
  3. A std::wstring is defined for insertion.
  4. The sql stream operator is used to insert the std::wstring into the database. Note the N'...' prefix, which marks a Unicode string literal in SQL Server.
  5. The std::wstring is retrieved from the database using the sql stream operator and the soci::into function.
  6. The retrieved wide string is printed to the console using std::wcout.
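Splicing wide_str_in directly into the SQL text, as Example 3 does, breaks the statement if the string contains a single quote. Binding with soci::use(), as in Example 1, avoids the problem. If a literal must be built by hand, the value needs escaping first. The sketch below uses a hypothetical helper, to_nvarchar_literal (not part of SOCI), that wraps the text in N'...' and doubles each embedded single quote, which is how SQL Server escapes a quote inside a string literal.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build an SQL Server N'...' literal from wide text,
// doubling embedded single quotes so the value cannot end the literal early.
std::wstring to_nvarchar_literal(const std::wstring& text)
{
    std::wstring out = L"N'";
    for (wchar_t ch : text)
    {
        if (ch == L'\'')
            out += L"''";  // escaped quote
        else
            out += ch;
    }
    out += L'\'';
    return out;
}
```

For example, to_nvarchar_literal(L"O'Brien") yields N'O''Brien'. Parameter binding remains the safer choice.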

These examples show how to insert and retrieve wide strings and wide characters using the new wchar_t and std::wstring support in SOCI.

Disclaimer: This text is AI generated.

bold84 commented 3 months ago

Converting from UTF-16 to UTF-8 is straightforward when retrieving data, because the column data type is known. When inserting or updating, however, it is harder, because we have no programmatic knowledge of the column data type in advance.
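For the retrieval direction, a minimal UTF-16 to UTF-8 conversion might look like the sketch below. It assumes the fetched NVARCHAR data is available as a std::u16string. The function name utf16_to_utf8 is illustrative, and unpaired surrogates are rejected rather than replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative helper: decode UTF-16 and re-encode it as UTF-8.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            throw std::invalid_argument("unpaired low surrogate");
        }

        // Emit 1 to 4 UTF-8 bytes depending on the code point's range.
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```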

I'm thinking of adding another argument to "soci::use()" that lets the developer override the data type that's used for the underlying ODBC call.

Another issue is that, with soci::use(), there is currently no N'' enclosure to mark Unicode strings for MSSQL.

A further issue is the stream interface: std::wstring isn't supported there yet, and as far as I understand, supporting it would require widening the query to UTF-16 before sending it to the DB.
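Widening the query could be sketched as follows, assuming the query text is accumulated as a UTF-8 std::string. The function utf8_to_utf16 is illustrative, not SOCI code. It rejects malformed input (bad lead or continuation bytes, truncated and overlong sequences, surrogate code points) instead of guessing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative helper: widen a UTF-8 string to UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes 6 bits.
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Smallest code point allowed for each length (rejects overlong forms).
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```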

bold84 commented 2 weeks ago

Please note that I updated the FreeBSD image for Cirrus CI from 13.2 to 13.3.

https://github.com/cirruslabs/cirrus-ci-docs/pull/1277

bold84 commented 2 weeks ago

I'm adding better UTF conversion first.