danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons

[csv] Exponential formatted numbers with leading zeros in exponent #365

Closed: jsutes closed this issue 2 years ago

jsutes commented 2 years ago

Describe the bug

"1.0e-3" is automatically decoded as a number, but "1.0e-03" is decoded as a string. A leading zero in the exponent is a very common way to write exponentially formatted numbers: the exponent is usually written with two digits even when only one is needed.

A value like "1.0e-03" should get decoded as a number automatically.
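For illustration, here is a minimal standalone sketch of the kind of number recognition a type-inferring CSV reader could apply. It is not jsoncons code: to_exponential, is_json_number and infer_field are hypothetical helpers. The check follows the JSON number grammar from RFC 8259, where the exponent is simply one or more digits, so "e-03" is as valid as "e-3". to_exponential shows where such values typically come from: C's %e conversion always writes at least two exponent digits, so 0.001 is formatted as 1.0e-03.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <variant>

// Hypothetical helper: formats like printf's %e with one fractional digit.
// The C standard requires at least two exponent digits, so 0.001 -> "1.0e-03".
std::string to_exponential(double value)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.1e", value);
    return buf;
}

// Hypothetical helper (not part of jsoncons): true if s matches the
// RFC 8259 JSON number grammar
//   [ "-" ] int [ "." 1*DIGIT ] [ ( "e" / "E" ) [ "+" / "-" ] 1*DIGIT ]
// The exponent is one or more digits, so leading zeros are allowed.
bool is_json_number(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    auto is_digit_at = [&](std::size_t k) {
        return k < n && std::isdigit(static_cast<unsigned char>(s[k])) != 0;
    };

    if (i < n && s[i] == '-') ++i;
    if (!is_digit_at(i)) return false;
    if (s[i] == '0') {
        ++i;                           // "0" may not be followed by more integer digits
    } else {
        while (is_digit_at(i)) ++i;
    }
    if (i < n && s[i] == '.') {        // optional fraction
        ++i;
        if (!is_digit_at(i)) return false;
        while (is_digit_at(i)) ++i;
    }
    if (i < n && (s[i] == 'e' || s[i] == 'E')) {   // optional exponent
        ++i;
        if (i < n && (s[i] == '+' || s[i] == '-')) ++i;
        if (!is_digit_at(i)) return false;
        while (is_digit_at(i)) ++i;    // "03" is accepted just like "3"
    }
    return i == n;                     // the whole field must be consumed
}

// Hypothetical inference rule: numeric fields become double, others stay strings.
std::variant<double, std::string> infer_field(const std::string& s)
{
    if (is_json_number(s)) return std::strtod(s.c_str(), nullptr);
    return s;
}
```

This grammar is deliberately strict (it rejects forms such as "+1" or ".5"), so it only illustrates the exponent rule and is not meant to describe exactly what jsoncons accepts.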

If I explicitly set the column types with options.column_types("string,string,float"), then 1.0e-03 is read as a number, but that only works if you know in advance how many columns to expect. In general, I'd like to be able to make a call like this:

auto data = jsoncons::csv::decode_csv<std::vector<std::vector<double>>>(text, options);

to read a matrix of numbers of any size. Currently, this throws an exception if any value has an exponent with a leading zero.
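Until a release with the fix is available, one possible workaround for purely numeric data is to skip type inference altogether. The sketch below is plain standard C++, not the jsoncons API: read_matrix is a hypothetical helper that assumes comma separators, no header row and no quoted fields, and parses each field with std::strtod, which accepts exponents with leading zeros.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical workaround (not jsoncons): read a CSV block of numbers of any
// shape into a matrix. Assumes no header, no quoting, comma separators.
std::vector<std::vector<double>> read_matrix(const std::string& text)
{
    std::vector<std::vector<double>> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;                                 // skip blank lines
        std::vector<double> row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) {
            // strtod handles "1.0e-3" and "1.0e-03" alike; reject partial parses.
            char* end = nullptr;
            double value = std::strtod(field.c_str(), &end);
            if (field.empty() || end != field.c_str() + field.size())
                throw std::invalid_argument("not a number: '" + field + "'");
            row.push_back(value);
        }
        rows.push_back(std::move(row));
    }
    return rows;
}
```

For example, read_matrix("1.0e-3,2\n1.0e-03,4.5E+01\n") yields a 2 x 2 matrix whose first column holds two equal values. Throwing on a non-numeric field keeps the failure explicit instead of silently storing a string.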

Steps to reproduce the bug

Run the code below and compare the pretty-printed results for the last two rows. The last row's "rate" value is stored as a string rather than a number.

Include a small, self-contained example if possible

#include <iomanip>
#include <iostream>
#include <jsoncons/json.hpp>
#include <jsoncons_ext/csv/csv.hpp>

const std::string data = R"(index_id,observation_date,rate
EUR_LIBOR_06M,2015-10-23,0.0000214
EUR_LIBOR_06M,2015-10-26,0.0000143
EUR_LIBOR_06M,2015-10-27,0.0000001
EUR_LIBOR_06M,2015-10-27,1.0e-3
EUR_LIBOR_06M,2015-10-27,1.0e-03
)";

using namespace jsoncons;

int main() {

    csv::csv_options options;
    options.assume_header(true);
    // options.column_types("string,string,float"); // with explicit column types, 1.0e-03 is read as a number

    // Parse the CSV data into an ojson value
    ojson j = csv::decode_csv<ojson>(data, options);

    // Pretty print
    json_options print_options;
    print_options.float_format(float_chars_format::fixed);
    std::cout << "(1)\n" << pretty_print(j, print_options) << "\n\n";

    // Iterate over the rows
    std::cout << "(2)\n";
    for (const auto& row : j.array_range())
    {
        // Access index_id and observation_date as strings and rate as double
        std::cout << row["index_id"].as<std::string>() << ", "
            << row["observation_date"].as<std::string>() << ", "
            << row["rate"].as<double>() << "\n";
    }

    return 0;
}

Output:

(1)
[
    {
        "index_id": "EUR_LIBOR_06M",
        "observation_date": "2015-10-23",
        "rate": 0.0000214
    },
    {
        "index_id": "EUR_LIBOR_06M",
        "observation_date": "2015-10-27",
        "rate": 0.001
    },
    {
        "index_id": "EUR_LIBOR_06M",
        "observation_date": "2015-10-27",
        "rate": "1.0e-03"
    }
]

(2)
EUR_LIBOR_06M, 2015-10-23, 2.14e-05
EUR_LIBOR_06M, 2015-10-27, 0.001
EUR_LIBOR_06M, 2015-10-27, 0.001

What compiler, architecture, and operating system?

What jsoncons library version? Latest on vcpkg: 0.168.3#1

danielaparker commented 2 years ago

Thanks for reporting this issue with CSV file parsing. It should be fixed on master. We'll have a new patch release shortly, and will include it on our next update to vcpkg.