Closed hdorio closed 6 months ago
Thanks for reporting the issue! Yes, that sounds reasonable to me. Would you like to work on it?
Thanks for the offer! However, since I'm not familiar with C++, I think it would be best if someone with that expertise takes care of it.
Let me add an option for the column printer to print decimals as strings.
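As a rough illustration of what such an option might do, here is a minimal sketch. It is not the actual ORC column printer code: `formatDecimal` and its `asString` flag are hypothetical names, and a plain 64-bit unscaled value stands in for the library's decimal types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, NOT the real ORC column printer API.
// Renders an unscaled 64-bit decimal with the given scale, optionally
// wrapped in double quotes so that JSON consumers read it as a string.
std::string formatDecimal(int64_t unscaled, int scale, bool asString) {
  assert(scale >= 0);
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic to avoid overflow on INT64_MIN.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  const std::size_t fraction = static_cast<std::size_t>(scale);
  if (fraction > 0) {
    // Left-pad with zeros so at least one digit precedes the decimal point.
    if (digits.size() <= fraction) {
      digits.insert(0, fraction - digits.size() + 1, '0');
    }
    digits.insert(digits.size() - fraction, 1, '.');
  }
  if (negative) {
    digits.insert(0, 1, '-');
  }
  // The only difference between the two modes is the surrounding quotes.
  return asString ? "\"" + digits + "\"" : digits;
}
```

With the flag set, `formatDecimal(1129999999999999100LL, 18, true)` yields the quoted text `"1.129999999999999100"`. With the flag cleared, the same digits are emitted as a bare JSON number, which keeps the existing behavior as the default.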
May I ask why we don't use orc-tools (Java tool) instead?
$ orc-tools version
ORC 2.0.0
$ orc-tools --help
ORC Java Tools
usage: java -jar orc-tools-*.jar [--help] [--define X=Y] <command> <args>
Commands:
convert - convert CSV and JSON files to ORC
count - recursively find *.orc and print the number of rows
data - print the data from the ORC file
json-schema - scan JSON files to determine their schema
key - print information about the keys
meta - print the metadata about the ORC file
scan - scan the ORC file
sizes - list size on disk of each column
version - print the version of this ORC tool
To get more help, provide -h to the command
$ echo "1.1299999999999991" > test.csv
$ orc-tools convert --schema "struct<amount:decimal(38,18)>" test.csv -o test.orc
$ orc-tools data test.orc
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Processing data file test.orc [length: 300]
{"amount":"1.1299999999999991"}
IIUC, this is resolved completely via the following, isn't it?
I marked this as 2.1.0 and closed. Feel free to reopen this if we need to do more.
Currently, the JSON output generated by the orc-contents command line utility stores decimal values using floating-point numbers. This can lead to precision issues and inaccuracies, especially when dealing with financial data.
test.csv:
1.1299999999999991
test.json (test.orc as JSON):
{"amount": 1.129999999999999100}
Note the truncated 1; the correct output should be 1.1299999999999991
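To see why a bare JSON number is risky here, the sketch below mimics what many JSON consumers do with a numeric literal: parse it into a double and format it back. The helper name `roundTripThroughDouble` is made up for illustration, and real parsers may behave differently (some keep arbitrary precision).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper for illustration only.
// Parses a decimal literal into a double, as a typical JSON parser would for
// a bare number, then formats it back with `scale` fraction digits.
std::string roundTripThroughDouble(const std::string& literal, int scale) {
  assert(scale >= 0);
  const double value = std::strtod(literal.c_str(), nullptr);
  char buffer[512];
  // snprintf never writes past the buffer; overly long output is cut off.
  std::snprintf(buffer, sizeof(buffer), "%.*f", scale, value);
  return std::string(buffer);
}
```

A decimal(38,18) column can hold up to 38 significant digits, while a double carries roughly 15 to 17. A value such as `12345678901234567890.123456789012345678` therefore does not survive `roundTripThroughDouble(value, 18)` unchanged, whereas a value emitted as a JSON string reaches the consumer digit for digit.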
Would it be acceptable to modify Decimal128ColumnPrinter (and Decimal64ColumnPrinter) to return a string?
{"amount": "1.129999999999999100"}