Open alamb opened 3 years ago
@alamb any idea about how to start this?
Hi @j-a-m-l, I see arrow-rs has supported Decimal
, maybe we can directly change here https://github.com/apache/arrow-datafusion/blob/22a7d74e9dece44adb790b5ea3fca27d26a2fe51/datafusion/src/sql/planner.rs#L312 cc @alamb
@j-a-m-l I think a @xudong963 has a good point that Arrow already includes some support for DecimalArrays.
A good place to start then be to write some tests in sql.rs that create a TableProvider with a column of DecimalArray
type and try to run some basic sql
The typical order I would try is something like
// projection
SELECT col + 4 ... FROM table_with_decimal_col
// filter
SELECT col FROM table_with_decimal_col where col < 5
// sort
SELECT col FROM table_with_decimal_col order by col
// grouping / aggregatie
SELECT sum(col) FROM table_with_decimal_col;
SELECT count(other_col) FROM table_with_decimal_col group by col
Writing those tests is probably a good way to figure out how to plumb through the support in DataFusion -- I would also recommend doing it in that order (projection, filter, sort, grouping / aggregation)
Thanks @xudong963 and @alamb
Is there any plan for implementation Decimal
Type for datafusion?
@alamb
Hi @j-a-m-l, do you have time to work on this?
Is there any plan for implementation
Decimal
Type for datafusion? @alamb
I want to take some tasks to support the decimal data type in the datafusion.
Is there any plan for implementation Decimal Type for datafusion?
I don't know of any plans other than what is on this ticket. As I mentioned in https://github.com/apache/arrow-datafusion/issues/122#issuecomment-956212828 I think figuring out what subtasks are needed (by writing some tests and trying out what works / doesn't work) is a good first step
No, @xudong963. Currently I don't have enough time to do that, and I'll probably focus first on fixing other issues that I've found, such as https://github.com/apache/arrow-datafusion/issues/1173.
Here are several other issues that seem to be related to DECIMAL:
Thanks for the feedback.
I will begin to support DECIMAL
datatype.
Please assign this issue to me @alamb
utils:
In order to implement decimal in datafusion, we should ref some data type rules, like spark data type and type coercion. Now, the numerical data type just support some native type and the type coercion rules are simple. if we add the decimal data type, the type checking, type cast, type promotion and precision checking are all need to be considered.
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/expressions/coercion.rs and https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/type_coercion.rs are the two places I know of in DataFusion today that do this type coercion
On Mon, Nov 22, 2021 at 9:33 PM Kun Liu @.***> wrote:
In order to implement decimal in datafusion, we should ref some data type rules, like spark data type https://spark.apache.org/docs/3.2.0/sql-ref-datatypes.html and type coercion https://spark.apache.org/docs/3.2.0/sql-ref-ansi-compliance.html#type-promotion-and-precedence . Now, the numerical data type just support some native type and the type coercion rules are simple. if we add the decimal data type, the type checking, type cast, type promotion and precision checking are all need to be considered.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apache/arrow-datafusion/issues/122#issuecomment-976109767, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADXZMNH25DW2NQZKCQ75BDUNL4OBANCNFSM43S4FY3Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
I know of in DataFusion today that do this type coercion
yep, Now I just try to support some simple demo code for decimal. For example function(min,max) and just Projection the column with decimal data type (without any operation and functions)
The rule of conversion in the above two files should be refactored, If we want to align the behavior with PG database exactly.
In the PG database, decimal and other numerical data type can be convert to each other, and in some case the varchar also can be convert to a decimal data type. In my opinion, we can support some simple and right rules first, and support other complex features later. For example, In the add
mathematics operation for the decimal datatype, if left or right is the decimal datatype, we must require that two size are all decimal type. It is means that we don't support the cast or try cast
to or from decimal data type implicitly.
What do you think about? @alamb
In my opinion, we can support some simple and right rules first, and support other complex features later
I think this is a great idea -- taking stock / consolidating the existing coercion logic might be a good place to start (before you try to extend it to support Decimal)
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10818
The TPC-H benchmarks correctly specify that all MONEY columns are DECIMAL type (precision and scale are not specified). We currently use
DataType::Float64
which is much lighter than a true Decimal type.To be a valid benchmark we need to ensure we support the same precision as the reference implementation.