cnuernber / dtype-next

A Clojure library designed to aid in the implementation of high performance algorithms and systems.

casting: Add support for :decimal datatypes #70

Closed: sundbry closed this 1 year ago

sundbry commented 1 year ago

Registers the :decimal object datatype. I was surprised how little code this took, but it seems to work without issue reading parquet files with :decimal values.

cnuernber commented 1 year ago

I love it :-).

cnuernber commented 1 year ago

I have a related question: do you need the accuracy of the decimal type in the parquet file, or is that type a side effect of some upstream processing?

Specifically, are you doing the type of work that requires more floating-point accuracy than a 64-bit double provides?

sundbry commented 1 year ago

Hi @cnuernber, no, we are not doing any floating-point work; we are just using tech.ml.dataset to load parquet files from Snowflake. Our app takes a Snowflake dataset and creates a GraphQL API out of it. However, it is important that the data we serve stays faithful to the original user data, so if the user defines a DECIMAL type, we need to keep it as a decimal type throughout the pipeline.
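To make the fidelity point concrete, here is a minimal Java sketch of what goes wrong when a DECIMAL value is coerced to a 64-bit double somewhere in the middle of a pipeline. It assumes that decimal values on the JVM side are held as `java.math.BigDecimal`; that tech.ml.dataset's `:decimal` columns use `BigDecimal` is an assumption here, not something stated in this thread. The `viaDouble` helper is hypothetical and only simulates a lossy middle layer.

```java
import java.math.BigDecimal;

public class DecimalFidelity {
    // Hypothetical helper: simulates a pipeline stage that coerces a DECIMAL
    // value to a 64-bit double and back, instead of keeping it as a decimal.
    public static BigDecimal viaDouble(BigDecimal value) {
        // new BigDecimal(double) exposes the exact binary value the double holds.
        return new BigDecimal(value.doubleValue());
    }

    public static void main(String[] args) {
        // 0.10 has no exact binary representation, so the value itself changes.
        BigDecimal price = new BigDecimal("0.10");
        BigDecimal priceViaDouble = viaDouble(price);
        System.out.println(price + " -> " + priceViaDouble);
        System.out.println("same value? " + (price.compareTo(priceViaDouble) == 0));

        // 1.50 survives numerically, but the declared scale (2 digits) is lost.
        BigDecimal amount = new BigDecimal("1.50");
        BigDecimal amountViaDouble = viaDouble(amount);
        System.out.println(amount + " -> " + amountViaDouble);
        System.out.println("same value? " + (amount.compareTo(amountViaDouble) == 0));
        System.out.println("same scale? " + (amount.scale() == amountViaDouble.scale()));

        // Arithmetic drifts too: decimal addition stays exact, double does not.
        System.out.println("double:  " + (0.1 + 0.2));
        System.out.println("decimal: " + new BigDecimal("0.1").add(new BigDecimal("0.2")));
    }
}
```

Running it shows `0.10` turning into `0.1000000000000000055511151231257827021181583404541015625`, and `1.50` coming back as `1.5` (numerically equal, but with scale 1 instead of 2, so `BigDecimal.equals` reports a difference). The double sum prints `0.30000000000000004` while the decimal sum prints `0.3`. None of this requires "floating point work" on the consumer side; a single silent coercion is enough to change what an API serves.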

cnuernber commented 1 year ago

Totally agreed: we need to avoid changing datatypes, especially in middle layers.

Have you looked into DuckDB? We have some experimental bindings to it. It is a vectorized desktop database that seems to me complementary to Snowflake systems.

sundbry commented 1 year ago

I appreciate the recommendation; DuckDB seems worth learning about. I just read your blog post on JVM FFI. Wow, the possibilities with that are incredible! I now understand better how the I/O works here. Thanks!