sundbry closed this issue 1 year ago
I love it :-).
I have a related question - Do you need the accuracy of the decimal type in the parquet file or is that type a side effect of some upstream processing?
Specifically - are you doing the type of work that requires more floating point accuracy than a 64 bit double?
Hi @cnuernber, no, we are not doing any floating-point work, just using tech.ml.dataset to load parquet files from Snowflake. Our app takes a Snowflake dataset and creates a GraphQL API out of it. However, it is important that the data we serve is high fidelity to the original user data - so if the user defines a DECIMAL type, we need to keep it as a decimal type throughout the pipeline.
Totally agreed, we need to avoid changing datatypes, especially as a middle layer.
Have you looked into duckdb? We have some experimental bindings to it; it is a vectorized desktop database that seems to me to be complementary to Snowflake systems.
I appreciate the recommendation, duckdb seems worth learning about. I just read your blog post on JVM FFI - wow, the possibilities with that are incredible! I understand better how the I/O works here. Thanks!
Registers the :decimal object datatype. I was surprised how little code this took, but it seems to work without issue reading parquet files with :decimal values.
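For anyone curious what "little code" looks like here, below is a minimal sketch of how an object datatype can be registered with dtype-next. It assumes the `tech.v3.datatype.casting/add-object-datatype!` mechanism and maps `java.math.BigDecimal` to the `:decimal` keyword; treat the exact namespace and call as an assumption, not the actual diff from this change.

```clojure
(ns example.decimal
  (:require [tech.v3.datatype.casting :as casting]))

;; Sketch (assumed API): teach the datatype system that values of class
;; java.math.BigDecimal carry the keyword datatype :decimal, so columns
;; backed by BigDecimal report :decimal rather than a generic :object.
(casting/add-object-datatype! :decimal java.math.BigDecimal)
```

Once such a registration is in place, a parquet column of DECIMAL values can round-trip through a dataset without being coerced to a double, which is exactly the fidelity requirement discussed above.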