delta-io / kafka-delta-ingest

A highly efficient daemon for streaming data from Kafka into Delta Lake
Apache License 2.0

Add coercions module for implicit data type coercions #101

Status: closed by xianwill 2 years ago

xianwill commented 2 years ago

Opening as a DRAFT for now. It still needs integration tests and some additional robustness work.

This PR adds a coercions module that implicitly coerces values, along the lines of what Spark's from_json function provides. The PR includes hooks for adding further coercions, but for now it only implements the ones relevant to our own deployments.
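For readers unfamiliar with the idea, here is a minimal sketch of what a per-field coercion hook could look like. It assumes an enum of coercions plus a `coerce` function that leaves values it cannot coerce unchanged. `Coercion`, `ToString`, `ToNumber`, `to_json_text` and `coerce` are hypothetical names, not this PR's API, and the two coercions are illustrative only. A small `Value` enum stands in for serde_json::Value so the sketch needs only std.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for serde_json::Value (hypothetical, std only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical coercions, chosen for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Coercion {
    /// Target column is a string: keep strings, render other values as JSON text.
    ToString,
    /// Target column is numeric: parse numeric strings into numbers.
    ToNumber,
}

/// Compact JSON rendering. Escapes only quotes and backslashes for brevity;
/// a real serializer also escapes control characters.
fn to_json_text(v: &Value) -> String {
    match v {
        Value::Null => "null".to_string(),
        Value::Bool(b) => b.to_string(),
        Value::Number(n) => n.to_string(),
        Value::String(s) => format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\"")),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json_text).collect();
            format!("[{}]", parts.join(","))
        }
        Value::Object(map) => {
            let parts: Vec<String> = map
                .iter()
                .map(|(k, v)| format!("{}:{}", to_json_text(&Value::String(k.clone())), to_json_text(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Apply one coercion. Nulls stay null, and values that cannot be
/// coerced are returned unchanged rather than failing the record.
fn coerce(value: &Value, coercion: Coercion) -> Value {
    match (coercion, value) {
        (Coercion::ToString, Value::String(_)) | (Coercion::ToString, Value::Null) => value.clone(),
        (Coercion::ToString, other) => Value::String(to_json_text(other)),
        (Coercion::ToNumber, Value::String(s)) => match s.trim().parse::<f64>() {
            Ok(n) => Value::Number(n),
            Err(_) => value.clone(),
        },
        _ => value.clone(),
    }
}
```

For example, coercing the object `{"a": 1}` with `Coercion::ToString` yields the string `{"a":1}`, while `Coercion::ToNumber` turns `" 42 "` into the number 42 and leaves `"abc"` as-is. Passing through uncoercible values is one design choice among several; a stricter variant could return a `Result` instead.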

The two that are biting us right now are:

I implemented this separately from the transforms module because its purpose is quite different, and that module could use some cleanup. Ultimately, I could see us dropping the dependency on the jmespath crate and extracting CoercionTree into a more general data structure that supports a single-pass walk over a deserialized serde_json::Value, applying transforms and coercions together. That's for another time, though.
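A rough sketch of the single-pass idea, assuming a CoercionTree-like structure that maps field names to nodes which either coerce a leaf value or descend into a nested struct or an array of structs. `Tree`, `Node`, `NumberToString` and `apply` are hypothetical names, not the PR's types, and a small `Value` enum again stands in for serde_json::Value.

```rust
use std::collections::{BTreeMap, HashMap};

/// Minimal stand-in for serde_json::Value (hypothetical, std only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical node kinds in the tree.
enum Node {
    /// Leaf: coerce a number to its string form (illustrative only).
    NumberToString,
    /// Descend into a nested object.
    Struct(Tree),
    /// Apply the subtree to every element of an array of objects.
    ArrayOfStruct(Tree),
}

/// Only fields that need work appear in the tree.
struct Tree {
    fields: HashMap<String, Node>,
}

/// Walk `value` once, mutating in place only the paths present in `tree`.
fn apply(tree: &Tree, value: &mut Value) {
    let Value::Object(map) = value else { return };
    for (name, node) in &tree.fields {
        // Fields missing from this record are skipped.
        let Some(child) = map.get_mut(name) else { continue };
        match node {
            Node::NumberToString => {
                if let Value::Number(n) = *child {
                    *child = Value::String(n.to_string());
                }
            }
            Node::Struct(sub) => apply(sub, child),
            Node::ArrayOfStruct(sub) => {
                if let Value::Array(items) = child {
                    for item in items.iter_mut() {
                        apply(sub, item);
                    }
                }
            }
        }
    }
}
```

Because the walk is driven by the tree rather than the record, fields absent from the tree are never visited. In principle, transform nodes could sit alongside coercion nodes in the same tree, which is the kind of combined pass described above.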