Azure / spark-cdm-connector

MIT License
76 stars 33 forks

Spark 3: Cache CDM Date and DateTime type methods #100

Closed kcheeeung closed 2 years ago

kcheeeung commented 2 years ago

Cache the Date and DateTime parsing methods: read the first row, then map each index to a (format, function) tuple. On subsequent reads, the executor automatically passes in the index, and if the value at that index is a CDM Date or DateTime type, the cached function is looked up in the map and used directly to parse all future inputs. This improves scalability and performance because selecting the right function takes constant O(1) time per value instead of O(N) time, where N is the number of candidate formats that would otherwise have to be checked.
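To illustrate the idea outside the connector, here is a minimal sketch in Python, not the code from this PR. The class name `CachedDateParser`, the `FORMATS` list, and the `probes` counter are all hypothetical, and the sketch assumes the index identifies a column whose values all share the format found in the first row.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate formats; the connector's actual list is not shown here.
FORMATS: List[str] = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S"]


class CachedDateParser:
    """Caches a (format, function) tuple per index after the first value is seen."""

    def __init__(self, formats: List[str]) -> None:
        self._formats = list(formats)
        self._cache: Dict[int, Tuple[str, Callable[[str], datetime]]] = {}
        self.probes = 0  # number of format attempts, kept only for illustration

    def _probe(self, value: str) -> Tuple[str, Callable[[str], datetime]]:
        # O(N) scan over the candidate formats; runs once per index.
        for fmt in self._formats:
            self.probes += 1
            try:
                datetime.strptime(value, fmt)
            except ValueError:
                continue
            # Bind fmt as a default argument so the closure keeps this format.
            return fmt, (lambda s, f=fmt: datetime.strptime(s, f))
        raise ValueError(f"no known format matches {value!r}")

    def parse(self, index: int, value: str) -> datetime:
        entry = self._cache.get(index)  # O(1) dictionary lookup
        if entry is None:
            entry = self._probe(value)
            self._cache[index] = entry
        _, parse_fn = entry
        return parse_fn(value)


if __name__ == "__main__":
    parser = CachedDateParser(FORMATS)
    print(parser.parse(2, "03/15/2021"))  # first value at index 2: formats are probed
    print(parser.parse(2, "12/01/2020"))  # later value: cached function is reused
    print("format attempts:", parser.probes)
```

In this sketch, a later value at the same index that does not match the cached format raises `ValueError` rather than falling back to a new scan; whether the connector behaves the same way is not stated in this thread.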

Restrictions:

ghost commented 2 years ago

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

:x: kcheeeung sign now