Azure / spark-cdm-connector

MIT License

Spark 3: Cache CDM Date and DateTime type methods #95

Closed: kcheeeung closed this 1 year ago

kcheeeung commented 2 years ago

Cache the Date and DateTime parsing methods. When the first row is read, determine which format each CDM Date or DateTime column uses and map that column's index to a (format, function) tuple. On subsequent reads, the executor passes in the column index; if the column is a CDM Date or DateTime type, the map lookup returns the cached method, which is then used directly to parse all later values. Selecting the parser this way takes constant O(1) time per value instead of O(N) time, where N is the number of formats that would otherwise have to be tried one by one. This improves both performance and scalability on large reads.
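A minimal sketch of the per-column caching idea, assuming a fixed list of candidate date patterns. The names `DateParserCache`, `candidateFormats`, `parse`, and `cachedFormat` are hypothetical and are not the connector's actual API. Only `Date` is shown; `DateTime` would follow the same pattern with `LocalDateTime`. Falling back to re-detection when a cached parser fails is an assumption of this sketch, not something stated in the PR.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import scala.collection.concurrent.TrieMap
import scala.util.Try

// Hypothetical sketch (not the connector's API): cache, per column index,
// the date format and parser that first succeeded for that column.
object DateParserCache {
  // Illustrative candidate formats, tried in order on a cache miss.
  val candidateFormats: Seq[String] = Seq("yyyy-MM-dd", "MM/dd/yyyy", "dd.MM.yyyy")

  // Column index -> (format, parsing function). TrieMap is thread-safe,
  // which is an assumption about how executor tasks might share the cache.
  private val cache = TrieMap.empty[Int, (String, String => LocalDate)]

  private def parserFor(format: String): String => LocalDate = {
    val formatter = DateTimeFormatter.ofPattern(format)
    (s: String) => LocalDate.parse(s, formatter)
  }

  // Slow path: try candidate formats one by one (O(N) in the number of formats),
  // then remember the first one that parses the value.
  private def detect(index: Int, value: String): LocalDate = {
    val (format, parser, result) = candidateFormats.iterator
      .map { f => val p = parserFor(f); (f, p, Try(p(value))) }
      .find(_._3.isSuccess)
      .getOrElse(throw new IllegalArgumentException(s"No known date format for '$value'"))
    cache.update(index, (format, parser))
    result.get
  }

  // Fast path: a single map lookup by column index, then the cached parser is applied.
  def parse(index: Int, value: String): LocalDate = cache.get(index) match {
    case Some((_, parser)) =>
      // Assumption: if the cached format no longer matches, fall back to detection.
      try parser(value)
      catch { case _: DateTimeParseException => detect(index, value) }
    case None => detect(index, value)
  }

  def cachedFormat(index: Int): Option[String] = cache.get(index).map(_._1)
}
```

The first call for a given column pays the O(N) scan over candidate formats; every later call for that column costs one hash lookup plus a single parse attempt.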

Restrictions: