gisaia / ARLAS-proc

Workaround about data ingestion with computing frameworks
Apache License 2.0

Force getGeoDataUDF to be executed only once per row. #151

Closed laurent-thiebaud-gisaia closed 4 years ago

laurent-thiebaud-gisaia commented 4 years ago

Otherwise, the UDF is executed again each time "tmpAddressColumn" is referenced (among others, once for each of the six address properties).

laurent-thiebaud-gisaia commented 4 years ago

Another option is to mark the UDF as non-deterministic with asNondeterministic(), but if we then use the resulting columns in filters, those filters will not be optimized in the query plan (see https://stackoverflow.com/questions/58696198/spark-udf-executed-many-times?noredirect=1#comment103690028_58696198)
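
For reference, a minimal sketch of what the asNondeterministic() option could look like. This is not the actual ARLAS-proc code: the `Address` case class, the `lookupAddress` function, and the `lat`/`lon`/`city`/`country` column names are hypothetical stand-ins for getGeoDataUDF and its six address properties. It assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical result type; the real UDF returns more address properties.
case class Address(city: String, country: String)

object UdfOncePerRowSketch {
  // Hypothetical stand-in for the costly geocoding lookup.
  def lookupAddress(lat: Double, lon: Double): Address =
    Address("Toulouse", "France")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udf-once-per-row-sketch")
      .getOrCreate()
    import spark.implicits._

    // Marking the UDF as non-deterministic tells the optimizer that it must
    // not duplicate the call when the struct column is referenced several times.
    val getAddress = udf((lat: Double, lon: Double) => lookupAddress(lat, lon))
      .asNondeterministic()

    val df = Seq((43.60, 1.44), (48.85, 2.35)).toDF("lat", "lon")
      .withColumn("tmpAddressColumn", getAddress(col("lat"), col("lon")))
      // Each property is read from the struct computed above.
      .withColumn("city", col("tmpAddressColumn.city"))
      .withColumn("country", col("tmpAddressColumn.country"))
      .drop("tmpAddressColumn")

    // Print the plans to check where the UDF call ends up.
    df.explain(true)
    df.show()

    spark.stop()
  }
}
```

Comparing the output of explain(true) with and without asNondeterministic() should show whether the UDF call is repeated for each extracted property. The drawback mentioned above still applies: filters on the derived columns may no longer be optimized.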