cjmatta / ksql-udf-geoip

example UDF for geoip lookup
Apache License 2.0

Add extra parameter to have date #1

Open hasnat opened 5 years ago

hasnat commented 5 years ago

Suppose you're working with data that is a few days old. It would be good to add a date string parameter that selects the geoip DB file for that date: getcityforip(String ip, String date)

Not sure whether we should then change the geoip path config to a folder path or a pattern, e.g.

- Pass the full DB file path on each call:
  ~function.getcityforip.geolite.db.path=~
  getcityforip(String ip, String geoipDbFilePath)
  geoipDbFilePath=/Users/chris/Downloads/GeoLite2-City_20181009/GeoLite2-City.mmdb

or

- Configure a path pattern and pass the date:
  ~function.getcityforip.geolite.db.path=~
  function.getcityforip.geolite.db.path.pattern=/Users/chris/Downloads/GeoLite2-City_<DATE>/GeoLite2-City.mmdb
  (e.g. /Users/chris/Downloads/GeoLite2-City_20181009/GeoLite2-City.mmdb)
  getcityforip(String ip, String date)
  date=20181009

or

- Configure a path prefix and pass the relative DB file:
  ~function.getcityforip.geolite.db.path=~
  function.getcityforip.geolite.db.path.prefix=/Users/chris/Downloads/
  getcityforip(String ip, String geoipDbFile)
  geoipDbFile=GeoLite2-City_20181009/GeoLite2-City.mmdb
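To make the pattern option more concrete, here is a rough sketch of how a `<DATE>` placeholder could be resolved to a file path. The class name `GeoDbPathResolver` and its methods are hypothetical and not part of this repo; the sketch also assumes dates are always given as yyyyMMdd (as in 20181009) and rejects anything else, so a value taken from the stream can't be used to point at arbitrary paths.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper (not in this repo): expands a configured path pattern
// such as ".../GeoLite2-City_<DATE>/GeoLite2-City.mmdb" for a given date.
final class GeoDbPathResolver {
    private static final String PLACEHOLDER = "<DATE>";
    private final String pattern;

    GeoDbPathResolver(String pattern) {
        if (pattern == null || !pattern.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("pattern must contain " + PLACEHOLDER);
        }
        this.pattern = pattern;
    }

    // Returns the DB path for a date given as yyyyMMdd (assumed format).
    String resolve(String date) {
        // Exactly eight digits: rules out "../" and other path tricks.
        if (date == null || !date.matches("\\d{8}")) {
            throw new IllegalArgumentException("date must be yyyyMMdd: " + date);
        }
        try {
            // Also reject strings that are not real calendar dates.
            LocalDate.parse(date, DateTimeFormatter.BASIC_ISO_DATE);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("not a valid date: " + date, e);
        }
        return pattern.replace(PLACEHOLDER, date);
    }
}
```

With the pattern from the example above, `resolve("20181009")` would give /Users/chris/Downloads/GeoLite2-City_20181009/GeoLite2-City.mmdb.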

cjmatta commented 5 years ago

I don't think this is the right way to go about it, since it would require loading the DB on every UDF call, and that would kill performance.

The config function is only called when the KSQL node loads the UDF; currently there is no way to reload a config. I've created #2102 to see about adding the ability to reconfigure a UDF without restarting the system.

hasnat commented 5 years ago

We might still have to load the DB within the UDF, but cache it (with some maximum number of cached DBs) so that later requests for the same date reuse it. ~I'm not sure loading all DBs on init would be ideal.~ As for confluentinc/ksql#2012, I'm not sure how ideal it would be to reconfigure this UDF based on items in the stream.
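A minimal sketch of the caching idea, assuming a small LRU cache keyed by date string. `DatedDbCache` and the `loader` function are hypothetical names, not anything in this repo; in the real UDF the loader would presumably open the .mmdb file for that date, and an evicted reader would probably also need to be closed, which this sketch leaves out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: keeps at most maxEntries loaded DBs and
// evicts the least recently used one when a new date pushes it over.
final class DatedDbCache<V> {
    private final Function<String, V> loader;
    private final Map<String, V> cache;

    DatedDbCache(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the DB for the date, loading it only on a cache miss.
    // synchronized because the UDF may be invoked from several threads.
    synchronized V get(String date) {
        V db = cache.get(date);
        if (db == null) {
            db = loader.apply(date);
            cache.put(date, db);
        }
        return db;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

This way the expensive load happens once per date rather than once per call, and memory stays bounded by the chosen maximum. Holding the lock during a load keeps the sketch simple, but it also blocks lookups for other dates while a DB is being read from disk.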