Spuul / hive-udfs

Collection of Hive UDFs

Can't load GeoLite2-City.mmdb from hdfs #3

Status: Open. mshirley opened this issue 8 years ago.

mshirley commented 8 years ago

I'd like to load the GeoLite2-City.mmdb file from HDFS, but hive-udfs can't read it because it's not clear what file path to pass to the function. The only way I can get it to work is to run 'list files', copy the /tmp directory location it prints, and then pass that full path to the function, as in the session below.

hive> ADD jar hdfs:///resources/jars/hive-geoip-udf-0.1-SNAPSHOT.jar;
converting to local hdfs:///resources/jars/hive-geoip-udf-0.1-SNAPSHOT.jar
Added [/tmp/0fd54f8d-e3eb-4cfe-823f-8d1a0ce7c13a_resources/hive-geoip-udf-0.1-SNAPSHOT.jar] to class path
Added resources: [hdfs:///resources/jars/hive-geoip-udf-0.1-SNAPSHOT.jar]

hive> ADD FILE hdfs:///resources/data/geoip/GeoLite2-City.mmdb;
converting to local hdfs:///resources/data/geoip/GeoLite2-City.mmdb
Added resources: [hdfs:///resources/data/geoip/GeoLite2-City.mmdb]

hive> CREATE TEMPORARY FUNCTION geoip as 'com.spuul.hive.GeoIP2';
OK
Time taken: 0.537 seconds

hive> select geoip('8.8.8.8', 'CITY', 'GeoLite2-City.mmdb');
OK
Time taken: 1.258 seconds, Fetched: 1 row(s)

hive> select geoip('8.8.8.8', 'CITY', './GeoLite2-City.mmdb');
OK
Time taken: 0.165 seconds, Fetched: 1 row(s)

hive> list files;
/tmp/0fd54f8d-e3eb-4cfe-823f-8d1a0ce7c13a_resources/GeoLite2-City.mmdb

hive> select geoip('8.8.8.8', 'CITY', '/tmp/0fd54f8d-e3eb-4cfe-823f-8d1a0ce7c13a_resources/GeoLite2-City.mmdb');
OK
Mountain View
Time taken: 0.253 seconds, Fetched: 1 row(s)
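
A minimal sketch of how the UDF could accept the bare file name that ADD FILE registers, so the 'list files' step would not be needed. It is hypothetical: `DatabasePathResolver` is not part of hive-udfs, and it assumes the UDF opens the database as a local `java.io.File`. The candidate directories would include the session resources directory seen above (`/tmp/<session-id>_resources`), which appears to be controlled by Hive's `hive.downloaded.resources.dir` setting; how the UDF would obtain that value at run time is left open here.

```java
import java.io.File;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of hive-udfs: resolves the database argument
// passed to geoip(). An existing path is used as given; otherwise the bare file
// name is looked up in each candidate directory, in order.
final class DatabasePathResolver {

    private DatabasePathResolver() {}

    static Optional<File> resolve(String requested, List<File> candidateDirs) {
        File direct = new File(requested);
        if (direct.isFile()) {
            // Absolute path, or a relative path that exists under the JVM's working directory.
            return Optional.of(direct);
        }
        // "./GeoLite2-City.mmdb" and "GeoLite2-City.mmdb" both reduce to "GeoLite2-City.mmdb".
        String name = direct.getName();
        for (File dir : candidateDirs) {
            File candidate = new File(dir, name);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Falling back to the base name means both forms tried in the session above ('GeoLite2-City.mmdb' and './GeoLite2-City.mmdb') would find the copy in the resources directory, as long as that directory is in the candidate list.
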

DanielMuller commented 8 years ago

It seems that the path of the database file given to the geoip function is relative to the path of the JAR. I'm no expert in this; I have only used files on the local file system, with both the JAR and the database in the root folder of the user launching Hive.
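
One way to check the "relative to the JAR" guess: in Java, a relative path given to `java.io.File` is resolved against the JVM's working directory (the `user.dir` system property), not against the JAR's location. If the UDF builds a `java.io.File` from its argument (an assumption; the UDF source isn't quoted in this thread), launching Hive from the folder that holds the database would explain why the relative path worked. When the query runs as a distributed task rather than inside the CLI process, the working directory is presumably the task's, not the folder Hive was launched from. `RelativePathDemo` is a hypothetical name used only for illustration.

```java
import java.io.File;

// Shows how java.io.File turns a relative path into an absolute one: it is
// resolved against the JVM's working directory ("user.dir"), not the JAR's location.
final class RelativePathDemo {

    private RelativePathDemo() {}

    // Absolute path the JVM would open for a relative database argument.
    static String resolvedPath(String relative) {
        return new File(relative).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        System.out.println("resolved = " + resolvedPath("GeoLite2-City.mmdb"));
    }
}
```

Running `main` from two different directories should print two different resolved paths for the same argument, while the JAR stays where it is.
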

weiatwork commented 6 years ago

From the GeoIP2-java API, it looks like only reading from a local file is supported; HDFS is not. https://github.com/maxmind/GeoIP2-java
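
If the UDF really can only open local files, a common workaround is to copy the database from HDFS to a local temporary file once per JVM and pass that local path on. The sketch below is hypothetical (`LocalCopyCache` is not part of hive-udfs) and uses only the JDK; in a real UDF the `Callable<InputStream>` would presumably wrap Hadoop's `FileSystem.open(Path)`, which is an assumption and not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical workaround, not part of hive-udfs: copy a database that is not on
// the local file system (e.g. on HDFS) to a local temp file once per JVM, then
// hand the local path to code that can only open local files.
final class LocalCopyCache {

    // Source identifier (e.g. an hdfs:// URI) -> local copy, shared by all calls in this JVM.
    private static final Map<String, Path> COPIES = new HashMap<>();

    private LocalCopyCache() {}

    // openRemote opens the source; in a real UDF it might wrap Hadoop's
    // FileSystem.open(Path), which is an assumption and not shown here.
    static synchronized Path localize(String source, Callable<InputStream> openRemote) {
        Path cached = COPIES.get(source);
        if (cached != null && Files.isRegularFile(cached)) {
            return cached;  // already copied: no second remote read
        }
        try (InputStream in = openRemote.call()) {
            Path tmp = Files.createTempFile("geoip-", ".mmdb");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            tmp.toFile().deleteOnExit();  // best-effort cleanup when the JVM exits
            COPIES.put(source, tmp);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException("could not open " + source, e);
        }
    }
}
```

Separately, GeoIP2-java's `DatabaseReader.Builder` appears to also accept an `InputStream`, which would avoid the temp file but load the whole database into memory; check the API of the version in use before relying on that.
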