Esri / spatial-framework-for-hadoop

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Apache License 2.0

JSON dependencies update and Obsolete previously-deprecated classes #128

Closed: randallwhitman closed 6 years ago

randallwhitman commented 6 years ago

Compatibility with Geometry v2.0 (#127); bump version from 1.x to 2.0

randallwhitman commented 6 years ago

Java code can be updated, but Hive tables are stored with the InputFormat and SerDe names (data persistence). Obsoleting the old names ~probably needs to be deferred~. Also, the doc update had been deferred until artifacts are pushed to Maven Central, which is stalled (#123).

randallwhitman commented 6 years ago

Not sure whether it is possible to use ALTER TABLE to change the SerDe to the newer name. If so, that would be a reasonable migration path, and we could then obsolete/remove the old names.

Otherwise, it is possible to define a second Hive table on the same data, without needing to copy or delete the data. Would that be a reasonable migration from v1 to v2, or would it be better to log/print a warning and defer full obsolescence until after v2?
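A minimal sketch of the second-table option, for illustration only. The table name `counties_v2`, the columns, and the `LOCATION` path are hypothetical placeholders; the three class names are the ones that appear later in this thread. The sketch assumes the column list matches the existing table's schema and that the path is the directory the existing table already reads from.

```sql
-- Hypothetical names: counties_v2, the column list, and the LOCATION path
-- are placeholders. Only the three class names come from this thread.
CREATE EXTERNAL TABLE counties_v2 (
  name          string,
  boundaryshape binary
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.EsriJsonSerDe'
STORED AS
  INPUTFORMAT  'com.esri.json.hadoop.EnclosedEsriJsonInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
-- Points at the existing files, so nothing is copied or deleted.
LOCATION '/path/to/existing/counties-data';
```

Declaring the new table as `EXTERNAL` means dropping it later leaves the files in place. If the original table is a managed (non-external) table, dropping that one would still delete the shared data, so the old definition would need to be handled with care.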

randallwhitman commented 6 years ago

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];

I'll try that out on some test data and check whether it was added to Hive only recently or is already supported in all Hive versions that Spatial Framework supports.

randallwhitman commented 6 years ago

Confirmed: I read successfully from a Hive table after altering the InputFormat and SerDe with:

ALTER TABLE randall.counties SET FILEFORMAT INPUTFORMAT 'com.esri.json.hadoop.EnclosedEsriJsonInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' SERDE 'com.esri.hadoop.hive.serde.EsriJsonSerDe';
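One way to check the result of such an alteration, as a sketch rather than the exact commands used above: inspect the table's storage metadata, then read a few rows. The `LIMIT` value is arbitrary.

```sql
-- The storage section of the output should list the new
-- SerDe library and InputFormat class names.
DESCRIBE FORMATTED randall.counties;

-- A small read that exercises the new InputFormat and SerDe.
SELECT * FROM randall.counties LIMIT 5;
```

For partitioned tables, existing partitions keep their own storage metadata in Hive, so they may need the same change applied per partition using the optional `PARTITION partition_spec` clause. This thread does not say whether that case was tested.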