cloudera-labs / envelope

Build configuration-driven ETL pipelines on Apache Spark
Apache License 2.0
157 stars 89 forks source link

pass Config to custom HBaseSerde #28

Open andvit opened 6 years ago

andvit commented 6 years ago

Hello

com.cloudera.labs.envelope.utils.hbase.HBaseUtils // HBaseSerde util public static HBaseSerde getSerde(Config config) { HBaseSerde serde; if (config.hasPath(SERDE_PROPERTY)) { String serdeImpl = config.getString(SERDE_PROPERTY); if (serdeImpl.equals(DEFAULT_SERDE_PROPERTY)) { return new DefaultHBaseSerde(); } else { try { Class<?> clazz = Class.forName(serdeImpl); Constructor<?> constructor = clazz.getConstructor(); return (HBaseSerde) constructor.newInstance(); } catch (Exception e) { LOG.error("Could not construct custom HBaseSerde instance [" + serdeImpl + "]: " + e); throw new RuntimeException(e); } } } else { serde = new DefaultHBaseSerde(); } serde.configure(config);

return serde;

}

in case of custom serde, serde.configure(config) is never called, there is no way to access config in provided serde

according to the comment below configure(config) should be called com.cloudera.labs.envelope.utils.hbase.HBaseSerde /**

Configure the Serde. This will be passed the contents of an "input" or "output" configuration section. @param config */ void configure(Config config);

andvit commented 6 years ago

simple change from return (HBaseSerde) constructor.newInstance(); to serde = (HBaseSerde) constructor.newInstance(); resolves the issue

ianbuss commented 6 years ago

Thanks @andvit, you're absolutely right. Will ensure this is fixed and rolled into the next release. We're also missing a unit test for custom serdes. To re-iterate, the fix is as described:

Line 174 to serde = (HBaseSerde) constructor.newInstance();

Line 179 to serde = new DefaultHBaseSerde();