51zero / eel-sdk

Big Data Toolkit for the JVM
Apache License 2.0
145 stars 35 forks source link

Throw a meaningful exception on unsupported data types in ORC #366

Closed stefanobaghino closed 6 years ago

stefanobaghino commented 6 years ago

I've encountered an error when moving data from a JDBCSource to an ORC-backed HiveSink.

I've manually created a Hive table to host the data contained in the original source. It may be that the semantics of bigint are different for Teradata (DECIMAL(n, n)) and Hive/ORC (LONG), however the problem is that whenever a BigIntType is encountered, a MatchError is thrown as this is not among the supported data types for ORC.

I solved my problem by patching the code as in this PR. It's very rough and of course I'm available to improve on this, should this be of interest for a review.

It's worth mentioning that along the way I noticed that probably the right tool to use would have been an ad-hoc implementation of MetastoreSchemaHandler suited for my use case. It's the first time I used Eel and unfortunately this came up later during my experimentation.

If that's the case, feel free to close this.

hannesmiller commented 6 years ago

Hi Stefan,

It looks to me that Teradata Decimals should map to JDBC decimal types (i.e. java.math.Decimal). Therefore it should flow into the HiveSink as a decimal, however I don't think your HIVE table should be defined as bigint which is the same as a long in the EEL type system?

The BigIntType in the EEL type system is more like a BigDecimal without a scale.

Can you confirm what shows up as the JDBC type for the column in the JdbcSource or just write a little JDBC test program with same query and examine the types coming through in the debugger?

Alternatively you could set a breakpoint on the HiveSink at io.eels.component.hive.HiveSinkWriter#write to examine the EEL row as well.

stefanobaghino commented 6 years ago

Yes, I probably just got the schema creation wrong.

Field([REDACTED],DecimalType(Precision(18),Scale(0)),false,false,None,false,null,Map(),None)

Still, do you think it would be a nice addition to add a more telling message rather then just having a MatchError thrown when a BigInt is encountered?

hannesmiller commented 6 years ago

Absolutely we can add a sys.error with a more meaningful error message on a no match - this is very simple to fix.

I will keep this ticket open for this.

We might push this out early on a a11 release though we are planning to rollout a 1.3 official release in the next few weeks if everything goes well.

stefanobaghino commented 6 years ago

Is there some kind of channel that can be used to get in touch with the development team? A mailing list or a Gitter channel or something like that? I like this library, perhaps there's something I can contribute.

stefanobaghino commented 6 years ago

I've reworked this PR to go in the direction we discussed, I hope it's helpful. Let me know if the project is open for contribution. 🙂

Is there some kind of channel that can be used to get in touch with the development team? A mailing list or a Gitter channel or something like that? I like this library, perhaps there's something I can contribute.

garyfrost commented 6 years ago

@stefanobaghino thanks for this contribution, will merge.

I think a gitter channel is a great idea, I'll set it up.

There are a couple of issues and we'd love your contributions, we can chat more about it when I set up the gitter channel if you like.

Gary

garyfrost commented 6 years ago

@stefanobaghino Gitter channel is up - https://gitter.im/eel-sdk/Lobby

stefanobaghino commented 6 years ago

Awesome, thanks for acting on my suggestion!