facebookarchive / hive-io-experimental

Hive I/O Library

Type check for HiveIO #34

Closed greg1github closed 11 years ago

greg1github commented 11 years ago

Let's add a check to verify that when we call getLong(i) on a HiveReadableRecord, the type of column i is bigint (and not some other type, such as int). We can add similar checks for other types. We can also add getLongDoubleMap(i) and a few more methods corresponding to typical Hive cell formats that return typed values. Then the caller does not have to cast (and suppress warnings), and the methods will internally check that the type of the column matches the signature of the get method.

The problem is that yesterday I accidentally fed a String column to a reader that was reading it as a Long, and the returned long was always zero, which subverted my computation. Debugging this kind of case is hard, especially since the user (me) expects the framework to be "typed."
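
A minimal sketch of the kind of check being requested, for illustration only. `ColumnType` and `CheckedRecord` are hypothetical names invented here and are not part of the HiveIO API; the sketch assumes the record knows the declared type of each column and simply fails fast when a getter does not match it.

```java
/** Hypothetical column types; a real reader would derive these from the Hive table schema. */
enum ColumnType { BIGINT, INT, DOUBLE, STRING }

/** Hypothetical stand-in for a record; not the actual HiveReadableRecord implementation. */
final class CheckedRecord {
    private final ColumnType[] types;
    private final Object[] values;

    CheckedRecord(ColumnType[] types, Object[] values) {
        if (types.length != values.length) {
            throw new IllegalArgumentException("schema and row differ in length");
        }
        this.types = types.clone();
        this.values = values.clone();
    }

    /** Returns column i as a long, throwing unless the column is declared BIGINT. */
    long getLong(int i) {
        requireType(i, ColumnType.BIGINT);
        return (Long) values[i];
    }

    /** Returns column i as a String, throwing unless the column is declared STRING. */
    String getString(int i) {
        requireType(i, ColumnType.STRING);
        return (String) values[i];
    }

    // Fail loudly instead of silently returning a default such as zero.
    private void requireType(int i, ColumnType expected) {
        if (types[i] != expected) {
            throw new IllegalStateException(
                "column " + i + " is " + types[i] + ", not " + expected);
        }
    }
}
```

With a check like this, calling `getLong` on a STRING column throws an exception naming the column and both types, rather than returning zero. Null cells are not handled in this sketch.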

nitay commented 11 years ago

To be clear, I presume you want the type check in both the get() and set() methods? That is, preventing you from feeding a String into a Long column, and making sure that when you call getLong() the underlying column is of BIGINT type.

greg1github commented 11 years ago

Yes, both input and output.

greg1github commented 11 years ago

I would consider removing get/set Object, so as to force the user into type-safe programming.

nitay commented 11 years ago

How do you remove get()/set() yet still support retrieving an arbitrary object? If anything, we can replace them with just getMap() and getList(), as I'm pretty sure those are the only other types that will be in there. Getting the full data type, though, will require some generics magic. Do you have any ideas?
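
One possible shape for that "generics magic", sketched under assumptions: the caller passes `Class` tokens for the key and value types, and each entry is checked at runtime with `Class.cast`. `TypedCells` and this `getMap` signature are hypothetical and do not reflect the HiveIO API; a real version would also need to handle nested maps and lists, which plain class tokens cannot express.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of HiveIO. */
final class TypedCells {
    private TypedCells() {}

    /**
     * Views an untyped cell as a Map<K, V>, checking every key and value.
     * Throws ClassCastException if the cell is not a map or an entry has the wrong type.
     */
    static <K, V> Map<K, V> getMap(Object cell, Class<K> keyType, Class<V> valueType) {
        if (!(cell instanceof Map)) {
            throw new ClassCastException("cell is not a map: " + cell);
        }
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : ((Map<?, ?>) cell).entrySet()) {
            // Class.cast performs the runtime check, so no unchecked-cast warning is needed.
            result.put(keyType.cast(e.getKey()), valueType.cast(e.getValue()));
        }
        return Collections.unmodifiableMap(result);
    }
}
```

Under this sketch, a `getLongDoubleMap(i)` method could be a thin wrapper that calls `TypedCells.getMap(cell, Long.class, Double.class)` on the cell at column i. The trade-off is a copy and a per-entry check on every read.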

greg1github commented 11 years ago

How about we first list the types for get/set that we have seen so far? Then we can decide whether get/set Object is needed.