datasalt / splout-db

A web-latency SQL spout for Hadoop.

TableBuilder should be more flexible and possibly allow for implicit Schemas #19

Closed by pereferrera 11 years ago

pereferrera commented 11 years ago

The current TableBuilder is nice, but it only allows creating Tables from text files or from binary Pangool Tuple files. It cannot be used with arbitrary InputFormats. As we are currently working on making Splout integrate easily with the Hadoop ecosystem (Hive, Pig, Cascading), we should make TableBuilder more flexible and also add the possibility of implicit Schemas, so that if no Schema is provided, it is read from the Tuples of the InputFormat.

The problem here is that many methods rely on a previously provided Schema to ensure that things like the partitionBy fields are consistent. So the question is whether we should sample a Path to obtain the Schema first and perform the same checks, or just skip the checks.
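
To make the first option concrete, here is a rough sketch of "sample first, then check". It does not use the real TableBuilder or Pangool API: the class `ImplicitSchemaSketch`, the methods `inferSchema` and `missingPartitionFields`, and the use of a plain `Map` as a stand-in for a sampled Tuple are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Splout or Pangool API.
public class ImplicitSchemaSketch {

    // Derives field name -> Java type from one sampled record.
    // The Map is an assumed stand-in for a Tuple read from an InputFormat.
    public static Map<String, Class<?>> inferSchema(Map<String, Object> sampledRecord) {
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : sampledRecord.entrySet()) {
            Object value = entry.getValue();
            // A null value carries no type information, so fall back to Object.
            schema.put(entry.getKey(), value == null ? Object.class : value.getClass());
        }
        return schema;
    }

    // Returns the partitionBy fields that do not appear in the schema.
    // An empty list means the partitioning is consistent with the schema.
    public static List<String> missingPartitionFields(Map<String, Class<?>> schema,
                                                      List<String> partitionBy) {
        List<String> missing = new ArrayList<>();
        for (String field : partitionBy) {
            if (!schema.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pretend this record was sampled from the input Path.
        Map<String, Object> sample = new LinkedHashMap<>();
        sample.put("user_id", 42L);
        sample.put("country", "ES");

        Map<String, Class<?>> schema = inferSchema(sample);
        System.out.println(missingPartitionFields(schema, Arrays.asList("user_id")));
        System.out.println(missingPartitionFields(schema, Arrays.asList("region")));
    }
}
```

The trade-off is the one raised above: sampling costs an extra read of the input before the job is configured, but it keeps the existing validation; skipping the checks avoids that read and defers any inconsistency to job run time.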

pereferrera commented 11 years ago

This is now solved in trunk.