51zero / eel-sdk

Big Data Toolkit for the JVM
Apache License 2.0
145 stars, 35 forks

A frame consisting of CharType(255) doesn't map to anything in Hive/Parquet sink #204

Closed (hannesmiller closed this issue 7 years ago)

hannesmiller commented 7 years ago

The method io.eels.component.parquet.ParquetSchemaFns.toParquetType doesn't handle a Frame containing a CharType field, so writing to a Parquet sink throws the following exception:

scala.MatchError: CharType(255) (of class io.eels.schema.CharType)
    at io.eels.component.parquet.ParquetSchemaFns$.toParquetType(ParquetSchemaFns.scala:72)
    at io.eels.component.parquet.ParquetSchemaFns$$anonfun$2.apply(ParquetSchemaFns.scala:110)
    at io.eels.component.parquet.ParquetSchemaFns$$anonfun$2.apply(ParquetSchemaFns.scala:110)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at io.eels.component.parquet.ParquetSchemaFns$.toParquetSchema(ParquetSchemaFns.scala:110)
    at io.eels.component.parquet.ParquetWriterFn$.apply(ParquetWriterFn.scala:30)
    at io.eels.component.parquet.ParquetSink$$anon$1.<init>(ParquetSink.scala:16)
    at io.eels.component.parquet.ParquetSink.writer(ParquetSink.scala:14)
    at io.eels.actions.SinkAction$.execute(SinkAction.scala:10)
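
For anyone unfamiliar with this failure mode: a scala.MatchError is thrown when a pattern match has no case for the value it receives. The following is a minimal sketch of that mechanism only. The types and the function body are hypothetical stand-ins, not eel's actual definitions.

```scala
object MatchErrorDemo {
  // Hypothetical stand-ins for eel's schema types; not the library's real classes.
  trait DataType
  case object StringType extends DataType
  case class CharType(size: Int) extends DataType

  // There is no case for CharType, so passing CharType(255)
  // falls through the match and throws scala.MatchError at runtime.
  def toParquetType(dt: DataType): String = dt match {
    case StringType => "BINARY (UTF8)"
  }

  def main(args: Array[String]): Unit = {
    val result = scala.util.Try(toParquetType(CharType(255)))
    println(result.isFailure)
  }
}
```

Adding a case for the missing type (or a catch-all case) is what removes the error.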

sksamuel commented 7 years ago

I've mapped it to a string. I did that rather than using FIXED_LEN_BYTE_ARRAY because I'm not sure Hive or Impala would know to interpret a FIXED_LEN_BYTE_ARRAY as a string when we query it. Eel would know, but I'm not sure other libraries would.
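
The mapping described here can be sketched roughly as follows. This is a toy model under stated assumptions: the type names and the ParquetPrimitive case class are hypothetical and are neither eel's nor parquet-mr's API. It assumes "string" means Parquet's usual string encoding, a BINARY physical type with a UTF8 annotation, which is what other readers generally recognise as text.

```scala
object CharMappingSketch {
  // Hypothetical stand-ins for eel's schema types.
  sealed trait DataType
  case object StringType extends DataType
  final case class CharType(size: Int) extends DataType

  // Simplified description of a Parquet primitive:
  // a physical type name plus an optional logical annotation.
  final case class ParquetPrimitive(physical: String, annotation: Option[String])

  // CharType is handled exactly like StringType. The declared length is
  // dropped, which is the trade-off against FIXED_LEN_BYTE_ARRAY.
  def toParquetType(dt: DataType): ParquetPrimitive = dt match {
    case StringType | CharType(_) => ParquetPrimitive("BINARY", Some("UTF8"))
  }
}
```

With this shape, CharMappingSketch.toParquetType(CharMappingSketch.CharType(255)) yields the same result as for StringType, so the fixed length of 255 is not preserved in the Parquet schema.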

hannesmiller commented 7 years ago

Yeah I can live with that 😄