Closed: hannesmiller closed this issue 7 years ago
It must be because ORC 1.3.0 is now used. It might depend on a newer Hadoop.
On 30 Jan 2017 7:02 a.m., "hannesmiller" notifications@github.com wrote:
The following code:
def JdbcSinkToOrc: Unit = {
  implicit val hadoopConfiguration = new Configuration()
  implicit val hadoopFileSystem = FileSystem.get(hadoopConfiguration)

  // Write to an OrcSink from a JdbcSource
  val query = "SELECT NAME, AGE, SALARY, CREATION_TIME FROM PERSON"
  val orcFilePath = new Path("hdfs://gcstrd03.de.db.com/client/eel/person.orc")
  if (hadoopFileSystem.exists(orcFilePath)) hadoopFileSystem.delete(orcFilePath, true)

  JdbcSource(() => dataSource.getConnection, query)
    .withFetchSize(10)
    .toFrame
    .to(OrcSink(orcFilePath))
}
Produces the following stack trace:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.<init>(J)V
    at org.apache.orc.impl.ColumnStatisticsImpl$DecimalStatisticsImpl.<init>(ColumnStatisticsImpl.java:789)
    at org.apache.orc.impl.ColumnStatisticsImpl.create(ColumnStatisticsImpl.java:1384)
    at org.apache.orc.impl.WriterImpl$TreeWriter.<init>(WriterImpl.java:511)
    at org.apache.orc.impl.WriterImpl$DecimalTreeWriter.<init>(WriterImpl.java:1977)
    at org.apache.orc.impl.WriterImpl.createTreeWriter(WriterImpl.java:2526)
    at org.apache.orc.impl.WriterImpl.access$1600(WriterImpl.java:95)
    at org.apache.orc.impl.WriterImpl$StructTreeWriter.<init>(WriterImpl.java:2074)
    at org.apache.orc.impl.WriterImpl.createTreeWriter(WriterImpl.java:2529)
    at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:191)
    at org.apache.orc.OrcFile.createWriter(OrcFile.java:671)
    at io.eels.component.orc.OrcWriter.writer$lzycompute(OrcWriter.scala:53)
    at io.eels.component.orc.OrcWriter.writer(OrcWriter.scala:53)
    at io.eels.component.orc.OrcWriter.flush(OrcWriter.scala:83)
    at io.eels.component.orc.OrcWriter.close(OrcWriter.scala:94)
    at io.eels.component.orc.OrcSink$$anon$1.close(OrcSink.scala:35)
    at io.eels.actions.SinkAction$.execute(SinkAction.scala:19)
    at io.eels.Frame$class.to(Frame.scala:345)
    at io.eels.SourceFrame.to(SourceFrame.scala:17)
    at io.eels.testing.EelSourceToSinks$.JdbcSinkToOrc(EelSourceToSinks.scala:93)
    at io.eels.testing.EelSourceToSinks$.delayedEndpoint$io$eels$testing$EelSourceToSinks$1(EelSourceToSinks.scala:56)
    at io.eels.testing.EelSourceToSinks$delayedInit$body.apply(EelSourceToSinks.scala:23)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
View the issue on GitHub: https://github.com/sksamuel/eel-sdk/issues/234
Do you think we should downgrade?
Yes, I think so, as upgrading runs the risk of breaking other sources and sinks.
Let me take a look at what is causing the problem. I assume it is a conflict with the version of Hive.
Ok, it was the change that Matt put into Hive's storage api to speed up decimals (HIVE-15335). It looks like I'll need to roll a new release of hive-storage-api and ORC.
Thanks @omalley, we'll upgrade once it's released.
In the interim, I still think we should downgrade until the new release of hive-storage-api for ORC is available.
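For reference, a downgrade like this would typically be done by pinning the ORC dependency back to the previous line in the build. A minimal sbt sketch, assuming the last known-good release was on the 1.2.x line (the exact version number here is an assumption, not taken from the thread):

```scala
// build.sbt -- hedged sketch: force ORC back to the 1.2.x line until the
// fixed hive-storage-api/ORC releases land. 1.2.3 is an assumed version;
// substitute whichever 1.2.x release eel last built against.
dependencyOverrides += "org.apache.orc" % "orc-core" % "1.2.3"
```

`dependencyOverrides` only changes the resolved version of a dependency that is already on the classpath, so it avoids touching the rest of the dependency graph.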
Path orcFilePath = new Path("hdfs://nameservice1/client/eel_java/person.orc");
Configuration hadoopConfiguration = new Configuration();
FileSystem hadoopFileSystem = FileSystem.get(hadoopConfiguration);
if (hadoopFileSystem.exists(orcFilePath)) hadoopFileSystem.delete(orcFilePath, true);
// Build schema
StructType schema = StructTypeBuilder.builder()
.withField("Name", Types4j.StringType)
.withField("Age", Types4j.IntSignedType)
.build();
// Create rows and write them
OrcSink4j orcSink = new OrcSink4j(orcFilePath);
RowBuilder.Builder rowBuilder = RowBuilder.builder(schema);
List<Row> rows = Arrays.asList(
    rowBuilder.add("Fred", 21).build(),
    rowBuilder.reset().add("John", 28).build(),
    rowBuilder.reset().add("Alice", 17).build()
);
orcSink.write(rows);
// Read and display
OrcSource4j orcSource = new OrcSource4j(orcFilePath);
orcSource
.toFrame()
.toList().forEach(System.out::println);
Exception in thread "pool-5-thread-1" java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()I
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1096)
at io.eels.component.orc.OrcBatchIterator$$anon$1.hasNext(OrcBatchIterator.scala:56)
at io.eels.CloseableIterator$$anon$1.hasNext(CloseableIterator.scala:10)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at io.eels.CloseableIterator$$anon$1.foreach(CloseableIterator.scala:9)
at io.eels.CloseableIterator$class.foreach(CloseableIterator.scala:18)
at io.eels.component.orc.OrcPart$$anon$1.foreach(OrcSource.scala:48)
at io.eels.SourceFrame$$anonfun$rows$1$$anonfun$apply$1.apply$mcV$sp(SourceFrame.scala:32)
at io.eels.SourceFrame$$anonfun$rows$1$$anonfun$apply$1.apply(SourceFrame.scala:30)
at io.eels.SourceFrame$$anonfun$rows$1$$anonfun$apply$1.apply(SourceFrame.scala:30)
at scala.util.Try$.apply(Try.scala:192)
at com.sksamuel.exts.concurrent.ExecutorImplicits$RichExecutorService$$anon$2.run(ExecutorImplicits.scala:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It looks like I'll need:
That will give you a set of versions that work together. I'm sorry for the issue.
No problem @omalley - thanks for the prompt response.
Sam, I think this is doable... I don't think the exclusions/inclusions of the artefacts in SBT are going to impact the Hive sources and sinks for other dialects, right?
If we could get an M2 into Central... I'll test it ASAP.
Cheers, Hannes
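For anyone following along, the kind of SBT exclusion being discussed looks roughly like this. This is a sketch only: the exact module coordinates to exclude (here `hive-storage-api` under the `org.apache.hive` group) are an assumption based on the conflict described above, not a confirmed fix:

```scala
// build.sbt -- hedged sketch: keep eel's own hive-storage-api and stop
// orc-core from dragging in a conflicting transitive copy.
// The excluded coordinates are assumptions; verify against the actual POM.
libraryDependencies += ("org.apache.orc" % "orc-core" % "1.3.0")
  .exclude("org.apache.hive", "hive-storage-api")
```

Because the exclusion is scoped to the `orc-core` module, the Hive sources and sinks would keep resolving their own transitive versions unchanged.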
Apache releases take 3 days, so it will be early next week for the new releases.
No problem
Ok, the storage-api and orc releases have been made and pushed to Maven central. Here's a patch that updates the dependencies. https://github.com/omalley/eel-sdk/tree/orc-upgrade
I didn't realize that you were using the older 1.2 version of Hive, so there are some compilation problems in your hive module that I don't have time to track down.
I think we should bump our version of Hive to something more modern @hannesmiller
The patch I included bumps it to Hive 2.1.1, which is the current release. That is why your hive module has issues.
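In sbt terms, the bump described in the patch would look something like the following. The 2.1.1 version comes from the thread itself; the choice of the `hive-exec` artifact as the module being versioned is an assumption for illustration, since the patch is linked rather than quoted here:

```scala
// build.sbt -- hedged sketch of the Hive version bump from the patch.
// 2.1.1 is stated above; hive-exec as the artifact is an assumption.
libraryDependencies += "org.apache.hive" % "hive-exec" % "2.1.1"
```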
Thanks @omalley