Closed · dongsilun closed this issue 1 year ago
```sql
CREATE EXTERNAL TABLE hive2doris (
  id int,
  str1 string, str2 string, str3 string, str4 string, str5 string,
  str6 string, str7 string, str8 string, str9 string, str10 string,
  num1 int, num2 int, num3 int, num4 int, num5 int,
  num6 int, num7 int, num8 int, num9 int
)
STORED AS ORC
LOCATION '/hive_test/hive2doris';
```
```sql
CREATE TABLE hive2doris (
  id int(10) NOT NULL,
  str1 varchar(100) NOT NULL,
  str2 varchar(100) NOT NULL,
  .........,
  num8 int(10) NOT NULL,
  num9 int(10) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(id, str1)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
  "replication_allocation" = "tag.location.default: 3"
);
```
@dongsilun You need to add:

```
doris.config = {
  format = "json"
  read_json_by_line = "true"
}
```

The `format` option is mandatory.
Thank you.
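For context, a minimal sketch of where this `doris.config` block sits inside a SeaTunnel sink definition. The `fenodes` address, credentials, and `table.identifier` values below are placeholders for illustration, not taken from the original report:

```
sink {
  Doris {
    # Placeholder FE address and credentials -- replace with your own.
    fenodes = "doris-fe-host:8030"
    username = "root"
    password = ""
    # Placeholder target table in "database.table" form.
    table.identifier = "test_db.hive2doris"
    sink.label-prefix = "hive2doris"
    # Without this block (specifically "format"), DorisSinkWriter
    # cannot create a serializer and fails with a NullPointerException.
    doris.config = {
      format = "json"
      read_json_by_line = "true"
    }
  }
}
```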
Search before asking
What happened
Created a test batch job (hive -> doris). It fails with a NullPointerException in DorisSinkWriter:
```
2023-08-01 17:08:30,489 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (myhost executor 1): java.lang.NullPointerException
	at org.apache.seatunnel.connectors.doris.sink.writer.DorisSinkWriter.createSerializer(DorisSinkWriter.java:271)
	at org.apache.seatunnel.connectors.doris.sink.writer.DorisSinkWriter.<init>(DorisSinkWriter.java:94)
	at org.apache.seatunnel.connectors.doris.sink.DorisSink.createWriter(DorisSink.java:103)
	at org.apache.seatunnel.translation.spark.sink.write.SeaTunnelSparkDataWriterFactory.createWriter(SeaTunnelSparkDataWriterFactory.java:49)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:407)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:358)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
SeaTunnel Version
2.3.2
SeaTunnel Config
Running Command
Error Exception
Zeta or Flink or Spark Version
Spark-3.2.2
Java or Scala Version
JDK 1.8
Screenshots
Are you willing to submit PR?
Code of Conduct