Closed LXHING999 closed 3 years ago
@LXHING999 , could you, please, provide: 1.) DDL for table in Hive/Presto 2.) Version of presto JDBC
presto : presto-jdbc-310.jar
ddl:
CREATE TABLE dw_livecast.bak_bdm_mongo_exam
(
id
string,
subject_id
string,
paper_id
string,
name
string,
detail
string,
exam_type
string,
region_level
string,
regions
string,
clazz_levels
string,
school_level
string,
school_ids
string,
apply_start_at
string,
apply_stop_at
string,
exam_start_at
string,
exam_stop_at
string,
correct_stop_at
string,
result_issue_at
string,
duration_minutes
string,
submit_after_minutes
string,
question_sort
string,
status
string,
agent_name
string,
agent_code
string,
file_url
string,
file_name
string,
creator_id
string,
created_at
string,
updated_at
string,
group_id
string,
teacher_id
string,
deleted_at
string,
papers
string,
ranks
string,
dt
string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://M7-10-6-0-87:8020/user/hive/warehouse/dw_livecast.db/bak_bdm_mongo_exam'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='42',
'numRows'='84883593',
'rawDataSize'='256133063360',
'totalSize'='2311198166',
'transient_lastDdlTime'='1557899913')
i fix it by rewrite ClickHouseRowBinaryStream.java , commit Preconditions.checkNotNull(string); here is code : public void writeString(String string) throws IOException { // Preconditions.checkNotNull(string); if (null==string){ byte[] bytes = "".getBytes(StreamUtils.UTF_8); writeUnsignedLeb128(bytes.length); out.write(bytes); }else { byte[] bytes = string.getBytes(StreamUtils.UTF_8); writeUnsignedLeb128(bytes.length); out.write(bytes); } }
use
CREATE TABLE IF NOT EXISTS gdm_user_student_purchase_course_history_daily ENGINE = MergeTree order by tuple() as
SELECT * FROM jdbc('jdbc:presto://10.7.8.83:8921/hive?user=1', 'dw_livecast', 'gdm_user_student_purchase_course_history_daily');
is slow !
90 million row need 40 mins;
CREATE TABLE dw_livecast.gdm_user_student_purchase_course_history_daily
(
student_id
string COMMENT '学生ID',
low_course_first_pay_datetime
string COMMENT '低价课首次购买时间',
low_course_first_order_id
string COMMENT '低价课首次购买订单ID',
low_course_latest_pay_datetime
string COMMENT '低价课最近一次购买时间',
low_course_latest_order_id
string COMMENT '低价课最近一次购买订单ID',
low_course_buy_count
string COMMENT '低价课购买数量',
normal_course_first_pay_datetime
string COMMENT '正价课首次购买时间',
normal_course_first_order_id
string COMMENT '正价课首次购买订单ID',
normal_course_latest_pay_datetime
string COMMENT '正价课最近一次购买时间',
normal_course_latest_order_id
string COMMENT '正价课最近一次购买订单ID',
normal_course_buy_count
string COMMENT '正价课购买数量',
course_first_pay_datetime
string COMMENT '所有类型课首次购买时间',
course_first_order_id
string COMMENT '所有类型课首次购买订单ID',
course_latest_pay_datetime
string COMMENT '所有类型课最近一次购买时间',
course_latest_order_id
string COMMENT '所有类型课最近一次购买订单ID',
course_buy_count
string COMMENT '所有类型课购买数量')
PARTITIONED BY (
dt
string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://M7-10-6-0-87:8020/user/hive/warehouse/dw_livecast.db/gdm_user_student_purchase_course_history_daily'
TBLPROPERTIES (
'transient_lastDdlTime'='1552451362')
@LXHING999 , and if just select from the table from cli clickhouse-client command?
SELECT * FROM jdbc('jdbc:presto://10.7.8.83:8921/hive?user=1', 'dw_livecast', 'gdm_user_student_purchase_course_history_daily') FORMAT Null
How much time does it takes?
Could you try lastest code on master branch? It should work.
Feel free to reopen if you still see the issue. I'm also interested in the performance issue, will it be better if you issue multiple queries each taking care of a partition of the dataset?
2019-05-15 14:37:31,804 [ HTTP Handler-27 ] {QueryHandlerServlet} - query is SELECT "id", "subject_id", "paper_id", "name", "detail", "exam_type", "region_level", "regions", "clazz_levels", "school_level", "school_ids", "apply_start_at", "apply_stop_at", "exam_start_at", "exam_stop_at", "correct_stop_at", "result_issue_at", "duration_minutes", "submit_after_minutes", "question_sort", "status", "agent_name", "agent_code", "file_url", "file_name", "creator_id", "created_at", "updated_at", "group_id", "teacher_id", "deleted_at", "papers", "ranks", "dt" FROM "dw_livecast"."bak_bdm_mongo_exam"
2019-05-15 14:37:32,351 [ HTTP Handler-27 ] {QueryHandlerServlet} -
java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:212)
at ru.yandex.clickhouse.util.ClickHouseRowBinaryStream.writeString(ClickHouseRowBinaryStream.java:72)
at ru.yandex.clickhouse.jdbcbridge.db.clickhouse.ClickHouseConverter.lambda$static$13(ClickHouseConverter.java:58)
at ru.yandex.clickhouse.jdbcbridge.db.clickhouse.ClickHouseFieldSerializer.serialize(ClickHouseFieldSerializer.java:34)
at ru.yandex.clickhouse.jdbcbridge.db.clickhouse.ClickHouseRowSerializer.serialize(ClickHouseRowSerializer.java:22)
at ru.yandex.clickhouse.jdbcbridge.servlet.QueryHandlerServlet.doPost(QueryHandlerServlet.java:60)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
at ru.yandex.clickhouse.jdbcbridge.servlet.RequestLogger.doFilter(RequestLogger.java:32)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:503)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:748)