[SUPPORT] Too slow while using trino-hudi connector while querying partitioned tables.

BruceKellan commented 1 year ago

Describe the problem you faced

We are testing the embedded Hudi connector on a copy-on-write table using Trino 405 (the latest stable version), but we ran into a serious performance problem. Our tables will have a very large number of partitions, so we made a minimal test set to reproduce this.

To Reproduce

Steps to reproduce the behavior:

1. Test table data: hudi_reproduce.tar.gz
   Description: this test table has many partitions and is partitioned by `day` and `type`. The data set has 657 entries in total.

   Import the data, then run the following HiveQL to create the table and repair its partitions.


```sql
CREATE EXTERNAL TABLE `website.hudi_reproduce`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`uniquekey` string)
PARTITIONED BY (
`day` bigint,
`type` bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'hoodie.query.as.ro.table'='false',
'path'='hdfs://xxx/hudi/warehouse/hudi_reproduce')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://xxx/hudi/warehouse/hudi_reproduce'
TBLPROPERTIES (
'last_commit_time_sync'='20230111113655773',
'last_modified_time'='1673406649',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numPartCols'='2',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"uniqueKey","type":"string","nullable":true,"metadata":{}},{"name":"day","type":"long","nullable":true,"metadata":{}},{"name":"type","type":"long","nullable":true,"metadata":{}}]}',
'spark.sql.sources.schema.partCol.0'='day',
'spark.sql.sources.schema.partCol.1'='type',
'transient_lastDdlTime'='1673406649');

-- repair partitions.
msck repair table website.hudi_reproduce;
```


2. Run Trino SQL to query:
```sql
-- we want to query the data where type is between 1 and 9 and day is between 20230101 and 20230104
select count(1) from hudi.website.hudi_reproduce where day between 20230101 and 20230104 and type between 1 and 9;
```

Query too slow:

Expected behavior

Queries should be as fast as on a Hive table.

Environment Description

Hudi version : 0.12.2
Hive version : 2.3.9
Hadoop version : 2.8.5
Trino version: 405
Number of trino worker: 8
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no

Additional context

Here is our Trino server.log; hope it helps: hudi_reproduce_trino_server_log.log

BruceKellan commented 1 year ago

@codope Can you help me?

danny0405 commented 1 year ago

I vaguely remember there being some issues with Trino; it seems the partition/file listings are queried many times. @codope may be able to give more details here.

alexeykudinkin commented 1 year ago

@BruceKellan we addressed a performance regression in 0.12.2. Can you give it a try?

BruceKellan commented 1 year ago

@alexeykudinkin Maybe I didn't understand you accurately. Do you mean I should use the latest Trino master?

BruceKellan commented 1 year ago

> @BruceKellan we addressed a performance regression in 0.12.2. Can you give it a try?

I upgraded the Hudi version in trino-hudi to 0.12.2 and recompiled, but it is still too slow. We tried to locate the problem by enabling debug-level logging and got some information. BTW, our Trino version is 405.

While running trino sql to query:

```sql
select count(1) from hudi.website.hudi_reproduce where day between 20230101 and 20230104 and type between 1 and 9;
```

The Hudi connector still fetches all partitions from the Hive metastore, even though the query filters on both partition columns. Maybe this is the reason.

server.log

As can be seen from the log, fetching the partitions from the metastore took 12 seconds.

@alexeykudinkin @codope WDYT?

BruceKellan commented 1 year ago

The entire query took 15 seconds, and the step of fetching the partitions alone took 12 seconds, which does not seem to be the expected behavior.
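
To make the expectation concrete, here is a minimal Python sketch of pruning partitions against the query predicate before anything is listed. This is not Trino or Hudi code: the Hive-style partition names (`day=.../type=...`), the helper names `parse_partition` and `prune`, and the 200-partition list are all assumptions made up for illustration.

```python
def parse_partition(name: str) -> dict[str, int]:
    """Parse a Hive-style partition name such as 'day=20230101/type=3'."""
    return {k: int(v) for k, v in (part.split("=", 1) for part in name.split("/"))}


def prune(names, day_range, type_range):
    """Keep only partitions whose day and type fall inside the inclusive ranges."""
    kept = []
    for name in names:
        values = parse_partition(name)
        day_ok = day_range[0] <= values["day"] <= day_range[1]
        type_ok = type_range[0] <= values["type"] <= type_range[1]
        if day_ok and type_ok:
            kept.append(name)
    return kept


# Hypothetical partition list: 10 days x 20 types = 200 partitions.
all_partitions = [
    f"day={20230101 + d}/type={t}" for d in range(10) for t in range(1, 21)
]

# Same predicate as the query: day between 20230101 and 20230104, type between 1 and 9.
selected = prune(all_partitions, (20230101, 20230104), (1, 9))
print(len(all_partitions), len(selected))
```

With a predicate on both partition columns, only a small fraction of the partitions should need to be touched, which is why fetching all of them looks suspicious.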

danny0405 commented 1 year ago

The regression should be a blocker for release 0.13.0. I have created a JIRA: https://issues.apache.org/jira/browse/HUDI-5552

BruceKellan commented 1 year ago

> The regression should be a blocker for release 0.13.0. I have created a JIRA: https://issues.apache.org/jira/browse/HUDI-5552

Thanks danny! I have provided reproduction steps and a minimal data set. If you need anything else, please ping me. I read the RFC related to the Trino Hudi connector, but I still find this partition-fetching behavior weird.

alexeykudinkin commented 1 year ago

@BruceKellan can you elaborate on what you're measuring this performance against? Is it Hive?

You're using Hudi's Hive connector in Trino, right?

BruceKellan commented 1 year ago

@alexeykudinkin

> Is it Hive?

No, it's Trino.

> can you elaborate on what you're measuring this performance against?

The amount of data is very small. With the same data in a Hive table instead of a Hudi table, each query takes only 2 seconds, but with the Hudi table it takes 15 seconds, which confuses me.

> You're using Hudi's Hive connector in Trino, right?

No, I'm using Trino's embedded Hudi connector, not the Hive connector.

BruceKellan commented 1 year ago

BTW, running the same query through the Trino Hive connector with hudi-hadoop-mr-bundle.jar takes only 2.x seconds.

alexeykudinkin commented 1 year ago

@BruceKellan thanks for the detailed context! This is very helpful

cc @yihua

codope commented 1 year ago

@BruceKellan Thanks for your patience. I was away on a break and didn't get a chance to look into this. First of all, I don't think it's a regression, as the query is slow even with the Hudi connector in Trino 400 using Hudi 0.11.1. There was a regression in the Hive connector due to a change in Hudi code, and we have fixed that in master.

Now, in my setup of the Hudi connector, I found that the query is slow because a single split manager thread does all the listing. This is also evident in your setup (hudi-split-manager-0). This is quite inefficient. I need to improve it so that it works more like the Hive connector's background split loader. This is a change in the Trino codebase, not in Hudi. I will work on it next week.
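
As a rough illustration of the difference, here is a small Python sketch that contrasts one thread listing every partition in turn with a pool of workers listing partitions concurrently. It is not the Trino implementation: `list_files` is a made-up stand-in for a blocking file-system listing call, and the partition names and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def list_files(partition: str) -> list[str]:
    """Stand-in for a file-system listing call; a real one would block on HDFS."""
    return [f"{partition}/file-{i}.parquet" for i in range(2)]


def list_sequential(partitions):
    """One thread lists every partition in turn (the single-thread pattern)."""
    files = []
    for partition in partitions:
        files.extend(list_files(partition))
    return files


def list_parallel(partitions, workers=8):
    """A pool of workers lists partitions concurrently; map() keeps input order."""
    files = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(list_files, partitions):
            files.extend(result)
    return files


# Hypothetical partitions: 4 days x 9 types, Hive-style names.
partitions = [f"day=2023010{d}/type={t}" for d in range(1, 5) for t in range(1, 10)]
sequential_files = list_sequential(partitions)
parallel_files = list_parallel(partitions)
print(len(parallel_files))
```

Both functions return the same files in the same order; the parallel variant only helps when each listing call spends most of its time waiting on I/O, which is the case for remote file-system listings.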

BruceKellan commented 1 year ago

@codope Thank you for your work! I am also looking forward to seeing a powerful Hudi connector. I have one more question: do you think the "get all partitions" call that appears in the logs is normal behavior?

codope commented 1 year ago

@BruceKellan I have a working patch with significant performance gains. On your table, I could see a 50-60% latency reduction. https://github.com/codope/trino/pull/23 Can you try the above patch? Let me know if you have trouble building it; then I can share the trino-server tarball with you. I need to make a few minor changes before I can raise a PR against the Trino repo, but early feedback from you would be helpful.

BruceKellan commented 1 year ago

@codope ok. I will try it this week.

apache / hudi

[SUPPORT] Too slow while using trino-hudi connector while querying partitioned tables. #7643