datafuselabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.67k stars 727 forks source link

bug: Inner Join Fails with OOM or Index Out of Range Despite Settings #16314

Closed inviscid closed 1 week ago

inviscid commented 3 weeks ago

Search before asking

Version

v1.2.618-nightly

What's Wrong?

We found an issue with Inner Join operation completing for a join between two relatively small tables. (1 million & 4 million). We have 3 query pods with 3 CPUs and 24GB RAM. We kept getting OOM even when setting the max_memory_usage to 10GB. We finally set it down to 2.4GB and this kept it from OOM but then we received the following error:

64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861619Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(17), name: "HashJoinProbe", elapsed: 5.000897638s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861648Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(16), name: "HashJoinProbe", elapsed: 5.001075858s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861848Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(23), name: "HashJoinProbe", elapsed: 5.000906638s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861971Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(18), name: "HashJoinProbe", elapsed: 5.001251878s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.862026Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(19), name: "HashJoinProbe", elapsed: 5.001248298s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861813Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(21), name: "HashJoinProbe", elapsed: 5.000742418s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861836Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(20), name: "HashJoinProbe", elapsed: 5.000902398s, active sync workers: 0
64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:46.861919Z  WARN databend_query::pipelines::executor::processor_async_task: processor_async_task.rs:146 Very slow processor async task, query_id:"64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf", processor id: NodeIndex(22), name: "HashJoinProbe", elapsed: 5.000974968s, active sync workers: 0
panicked at src/query/service/src/pipelines/processors/transforms/hash_join/probe_join/inner_join.rs:75:62:
range end index 111938 out of range for slice of length 65536

64f993dc-1f5c-4ad9-8e34-6c71edcd5cbf 2024-08-16T14:39:48.003454Z ERROR databend_common_tracing::panic_hook: panic_hook.rs:55 panicked at src/query/service/src/pipelines/processors/transforms/hash_join/probe_join/inner_join.rs:75:62:
range end index 111938 out of range for slice of length 65536 backtrace= panic.file=src/query/service/src/pipelines/processors/transforms/hash_join/probe_join/inner_join.rs panic.line=75 panic.column=62
############################### Crash fault info ###############################
PID: 1
TID: 44
Version: v1.2.609-8467049b4e(rust-1.81.0-nightly-2024-08-13T22:30:34.648068594Z)
Timestamp(UTC): 2024-08-16 14:39:48.079422307 UTC
Timestamp(Local): 2024-08-16 14:39:48.080645657 +00:00
QueryId: "Unknown"
Signal 11 (SIGSEGV), si_code 128 (Unknown), Address null points

Backtrace:
2024-08-16T14:39:48.082805637Z
2024-08-16T14:40:19.645209Z  WARN databend_query::servers::admin::admin_service: admin_service.rs:166 Http API TLS not set
Databend Query

Version: v1.2.609-8467049b4e(rust-1.81.0-nightly-2024-08-13T22:30:34.648068594Z)

The EXPLAIN:

explain                                                                                                                                                                                                                                                        |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
InsertPlan (subquery):                                                                                                                                                                                                                                         |
├── table: default.anlz9bd0dabb-7e23-4562-84eb-5a42546fccb1.analyzetable_9a1eeded-3546-427e-a230-f48f6ef04c0e                                                                                                                                                  |
├── inserted columns: [analyzetable_9a1eeded-3546-427e-a230-f48f6ef04c0e.run_id (#0),analyzetable_9a1eeded-3546-427e-a230-f48f6ef04c0e.run_day (#1),analyzetable_9a1eeded-3546-427e-a230-f48f6ef04c0e.run_sequence (#2),analyzetable_9a1eeded-3546-427e-a230-f4|
├── overwrite: false                                                                                                                                                                                                                                           |
└── Exchange(Merge)                                                                                                                                                                                                                                            |
    └── EvalScalar                                                                                                                                                                                                                                             |
        ├── scalars: [CAST(AA.run_id (#0) AS String NULL) AS (#97), CAST(AA.run_day (#1) AS String NULL) AS (#98), CAST(AA.run_sequence (#2) AS String NULL) AS (#99), CAST(AA.timestamp (#3) AS String NULL) AS (#100), CAST(AA.CHARGE_TYPE (#4) AS String NUL|
        └── Join(Inner)                                                                                                                                                                                                                                        |
            ├── build keys: [AA.CHARGE_TYPE (#4), AA.PERIOD (#5), AA.SENDER__COST_CENTER (#18), AA.SENDER__ENTITY (#20), AA.run_day (#1), AA.run_id (#0), AA.timestamp (#3)]                                                                                   |
            ├── probe keys: [SC_1a.CHARGE_TYPE (#52), SC_1a.PERIOD (#72), SC_1a.COST_CENTER (#53), SC_1a.ENTITY (#58), SC_1a.run_day (#92), SC_1a.run_id (#93), SC_1a.timestamp (#96)]                                                                         |
            ├── other filters: []                                                                                                                                                                                                                              |
            ├── Scan                                                                                                                                                                                                                                           |
            │   ├── table: anlz9bd0dabb-7e23-4562-84eb-5a42546fccb1.analyzetable_bca425c0d64548fd920c0c593cbc6ce4                                                                                                                                              |
            │   ├── filters: []                                                                                                                                                                                                                                |
            │   ├── order by: []                                                                                                                                                                                                                               |
            │   └── limit: NONE                                                                                                                                                                                                                                |
            └── Exchange(Broadcast)                                                                                                                                                                                                                            |
                └── Scan                                                                                                                                                                                                                                       |
                    ├── table: anlz9bd0dabb-7e23-4562-84eb-5a42546fccb1.analyzetable_edef4c75280e488caf07bdb76c0c1c13                                                                                                                                          |
                    ├── filters: []                                                                                                                                                                                                                            |
                    ├── order by: []                                                                                                                                                                                                                           |
                    └── limit: NONE                                                                                                                                                                                                                            |

Crash Fault Info

############################### Crash fault info ###############################
PID: 1
TID: 106
Version: v1.2.615-nightly-a8da519a63(rust-1.81.0-nightly-2024-08-18T03:44:10.590292383Z)
Timestamp(UTC): 2024-08-19 09:17:06.627602738 UTC
Timestamp(Local): 2024-08-19 09:17:06.627629928 +00:00
QueryId: "c3366d9b-8d48-4835-aa93-dbcdc4d23d79"
Signal 11 (SIGSEGV), si_code 1 (Unknown), Address 0x88
Backtrace:
c3366d9b-8d48-4835-aa93-dbcdc4d23d79 2024-08-19T09:17:07.224900Z  WARN databend_query::servers::flight::v1::exchange::statistics_sender: statistics_sender.rs:133 Cannot send data via flight exchange, cause: SendError(..)

We have also set join_spilling_memory_ratio = 0 and join_spilling_memory_ratio = 1 but it seems to have no impact on this error.

Other Settings info:

name                                              |value               |default             |range                                                              |level  |description                                                                                                                                                                               |type  |
--------------------------------------------------+--------------------+--------------------+-------------------------------------------------------------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+
acquire_lock_timeout                              |30                  |30                  |[0, 18446744073709551615]                                          |DEFAULT|Sets the maximum timeout in seconds for acquire a lock.                                                                                                                                   |UInt64|
aggregate_spilling_bytes_threshold_per_proc       |0                   |0                   |[0, 18446744073709551615]                                          |DEFAULT|Sets the maximum amount of memory in bytes that an aggregator can use before spilling data to storage during query execution.                                                             |UInt64|
aggregate_spilling_memory_ratio                   |60                  |60                  |[0, 100]                                                           |DEFAULT|Sets the maximum memory ratio in bytes that an aggregator can use before spilling data to storage during query execution.                                                                 |UInt64|
auto_compaction_imperfect_blocks_threshold        |25                  |25                  |[0, 18446744073709551615]                                          |DEFAULT|Threshold for triggering auto compaction. This occurs when the number of imperfect blocks in a snapshot exceeds this value after write operations.                                        |UInt64|
collation                                         |utf8                |utf8                |["utf8"]                                                           |LOCAL  |Sets the character collation. Available values include "utf8".                                                                                                                            |String|
compact_max_block_selection                       |10000               |10000               |[2, 18446744073709551615]                                          |DEFAULT|Limits the maximum number of blocks that can be selected during a compact operation.                                                                                                      |UInt64|
cost_factor_aggregate_per_row                     |5                   |5                   |[0, 18446744073709551615]                                          |DEFAULT|Cost factor of grouping operation for a data row                                                                                                                                          |UInt64|
cost_factor_hash_table_per_row                    |10                  |10                  |[0, 18446744073709551615]                                          |DEFAULT|Cost factor of building hash table for a data row                                                                                                                                         |UInt64|
cost_factor_network_per_row                       |50                  |50                  |[0, 18446744073709551615]                                          |DEFAULT|Cost factor of transmit via network for a data row                                                                                                                                        |UInt64|
create_query_flight_client_with_current_rt        |1                   |1                   |[0, 1]                                                             |DEFAULT|Turns on (1) or off (0) the use of the current runtime for query operations.                                                                                                              |UInt64|
data_retention_time_in_days                       |30                  |1                   |[0, 90]                                                            |LOCAL  |Sets the data retention time in days.                                                                                                                                                     |UInt64|
ddl_column_type_nullable                          |1                   |1                   |[0, 1]                                                             |DEFAULT|Sets new columns to be nullable (1) or not (0) by default in table operations.                                                                                                            |UInt64|
disable_join_reorder                              |0                   |0                   |[0, 1]                                                             |DEFAULT|Disable join reorder optimization.                                                                                                                                                        |UInt64|
disable_variant_check                             |0                   |0                   |[0, 1]                                                             |DEFAULT|Disable variant check to allow insert invalid JSON values                                                                                                                                 |UInt64|
efficiently_memory_group_by                       |0                   |0                   |[0, 1]                                                             |DEFAULT|Memory is used efficiently, but this may cause performance degradation.                                                                                                                   |UInt64|
enable_aggregating_index_scan                     |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables scanning aggregating index data while querying.                                                                                                                                   |UInt64|
enable_analyze_histogram                          |0                   |0                   |[0, 1]                                                             |DEFAULT|Enables analyze histogram for query optimization during analyzing table.                                                                                                                  |UInt64|
enable_auto_fix_missing_bloom_index               |0                   |0                   |[0, 1]                                                             |DEFAULT|Enables auto fix missing bloom index                                                                                                                                                      |UInt64|
enable_bloom_runtime_filter                       |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables runtime filter optimization for JOIN.                                                                                                                                             |UInt64|
enable_cbo                                        |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables cost-based optimization.                                                                                                                                                          |UInt64|
enable_clickhouse_handler                         |0                   |0                   |[0, 1]                                                             |DEFAULT|Enables clickhouse handler.                                                                                                                                                               |UInt64|
enable_compact_after_write                        |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables compact after write(copy/insert/replace-into/merge-into), need more memory.                                                                                                       |UInt64|
enable_distributed_compact                        |1                   |0                   |[0, 1]                                                             |LOCAL  |Enables distributed execution of table compaction.                                                                                                                                        |UInt64|
enable_distributed_copy_into                      |1                   |1                   |[0, 1]                                                             |LOCAL  |Enables distributed execution for the 'COPY INTO'.                                                                                                                                        |UInt64|
enable_distributed_merge_into                     |1                   |1                   |[0, 1]                                                             |LOCAL  |Enables distributed execution for 'MERGE INTO'.                                                                                                                                           |UInt64|
enable_distributed_recluster                      |1                   |0                   |[0, 1]                                                             |LOCAL  |Enable distributed execution of table recluster.                                                                                                                                          |UInt64|
enable_distributed_replace_into                   |1                   |0                   |[0, 1]                                                             |LOCAL  |Enables distributed execution of 'REPLACE INTO'.                                                                                                                                          |UInt64|
enable_dphyp                                      |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables dphyp join order algorithm.                                                                                                                                                       |UInt64|
enable_dst_hour_fix                               |0                   |0                   |[0, 1]                                                             |DEFAULT|Time conversion handles invalid DST by adding an hour. Accuracy not guaranteed.(disable by default)                                                                                       |UInt64|
enable_experimental_aggregate_hashtable           |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables experimental aggregate hashtable                                                                                                                                                  |UInt64|
enable_experimental_merge_into                    |1                   |1                   |[0, 1]                                                             |DEFAULT|Enables the experimental feature for 'MERGE INTO'.                                                                                                                                        |UInt64|
enable_experimental_queries_executor              |0                   |0                   |[0, 1]                                                             |DEFAULT|Enables experimental new executor                                                                                                                                                         |UInt64|
enable_experimental_rbac_check                    |1                   |1                   |[0, 1]                                                             |DEFAULT|experiment setting disables stage and udf privilege check(enable by default).                                                                                                             |UInt64|
enable_geo_create_table               ...

How to Reproduce?

Inner Join with several join keys.

Are you willing to submit PR?

inviscid commented 3 weeks ago

@BohuTANG @Dousir9 - From our Slack chat

inviscid commented 3 weeks ago

We would be happy to provide the data for two tables if that would help.

Dousir9 commented 3 weeks ago

@inviscid Thank you very much for providing the data, could you send me the data on slack :)

sundy-li commented 3 weeks ago

Other Settings info:

Next time, you just need to use show settings where value != default

inviscid commented 3 weeks ago

@Dousir9 I DM'd you the two table datasets in Slack