apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.81k stars, 3.3k forks

[Bug] Group commit doesn't work on Doris 3.0 compute storage decoupled mode #39511

Open geoffreytran opened 3 months ago

geoffreytran commented 3 months ago

Version

3.0

What's Wrong?

When running Doris 3.0 in compute-storage decoupled mode, group commit does not appear to work. I'm not sure whether it is not yet implemented for this mode or whether this is a bug.

https://doris.apache.org/docs/data-operate/import/group-commit-manual/

What You Expected?

That group commit is handled properly.

How to Reproduce?

No response

Anything Else?

No response

Yukang-Lian commented 2 months ago

Can you provide more detailed information? Like how you use group commit?

geoffreytran commented 2 months ago

Sure. Group commit is currently used to aggregate inserts into a unique key table, in group commit sync mode. It works on Doris <= 2.1, but does not appear to work in 3.0 compute-storage decoupled mode. I have not yet had a chance to test with 3.0.1, which appears to include fixes related to group commit.

create table if not exists test_table (
    workspace_id int not null comment "Workspace id",
    anonymous_id varchar(64) comment "Anonymous id",
    user_id varchar(64) comment "User id",
    tenant_id int not null comment "Tenant id",
    created_at datetime not null default current_timestamp(0) comment "Created at"
)
engine=olap
unique key (workspace_id, anonymous_id, user_id)
distributed by hash(workspace_id, anonymous_id)
properties (
    "replication_allocation" = "tag.location.default: 3",
    "bloom_filter_columns" = "anonymous_id, user_id",
    "enable_unique_key_merge_on_write" = "true",
    "store_row_column" = "true",
    "light_schema_change" = "true",
    "group_commit_interval_ms" = "1000"
);
set enable_insert_strict = true, group_commit = sync_mode;

insert into test_table (workspace_id, anonymous_id, user_id, tenant_id, created_at) values (1, 'test', 'test', 1, '2023-01-01 00:00:00');
insert into test_table (workspace_id, anonymous_id, user_id, tenant_id, created_at) values (1, 'test', 'test', 1, '2023-01-01 00:00:00');
insert into test_table (workspace_id, anonymous_id, user_id, tenant_id, created_at) values (1, 'test', 'test', 1, '2023-01-01 00:00:00');
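
One way to tell whether an insert actually went through group commit is to inspect the label in the result the server returns for each statement. The following is a minimal sketch, assuming (per the group commit manual linked above) that a group-committed insert reports a label beginning with `group_commit_`. The helper name `used_group_commit` and the sample result strings are hypothetical, made up for illustration, and are not output captured from the cluster in this report.

```python
import re

# Assumption: the insert result is an info string shaped like
#   {'label':'group_commit_<id>', 'status':'PREPARE', 'txnId':'<n>'}
# and group-committed inserts use a label starting with "group_commit_".
LABEL_RE = re.compile(r"'label'\s*:\s*'([^']*)'")


def used_group_commit(info: str) -> bool:
    """Return True if the insert result string carries a group commit label."""
    match = LABEL_RE.search(info)
    return bool(match) and match.group(1).startswith("group_commit_")


# Made-up sample strings, for illustration only.
grouped = "{'label':'group_commit_abc123', 'status':'PREPARE', 'txnId':'1'}"
plain = "{'label':'label_abc123', 'status':'VISIBLE', 'txnId':'2'}"

print(used_group_commit(grouped))
print(used_group_commit(plain))
```

If the three inserts above each come back with an ordinary label rather than a `group_commit_` one, that would suggest the session setting is being ignored in this mode rather than the writes failing.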