confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
80 stars 1.04k forks source link

How to enable ONLY_FULL_GROUP_BY in ksql #6656

Open ratulbasak opened 3 years ago

ratulbasak commented 3 years ago

Is there any option or settings in ksql to enable ONLY_FULL_GROUP_BY?

We have multiple tables to join using group by clause with multiple nullable columns. Is this possible?

rodesai commented 3 years ago

We do only support full group bys in the sense that you cannot select columns that are not present in the group by clause. Not sure where nullability comes into play. Can you explain specifically the semantics you would like to see?

ratulbasak commented 3 years ago

Thank you for your reply.

As we have to create a stream using join between two-stream and there is one aggregate operation and other select fields are flat. The problem is when we group by with multiple columns if any column has a null value it returns the output stream with blank rows.

The following queries we are using for creating a stream using join -

Fiirstly, we created stream from users topic:

create stream json_users ("id" integer,"username" string,"password" string,"last_login" string,"is_superuser" integer,"is_staff" integer,"is_active" integer,"first_name" string,"last_name" string,"date_joined" string,"birth_date" string,"created_by_id" string,"modified_by_id" string,"created_at" string,"modified_at" string,"verified" integer,"otp_token" string,"profile_pic_url" string,"gender" string,"user_status" string,"devices" string,"facebook" string,"google" string,"email" string,"contact" string,"address" string,"user_type" string) with (kafka_topic='dbserver2.evaly_auth_db.users', value_format='JSON');

Second stream:

create stream user_groups as select ag."id",ug."user_id",ag."name" from json_users_groups ug join AUTH_GROUP_TABLE ag on ug."group_id"=ag."id" emit changes;

Lastly, there is a join between streams

create stream select 
        u."id",
        u."username",
        u."password",
        u."last_login",
        u."is_superuser",
        u."is_staff",
        u."is_active",
        u."first_name",
        u."last_name",
        u."date_joined",
        u."birth_date",
        u."created_by_id",
        u."created_at",
        u."modified_at",
        u."verified",
        u."profile_pic_url",
        u."gender",
        u."user_status",
        u."email",
        u."contact",
        u."address",
        u."user_type",
        u."otp_token",
        COLLECT_LIST(ug."name") as "groups" 
    from user_groups ug 
    join json_users u within 2 hours on ug."user_id" = u."id" 
    group by 
        u."id",
        u."username",
        u."password",
        u."last_login",
        u."is_superuser",
        u."is_staff",
        u."is_active",
        u."first_name",
        u."last_name",
        u."date_joined",
        u."birth_date",
        u."created_by_id",
        u."created_at",
        u."modified_at",
        u."verified",
        u."profile_pic_url",
        u."gender",
        u."user_status",
        u."email",
        u."contact",
        u."address",
        u."user_type",
        u."otp_token"
    emit changes;

There are some nullable columns in json_users stream which we need to propagate into the target database.