confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
88 stars 1.04k forks source link

Empty answers to queries with existing data in materialized views (tables) #8255

Open EdisonCalderon opened 2 years ago

EdisonCalderon commented 2 years ago

Hi everyone,

Firstly thank you for that great and interesting product you all have built here. I am using 0.18 version of KSQL Server in Docker with the official image of Docker Hub and these days I updated to 0.21 version trying to see if the problem I have, without success.

I am using some materialized views and I have noticed that some queries throw me a false positive, noting that they do not find data in records that I know with certainty that they have data.

It should be clarified that it does not happen in all cases but if in several, and that no error log is seen in KSQL; I have also tested with 1 and more replicas, with active Heartbeat and off but I have not seen a plausible solution.

The table I'm querying is:

CREATE TABLE LIA_QA_FIF_CORP_BALANCER_TME_MINUTE WITH (FORMAT='JSON') AS
SELECT business, group_id, minuto, EXTRACTJSONFIELD(detail,'$.queue_type') as queue_type, AVG((end_date - start_date)/CAST(60000 AS DOUBLE)) as tme
FROM LIA_QA_FIF_CORP_BALANCER_TIME
WHERE event = 'GO_AGENT'
GROUP BY business, group_id, EXTRACTJSONFIELD(detail,'$.queue_type'), minuto;

And I'm issuing queries like this:

SELECT tme FROM LIA_QA_FIF_CORP_BALANCER_TME_MINUTE WHERE BUSINESS = 'banco_co' AND GROUP_ID = 60 AND QUEUE_TYPE = 'G' AND MINUTO = '2021-10-13 01:20';

imagen

I noticed that using the configuration ksql.query.pull.table.scan.enabled=true and modifying the query (to force scan) it can retrieve the correct value that is present in topic associeted with the table.

SELECT tme FROM LIA_QA_FIF_CORP_BALANCER_TME_MINUTE WHERE BUSINESS = 'banco_co' AND GROUP_ID = 60 AND QUEUE_TYPE = 'G' AND MINUTO = '2021-10-13 01:20' AND QUEUE_TYPE=QUEUE_TYPE;

imagen

With all this I wonder if I may have an error in configuration or if it can be a KSQL bug. I hope I have been explanatory and that someone can help me. Again thanks for the help you can give me.

EdisonCalderon commented 2 years ago

Hello, I make a small update to indicate that I have another pair of tables, with their respective queries, on which I have no problem with queries.

This tables are:

CREATE TABLE LIA_QA_FIF_CORP_BALANCER_SERVERS_MINUTE WITH (FORMAT='JSON') AS
SELECT business, group_id, minuto, AVG(CAST(EXTRACTJSONFIELD(detail,'$.servers') AS INT)) as servidores
FROM LIA_QA_FIF_CORP_BALANCER_TIME
WHERE event = 'SERVERS'
GROUP BY business, group_id, minuto;

And

CREATE TABLE LIA_QA_FIF_CORP_PARLO_TMO_MINUTE WITH (FORMAT='JSON') AS
SELECT business, group_id, minuto, AVG((end_date - start_date)/CAST(60000 AS DOUBLE)) as tmo
FROM LIA_QA_FIF_CORP_PARLO_TIME
WHERE event = 'ENDED_CONVERSATION'
GROUP BY business, group_id, minuto;

I suspect that the problem is associated with the GROUP BY sentencem in EXTRACTJSONFIELD(detail,'$.queue_type'), but it is merely a suspicion