confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
114 stars 1.04k forks source link

`ksql> Failed to prepare statement: EN_WIKIPEDIA_GT_1_STREAM does not exist.` #3363

Closed ybyzek closed 4 years ago

ybyzek commented 5 years ago

Two users have reported an error running cp-demo at https://github.com/confluentinc/cp-demo (most users do not report this issue, so it's an interesting/intermittent timing issue).

Run KSQL queries (~10 seconds):
Aug 30, 2019 1:59:13 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
KSQL, Copyright 2017-2018 Confluent Inc.

CLI v5.3.0, Server v5.3.0 located at http://ksql-server:8088

Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!

ksql> Failed to prepare statement: EN_WIKIPEDIA_GT_1_STREAM does not exist.
ksql> Exiting KSQL.

Per the users who have reported this issue:

  1. There are no errors beyond the one shown above
  2. If user re-executes the command manually (the 3rd), it passes

Note that I personally cannot reproduce this issue. I've seen reports only from other users including @remiforest and @daniellavoie

ybyzek commented 4 years ago

I’m now hitting this issue occasionally as well, could someone please help look into this?

Side note: in my environment, cp-demo on a release branch usually runs smoothly. It is only running cp-demo with PR https://github.com/confluentinc/cp-demo/pull/133 that exposes this issue in my environment. (The only difference is security store type, which is interesting why that would change timing?) When @remiforest and @daniellavoie reported the issue, it was on a release branch.

rspurgeon commented 4 years ago

@ybyzek i'm investigating this

rspurgeon commented 4 years ago

I was able to reproduce this issue (on the cp-demo DEVX-1032 branch) using Control Center and KSQL independent of the cp-demo automation. I modified cp-demo to stop short of the ksql script execution, and then attempted to run the series of KSQL commands in Control Center.

The script looks like this:

CREATE STREAM wikipedia (CREATEDAT bigint, WIKIPAGE string, CHANNEL string, USERNAME string, COMMITMESSAGE string, BYTECHANGE integer, DIFFURL string, ISNEW boolean, ISMINOR boolean, ISBOT boolean, ISUNPATROLLED boolean) WITH (kafka_topic='wikipedia.parsed', value_format='AVRO');
CREATE STREAM wikipedianobot WITH (PARTITIONS=2,REPLICAS=2) AS SELECT * FROM wikipedia WHERE isbot <> true;
CREATE STREAM wikipediabot WITH (PARTITIONS=2,REPLICAS=2) AS SELECT * FROM wikipedia WHERE isbot = true;
CREATE TABLE en_wikipedia_gt_1 WITH (PARTITIONS=2,REPLICAS=2) AS SELECT username, wikipage, count(*) AS COUNT FROM wikipedia WINDOW TUMBLING (size 300 second) WHERE channel = '#en.wikipedia' GROUP BY username, wikipage HAVING count(*) > 1; 
CREATE STREAM en_wikipedia_gt_1_stream (USERNAME string, WIKIPAGE string, COUNT bigint) WITH (kafka_topic='EN_WIKIPEDIA_GT_1', value_format='AVRO');
CREATE STREAM en_wikipedia_gt_1_counts WITH (PARTITIONS=2,REPLICAS=2) AS SELECT * FROM en_wikipedia_gt_1_stream where ROWTIME is not null;

The final CREATE STREAM is the one that fails, and is the only CREATE derived from a previously create stream in the script itself.

After a short period, I copied the final line out and ran it independent of the rest of the script and the final CREATE STREAM succeeded.

NOTE: This issue is not apparent on 5.3.1-post or 5.3.0-post branch, however it is on the DEVX-1032 'feature' branch we are working on. Here is the diff between the two branches related to security key changes for the services:

diff --git a/scripts/security/certs-create.sh b/scripts/security/certs-create.sh
index 8e7abf5..bc6318b 100755
--- a/scripts/security/certs-create.sh
+++ b/scripts/security/certs-create.sh
@@ -9,7 +9,7 @@
 rm -f *.crt *.csr *_creds *.jks *.srl *.key *.pem *.der *.p12

 # Generate CA key
-openssl req -new -x509 -keyout snakeoil-ca-1.key -out snakeoil-ca-1.crt -days 365 -subj '/CN=ca1.test.confluent.io/OU=TEST/O=CONFLUENT/L=PaloAlto/S=Ca/C=US' -passin pass:confluent -passout pass:confluent
+openssl req -new -x509 -keyout snakeoil-ca-1.key -out snakeoil-ca-1.crt -days 365 -subj '/CN=ca1.test.confluent.io/OU=TEST/O=CONFLUENT/L=PaloAlto/ST=Ca/C=US' -passin pass:confluent -passout pass:confluent

 for i in kafka1 kafka2 client schemaregistry restproxy connect control-center
 do
@@ -23,7 +23,8 @@ do
                 -keystore kafka.$i.keystore.jks \
                 -keyalg RSA \
                 -storepass confluent \
-                -keypass confluent
+                -keypass confluent \
+                -storetype pkcs12

    # Create the certificate signing request (CSR)
    keytool -keystore kafka.$i.keystore.jks -alias $i -certreq -file $i.csr -storepass confluent -keypass confluent -ext "SAN=dns:$i,dns:localhost"
ybyzek commented 4 years ago

@big-andy-coates update: this KSQL issue is now a blocker to merging https://github.com/confluentinc/cp-demo/pull/133 into cp-demo.

rspurgeon commented 4 years ago

I was able to circumvent this issue in cp-demo by adding a 'dummy command' to the KSQL script prior to the create stream from the previously created stream, as:

...
CREATE STREAM en_wikipedia_gt_1_stream (USERNAME string, WIKIPAGE string, COUNT bigint) WITH (kafka_topic='EN_WIKIPEDIA_GT_1', value_format='AVRO');
DROP STREAM IF EXISTS FOO;
CREATE STREAM en_wikipedia_gt_1_counts WITH (PARTITIONS=2,REPLICAS=2) AS SELECT * FROM en_wikipedia_gt_1_stream where ROWTIME is not null;

DROP STREAM IF EXISTS FOO; existing just to delay the final CREATE...

ybyzek commented 4 years ago

From @big-andy-coates

I've debugged the Ksql server when running multiple statements in C3 and it sends all the statements across in a single request. This is the correct behaviour for C3 as then it doesn't need to worry about sequence numbers etc, leaving the KSQL server to ensure previous commands have been executed for processing the next.

The same multi-line requests can be achieved in the CLI by putting them all on the one line, or by using the cli's RUN SCRIPT functionality, where the CLI will load the content of the script and pass the lot over to the server in a single request.

So, it sounds like there is a bug in the way KSQL server is handling multi-statement requests.

apurvam commented 4 years ago

Assigning to @big-andy-coates since he has been looking at this.

big-andy-coates commented 4 years ago

Issue exists in 5.4.x branch. Confirmed its also in 5.3.x. However, it's fixed in master.

The issue related to how the KSQL server is processing multi-statement requests. There is a bug that means that a subsequent command may be 'prepared' before the previous command has been executed on the command runner thread.

Master has changes that mean the prepare call doesn't validate that sources exist, (which is correct behavior). In 5.4.x and before the prepare call checks the metastore for any referenced sources and throws an exception if they are not known.

FYI, the bug has little to do with https://github.com/confluentinc/cp-demo/pull/133. This is purely a bug in the server with how it handles multi-statement request.