confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
124 stars 1.04k forks source link

headless ksqlDB Server fails to prepare multi-statement sql script #6892

Open teichmaj opened 3 years ago

teichmaj commented 3 years ago

Describe the bug We are running the ksqldb server in lockdown headless mode and provide the path to a .sql script via the environment variable KSQL_KSQL_QUERIES_FILE The .sql script contains two statements. The first statement defines a new stream on top of a Kafka topic which the second statement references. The second statement fails to validate because the stream defined in the first statement cannot be found. The problem sounds very similar to this: https://github.com/confluentinc/ksql/pull/3952

To Reproduce KSQL version: 0.14.0 script.sql content:

CREATE OR REPLACE STREAM `some_new_stream1` WITH (KAFKA_TOPIC = 'some.kafka.topic1', VALUE_FORMAT = 'AVRO');

CREATE OR REPLACE STREAM `some_new_staream2` WITH (KAFKA_TOPIC = 'some.kafka.topic2')
AS SELECT 
  field1 AS field1
FROM `some_new_stream1`

Expected behavior Two new streams are created sequentially with the second stream consuming from the first stream.

Actual behaviour

[2021-01-25 15:14:40,170] ERROR Failed to start KSQL (io.confluent.ksql.rest.server.KsqlServerMain:68)
--
io.confluent.ksql.util.KsqlStatementException: Exception while preparing statement: some_new_stream1 does not exist.
at io.confluent.ksql.engine.EngineContext.prepare(EngineContext.java:183)
--
at io.confluent.ksql.engine.SandboxedExecutionContext.prepare(SandboxedExecutionContext.java:99)
at io.confluent.ksql.KsqlExecutionContext.prepare(KsqlExecutionContext.java:116)
at io.confluent.ksql.rest.server.StandaloneExecutor$StatementExecutor.prepare(StandaloneExecutor.java:320)
at io.confluent.ksql.rest.server.StandaloneExecutor$StatementExecutor.execute(StandaloneExecutor.java:300)
at io.confluent.ksql.rest.server.StandaloneExecutor.executeStatements(StandaloneExecutor.java:218)
at io.confluent.ksql.rest.server.StandaloneExecutor.validateStatements(StandaloneExecutor.java:202)
at io.confluent.ksql.rest.server.StandaloneExecutor.processesQueryFile(StandaloneExecutor.java:180)
at io.confluent.ksql.rest.server.StandaloneExecutor.startAsync(StandaloneExecutor.java:122)
at io.confluent.ksql.rest.server.KsqlServerMain.tryStartApp(KsqlServerMain.java:91)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:66)
Caused by: io.confluent.ksql.util.KsqlException: some_new_stream1 does not exist.
at io.confluent.ksql.engine.rewrite.DataSourceExtractor$Visitor.visitAliasedRelation(DataSourceExtractor.java:110)
at io.confluent.ksql.engine.rewrite.DataSourceExtractor$Visitor.visitAliasedRelation(DataSourceExtractor.java:99)
at io.confluent.ksql.parser.tree.AliasedRelation.accept(AliasedRelation.java:60)
at io.confluent.ksql.parser.tree.AstVisitor.process(AstVisitor.java:34)
at io.confluent.ksql.parser.DefaultTraversalVisitor.visitQuery(DefaultTraversalVisitor.java:46)
at io.confluent.ksql.parser.tree.Query.accept(Query.java:118)
at io.confluent.ksql.parser.tree.AstVisitor.process(AstVisitor.java:34)
at io.confluent.ksql.parser.DefaultTraversalVisitor.visitCreateStreamAsSelect(DefaultTraversalVisitor.java:105)
at io.confluent.ksql.parser.tree.CreateStreamAsSelect.accept(CreateStreamAsSelect.java:61)
at io.confluent.ksql.parser.tree.AstVisitor.process(AstVisitor.java:34)
at io.confluent.ksql.engine.rewrite.DataSourceExtractor.extractDataSources(DataSourceExtractor.java:55)
at io.confluent.ksql.engine.rewrite.AstSanitizer.sanitize(AstSanitizer.java:66)
at io.confluent.ksql.engine.EngineContext.prepare(EngineContext.java:177)
... 10 more

Additional context The statements themselves are working fine when submitted individually one at the time to the ksqldb server in interactive mode via the API. The problem only occurs when trying to run the same statements but supply them to headless ksqldb server via a statement sql script.

mjsax commented 3 years ago

I tried to reproduce this issue using 0.14.0 docker image, and this are my observations: If I use the exact statement from above, the first statement fails already (what I kinda expected):

io.confluent.ksql.util.KsqlStatementException:
  Statement is missing the 'VALUE_FORMAT' property from the WITH clause.
  Either provide one or set a default via the 'ksql.persistence.default.format.value' config.

Btw: just adding a value_format may or may not be sufficient, depending on the used format. If you integrate with Schema Registry, value_format might be sufficient; otherwise, you would need to declare the schema of you stream explicitly.

Because the first statement cannot create the stream, the second statement fails of course, too.

If I change the first statement to

CREATE OR REPLACE STREAM `some_new_stream1` (field1 VARCHAR)
    WITH (KAFKA_TOPIC = 'some.kafka.topic1', VALUE_FORMAT='json');

the stream is created and the second statement is also executed successfully.

teichmaj commented 3 years ago

Hi Matthias, thank you for looking into this. We are using the Schema Registry and we are actually providing value_format='avro' It's my bad that I forgot to include it in the example in the first comment.

The single statements work if we send them individually to the REST API of KSQL running in interactive mode. That's why we are fairly confident that the queries themselves are correct. In the logs we also do not see any exception for the first statement which creates the first stream over our existing kafka topic. The exception occurs when preparing the second statement as part of the same sql script.

mjsax commented 3 years ago

Could it be that you hit https://github.com/confluentinc/ksql/issues/6725 -- can you try to set TRACE level logging?