confluentinc / examples

Apache Kafka and Confluent Platform examples and demos
Apache License 2.0

Errors in cluster deployment when running cp-quickstart #1166

Closed by sw00t 1 year ago

sw00t commented 1 year ago

Description

Deploying to Azure with cp-quickstart/start-cloud.sh completes with errors, and the ksqlDB streams and tables cannot be created. The errors state that certain topics don't exist, that the format is incorrect, and/or that Schema Registry is not accessible, even though the topics exist, the formats appear correct, and access to SR seems to work.

Errors:

Submitting KSQL queries via curl to the ksqlDB REST endpoint
    See https://docs.ksqldb.io/en/latest/developer-guide/api/ for more information
✘ CREATE STREAM pageviews WITH (kafka_topic='pageviews', value_format='AVRO');
    "Schema for message values on topic 'pageviews' does not exist in the Schema Registry.\nSubject: pageviews-value\nPossible causes include:\n- The topic itself does not exist\n\t-> Use SHOW TOPICS; to check\n- Messages on the topic are not serialized using a format Schema Registry supports\n\t-> Use PRINT 'pageviews' FROM BEGINNING; to verify\n- Messages on the topic have not been serialized using a Confluent Schema Registry supported serializer\n\t-> See https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html\n- The schema is registered on a different instance of the Schema Registry\n\t-> Use the REST API to list available subjects\thttps://docs.confluent.io/current/schema-registry/docs/api.html#get--subjects\n- You do not have permissions to access the Schema Registry.\n\t-> See https://docs.confluent.io/current/schema-registry/docs/security.html"
✘ CREATE TABLE users (id STRING PRIMARY KEY) WITH (kafka_topic='users', value_format='PROTOBUF');
    "Schema for message values on topic 'users' does not exist in the Schema Registry.\nSubject: users-value\nPossible causes include:\n- The topic itself does not exist\n\t-> Use SHOW TOPICS; to check\n- Messages on the topic are not serialized using a format Schema Registry supports\n\t-> Use PRINT 'users' FROM BEGINNING; to verify\n- Messages on the topic have not been serialized using a Confluent Schema Registry supported serializer\n\t-> See https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html\n- The schema is registered on a different instance of the Schema Registry\n\t-> Use the REST API to list available subjects\thttps://docs.confluent.io/current/schema-registry/docs/api.html#get--subjects\n- You do not have permissions to access the Schema Registry.\n\t-> See https://docs.confluent.io/current/schema-registry/docs/security.html"
✘ CREATE STREAM pageviews_female AS SELECT users.id AS userid, pageid, regionid, gender FROM pageviews LEFT JOIN users ON pageviews.userid = users.id WHERE gender = 'FEMALE';
    "Exception while preparing statement: PAGEVIEWS does not exist."
✘ CREATE STREAM pageviews_female_like_89 AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
    "Exception while preparing statement: PAGEVIEWS_FEMALE does not exist."
✘ CREATE TABLE pageviews_regions WITH (key_format='JSON') AS SELECT gender, regionid , COUNT(*) AS numusers FROM pageviews_female WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1;
    "Exception while preparing statement: PAGEVIEWS_FEMALE does not exist."
✘ CREATE STREAM accomplished_female_readers WITH (value_format='JSON_SR') AS SELECT * FROM PAGEVIEWS_FEMALE WHERE CAST(SPLIT(PAGEID,'_')[2] as INT) >= 50;
    "Exception while preparing statement: PAGEVIEWS_FEMALE does not exist."

Troubleshooting

Increased the wait for the ksqlDB cluster to come up to 2048 seconds, and the sleep after configuring the ksqlDB ACLs to 90 seconds, because earlier runs timed out.
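For reference, the kind of wait-with-timeout loop being tuned here can be sketched as follows. This is a minimal illustration only, not the code used by start-cloud.sh; the function name `wait_until` is hypothetical.

```shell
# wait_until MAX_WAIT INTERVAL COMMAND...
# Re-runs COMMAND every INTERVAL seconds until it succeeds,
# or gives up once MAX_WAIT seconds of sleeping have elapsed.
wait_until() {
  max_wait=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$max_wait" ]; then
      return 1   # timed out
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0       # command succeeded within the time limit
}
```

A call such as `wait_until 2048 10 ksqldb_is_up` would then poll for up to 2048 seconds, where `ksqldb_is_up` stands for a hypothetical check that you would have to supply yourself.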

Environment

~/0/Confluent/examples/cp-quickstart (7.3.0-post ✘)✹✭ ᐅ CLUSTER_CLOUD=azure CLUSTER_REGION=westus2 ./start-cloud.sh
✔ jq found

====== Confirm

--------------------------------------------------------------------------------------------
This example runs on Confluent Cloud, sign up here:

         https://www.confluent.io/confluent-cloud/tryfree/

The example uses real Confluent Cloud resources that may be billable, including connectors
and ksqlDB applications that may have hourly charges. The end of this script shows a command
you can run to destroy all the cloud resources, and you should verify they are destroyed.

New Confluent Cloud signups receive $400 to spend within Confluent Cloud during their first
60 days. Use Confluent Cloud promo code C50INTEG to receive an additional $50 free usage.
This will sufficiently cover one day of running this example, beyond which you may be billed
for the Confluent Cloud resources until you destroy them.
--------------------------------------------------------------------------------------------

Do you still want to run this script? [y/n] y
Do you acknowledge this script creates a Confluent Cloud KSQL app (hourly charges may apply)? [y/n] y
✔ Confluent CLI version ok
✔ Logged into the Confluent CLI
✔ Prerequisite check pass

For your reference the demo will highlight some commands in code format

====== Starting

====== Creating new Confluent Cloud stack using the ccloud::create_ccloud_stack function
See: https://github.com/confluentinc/examples/blob/7.3.x/utils/ccloud_library.sh for details
Creating Confluent Cloud stack for service account demo-app-4475, ID: sa-kgvznp.

Waiting up to 720 seconds for Confluent Cloud cluster to be ready and for credentials to propagate

Sleeping an additional 80 seconds to ensure propagation of all metadata

    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | CREATE    | TOPIC         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | DELETE    | TOPIC         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | WRITE     | TOPIC         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | READ      | TOPIC         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | DESCRIBE  | TOPIC         | *             | LITERAL
    Principal    | Permission |    Operation     | Resource Type | Resource Name | Pattern Type
-----------------+------------+------------------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | DESCRIBE_CONFIGS | TOPIC         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | READ      | GROUP         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | WRITE     | GROUP         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | CREATE    | GROUP         | *             | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | DESCRIBE  | GROUP         | *             | LITERAL
    Principal    | Permission | Operation |  Resource Type   | Resource Name | Pattern Type
-----------------+------------+-----------+------------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | DESCRIBE  | TRANSACTIONAL_ID | *             | LITERAL
    Principal    | Permission | Operation |  Resource Type   | Resource Name | Pattern Type
-----------------+------------+-----------+------------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | WRITE     | TRANSACTIONAL_ID | *             | LITERAL
    Principal    | Permission |    Operation     | Resource Type | Resource Name | Pattern Type
-----------------+------------+------------------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | IDEMPOTENT_WRITE | CLUSTER       | kafka-cluster | LITERAL
    Principal    | Permission | Operation | Resource Type | Resource Name | Pattern Type
-----------------+------------+-----------+---------------+---------------+---------------
  User:sa-kgvznp | ALLOW      | DESCRIBE  | CLUSTER       | kafka-cluster | LITERAL
IMPORTANT: Confirm that the users or service accounts that will interact with this cluster have the required privileges to access Schema Registry.
Set API Key "KJGNA7SU4NK3ZF2Y" as the active API key for "lkc-2rrvym".

Client configuration file saved to: stack-configs/java-service-account-sa-kgvznp.config
✔ cccloud::create_ccloud_stack true

Generating component configurations from stack-configs/java-service-account-sa-kgvznp.config and saving to the folder delta_configs

✔ ccloud::generate_configs stack-configs/java-service-account-sa-kgvznp.config

Setting local environment based on values in delta_configs/env.delta
✔ source delta_configs/env.delta
Set Kafka cluster "lkc-2rrvym" as the active cluster for environment "env-0xnrq9".

Associated key KJGNA7SU4NK3ZF2Y to Confluent Cloud Kafka cluster lkc-2rrvym at pkc-41973.westus2.azure.confluent.cloud:9092
Validated credentials to Confluent Cloud Schema Registry at https://psrc-dz0xz.westus2.azure.confluent.cloud

⌛ ====== Pre-creating topics
Created topic "pageviews".
✔ confluent kafka topic create pageviews
Created topic "users".
✔ confluent kafka topic create users
✔ Topics created

⌛ ====== Create fully-managed Datagen Source Connectors to produce sample data.

Creating connector from connectors/ccloud-datagen-pageviews.json

2022-12-12T20:26:18.039-0800 [DEBUG] Recursively searching $PATH for plugins. Plugins can be disabled in /Users/swoo/.confluent/config.json.

Created connector "lcc-3rr96m" (datagen_ccloud_pageviews).

Creating connector from connectors/ccloud-datagen-users.json

2022-12-12T20:26:20.653-0800 [DEBUG] Recursively searching $PATH for plugins. Plugins can be disabled in /Users/swoo/.confluent/config.json.

Created connector "lcc-nww2p3" (datagen_ccloud_users).
Waiting up to 300 seconds for connector connectors/ccloud-datagen-pageviews.json (datagen_ccloud_pageviews) to be RUNNING
........
Connector connectors/ccloud-datagen-pageviews.json (datagen_ccloud_pageviews) is RUNNING
Waiting up to 300 seconds for connector connectors/ccloud-datagen-users.json (datagen_ccloud_users) to be RUNNING

Connector connectors/ccloud-datagen-users.json (datagen_ccloud_users) is RUNNING

Sleeping 30 seconds to give the Datagen Source Connectors a chance to start producing messages

====== Setting up ksqlDB

⌛ Waiting up to 2048 seconds for Confluent Cloud ksqlDB cluster to be UP
.................................
✔ Confluent Cloud KSQL is UP
Obtaining the ksqlDB App Id
✔ confluent ksql cluster list -o json | jq -r '.[].id'
    lksqlc-5ww0dq

Configuring ksqlDB ACLs
✔ confluent ksql cluster configure-acls lksqlc-5ww0dq pageviews users

Sleeping 90 seconds

Submitting KSQL queries via curl to the ksqlDB REST endpoint
    See https://docs.ksqldb.io/en/latest/developer-guide/api/ for more information
✘ CREATE STREAM pageviews WITH (kafka_topic='pageviews', value_format='AVRO');
    "Schema for message values on topic 'pageviews' does not exist in the Schema Registry.\nSubject: pageviews-value\nPossible causes include:\n- The topic itself does not exist\n\t-> Use SHOW TOPICS; to check\n- Messages on the topic are not serialized using a format Schema Registry supports\n\t-> Use PRINT 'pageviews' FROM BEGINNING; to verify\n- Messages on the topic have not been serialized using a Confluent Schema Registry supported serializer\n\t-> See https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html\n- The schema is registered on a different instance of the Schema Registry\n\t-> Use the REST API to list available subjects\thttps://docs.confluent.io/current/schema-registry/docs/api.html#get--subjects\n- You do not have permissions to access the Schema Registry.\n\t-> See https://docs.confluent.io/current/schema-registry/docs/security.html"
✘ CREATE TABLE users (id STRING PRIMARY KEY) WITH (kafka_topic='users', value_format='PROTOBUF');
    "Schema for message values on topic 'users' does not exist in the Schema Registry.\nSubject: users-value\nPossible causes include:\n- The topic itself does not exist\n\t-> Use SHOW TOPICS; to check\n- Messages on the topic are not serialized using a format Schema Registry supports\n\t-> Use PRINT 'users' FROM BEGINNING; to verify\n- Messages on the topic have not been serialized using a Confluent Schema Registry supported serializer\n\t-> See https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html\n- The schema is registered on a different instance of the Schema Registry\n\t-> Use the REST API to list available subjects\thttps://docs.confluent.io/current/schema-registry/docs/api.html#get--subjects\n- You do not have permissions to access the Schema Registry.\n\t-> See https://docs.confluent.io/current/schema-registry/docs/security.html"
✘ CREATE STREAM pageviews_female AS SELECT users.id AS userid, pageid, regionid, gender FROM pageviews LEFT JOIN users ON pageviews.userid = users.id WHERE gender = 'FEMALE';
    "Exception while preparing statement: PAGEVIEWS does not exist."
✘ CREATE STREAM pageviews_female_like_89 AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
    "Exception while preparing statement: PAGEVIEWS_FEMALE does not exist."
✘ CREATE TABLE pageviews_regions WITH (key_format='JSON') AS SELECT gender, regionid , COUNT(*) AS numusers FROM pageviews_female WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1;
    "Exception while preparing statement: PAGEVIEWS_FEMALE does not exist."
✘ CREATE STREAM accomplished_female_readers WITH (value_format='JSON_SR') AS SELECT * FROM PAGEVIEWS_FEMALE WHERE CAST(SPLIT(PAGEID,'_')[2] as INT) >= 50;
    "Exception while preparing statement: PAGEVIEWS_FEMALE does not exist."

Confluent Cloud ksqlDB ready

Local client configuration file written to stack-configs/java-service-account-sa-kgvznp.config

====== Verify

View messages in the topic 'pageviews' (Avro):
    confluent kafka topic consume pageviews --value-format avro --print-key

View messages in the topic 'users' (Protobuf):
    confluent kafka topic consume users --value-format protobuf --print-key

View messages in the topic backing the ksqlDB stream 'accomplished_female_readers' (JSON Schema):
    confluent kafka topic list | grep ACCOMPLISHED_FEMALE_READERS | xargs -I {} confluent kafka topic consume {} --value-format jsonschema --print-key

Confluent Cloud ksqlDB and the fully managed Datagen Source Connectors are running and accruing charges. To destroy this demo and its Confluent Cloud resources->
    ./stop-cloud.sh stack-configs/java-service-account-sa-kgvznp.config

~/0/Confluent/examples/cp-quickstart (7.3.0-post ✘)✹✭ ᐅ

bbejeck commented 1 year ago

@sw00t - I can replicate this error. The schemas are present in SR, so I suspect that the way to give ksqlDB permission to access SR has changed. I'll need to figure out the correct approach for doing this. I was able to confirm this behavior on AWS as well.
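One way to check that a value schema is registered is to query the Schema Registry REST API for the subject named in the error message (`<topic>-value`). The sketch below is illustrative only: both function names are hypothetical, and it assumes you have the Schema Registry URL and an API key and secret in `key:secret` form.

```shell
# sr_latest_schema_url SR_URL TOPIC
# Prints the Schema Registry URL of the latest value schema for TOPIC,
# using the "<topic>-value" subject name shown in the error message.
sr_latest_schema_url() {
  printf '%s/subjects/%s-value/versions/latest\n' "${1%/}" "$2"
}

# sr_check_subject SR_URL USER_INFO TOPIC
# USER_INFO is "key:secret". With --fail, curl exits non-zero
# on an HTTP error response (for example 401, 403 or 404).
sr_check_subject() {
  curl --silent --fail --user "$2" "$(sr_latest_schema_url "$1" "$3")"
}
```

For example, `sr_check_subject "$SCHEMA_REGISTRY_URL" "$SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO" pageviews`, assuming those two variables are set in your shell (the variable names are an assumption here). Note that this only exercises your own credentials, so a successful response does not show that ksqlDB itself is allowed to read the schema.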

BTW: for the variables CLUSTER_CLOUD and CLUSTER_REGION to take effect, I needed to use export first.
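A minimal sketch of the export approach, using the cloud and region values from the original report. The `sh -c` line is only an added sanity check that a child process sees the variables.

```shell
# Export the variables so that child processes inherit them.
export CLUSTER_CLOUD=azure
export CLUSTER_REGION=westus2

# Sanity check: a child shell should print the exported values.
sh -c 'echo "cloud=$CLUSTER_CLOUD region=$CLUSTER_REGION"'

# Then, from the cp-quickstart directory, run the script:
# ./start-cloud.sh
```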

bbejeck commented 1 year ago

Confirmed that the issue is due to RBAC now being enabled for SR, which affects ksqlDB's access to it.

bbejeck commented 1 year ago

Fixed via https://github.com/confluentinc/examples/pull/1169