confluentinc / confluent-hybrid-cloud-workshop

Confluent Hybrid Cloud Workshop

Customers, Products, and Suppliers do not flow to Google BigQuery #38

Open chadmott opened 4 years ago

chadmott commented 4 years ago

At the end of the lab, I see transactional data in BigQuery, but not the customers, products, or suppliers data.

In my local Confluent Control Center, I see this data in the respective topics, and Control Center displays the values correctly (so it is schema-aware).

In Confluent Cloud (which is where the connector is configured to pull from), I see the data, but the values appear as the binary representation of the Avro-encoded data. I suspect that, for whatever reason, the Confluent Cloud cluster is unable to deserialize the data?
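
As a sanity check (a sketch only; the topic name dc01_customers is a placeholder, not necessarily the actual topic name in the workshop), printing the topic from a KSQL CLI that is attached to the Confluent Cloud cluster and pointed at the right Schema Registry shows whether the payload actually deserializes:

PRINT 'dc01_customers' FROM BEGINNING;

If that reports Format: AVRO and readable rows, the data itself is fine and only the Cloud UI is rendering the raw bytes.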

The connector is running


Name                 : DC01_GCS_SINK
Class                : io.confluent.connect.gcs.GcsSinkConnector
Type                 : sink
State                : RUNNING
WorkerId             : kafka-connect-ccloud:18084

 Task ID | State   | Error Trace
---------------------------------
 0       | RUNNING |
---------------------------------

with no errors.

Could you comment on why I don't see any errors? How can I view messages that the connector "skipped" when running in KSQL mode?
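
For reference (a sketch only; this assumes the GCS sink runs on the self-managed Connect worker shown above, and dlq_dc01_gcs_sink is a made-up topic name), Kafka Connect's error-handling settings can be added to the sink configuration to surface records that would otherwise be dropped silently:

{
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "errors.deadletterqueue.topic.name": "dlq_dc01_gcs_sink",
  "errors.deadletterqueue.context.headers.enable": "true"
}

That only covers records the Connect framework fails to convert or deliver, though; rows filtered out by a KSQL query never reach the connector and would show up in KSQL's processing log (if enabled) rather than in a dead letter queue.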

chadmott commented 4 years ago

@tmcgrath I suspect this is why you were getting the error in Data Studio: the queries are joining on IDs that (at least for me) do not exist.

chadmott commented 4 years ago

Quick update: adding the ID fields to the tables in BigQuery and then restarting the connector got data flowing in. It seems that, for whatever reason, the ID field does not exist in the schemas...

{
  "connect.name": "io.confluent.ksql.avro_schemas.KsqlDataSourceSchema",
  "fields": [
    {
      "default": null,
      "name": "FIRST_NAME",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "LAST_NAME",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "EMAIL",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "CITY",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "COUNTRY",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "SOURCEDC",
      "type": [
        "null",
        "string"
      ]
    }
  ],
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "type": "record"
}

If there is no ID field in this schema, it makes sense that it wouldn't show up in BigQuery, but why then did the data flow after I manually added the ID column in BigQuery?
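
If the missing ID is actually the record key, one possible fix (purely a sketch; CUSTOMERS_SRC and the target topic name are made up here, and the exact syntax depends on the KSQL/ksqlDB version the workshop uses) is to re-materialize the stream with the key projected into the value, so the Avro value schema registered by KSQL carries an ID field:

-- Copy the record key (ROWKEY) into the value schema as ID
CREATE STREAM CUSTOMERS_WITH_ID
  WITH (KAFKA_TOPIC='dc01_customers_with_id', VALUE_FORMAT='AVRO') AS
  SELECT ROWKEY AS ID,
         FIRST_NAME,
         LAST_NAME,
         EMAIL,
         CITY,
         COUNTRY,
         SOURCEDC
  FROM CUSTOMERS_SRC;

Pointing the sink at the new topic would then give BigQuery an ID column without editing the table by hand.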