confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Illegal initial character when CSAS to Avro #2849

Open rmoff opened 5 years ago

rmoff commented 5 years ago

This is a valid stream created in KSQL from existing JSON data - note the column 3ALPHA:

ksql> DESCRIBE CORPUS_RAW;

Name                 : CORPUS_RAW
 Field     | Type
---------------------------------------
 ROWTIME   | BIGINT           (system)
 ROWKEY    | VARCHAR(STRING)  (system)
 NLCDESC   | VARCHAR(STRING)
 NLC       | VARCHAR(STRING)
 TIPLOC    | VARCHAR(STRING)
 3ALPHA    | VARCHAR(STRING)
 STANOX    | VARCHAR(STRING)
 NLCDESC16 | VARCHAR(STRING)
 UIC       | VARCHAR(STRING)
---------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

However, if I try to CSAS from this and use Avro, KSQL complains about the column name. It works fine if I leave the serialisation as JSON:

ksql> CREATE STREAM FOO AS SELECT * FROM CORPUS_RAW;

 Message
----------------------------
 Stream created and running
----------------------------

🔴

ksql> CREATE STREAM FOO4 WITH (VALUE_FORMAT='AVRO') AS SELECT * FROM CORPUS_RAW ;
Illegal initial character: 3ALPHA

This seems extremely inconsistent; why can't KSQL handle 3ALPHA if serialising to Avro?

MichaelDrogalis commented 5 years ago

Not implying the current behavior is good, but field names that start with a digit aren't legal in Avro: the spec requires a name to start with [A-Za-z_] and to contain only [A-Za-z0-9_] after that. See the Avro spec.
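
As a rough illustration, the column names from the DESCRIBE output above can be checked against Avro's naming rule before attempting the CSAS. This is only a sketch: the helper names `is_valid_avro_name` and `invalid_avro_names` are hypothetical, and the regex is a plain restatement of the rule in the Avro spec, not the code KSQL or the Avro library actually runs.

```python
import re

# Avro naming rule as stated in the Avro specification:
# first character [A-Za-z_], remaining characters [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` satisfies the Avro naming rule (hypothetical helper)."""
    return AVRO_NAME.fullmatch(name) is not None


def invalid_avro_names(columns):
    """Return the column names that Avro would reject (hypothetical helper)."""
    return [c for c in columns if not is_valid_avro_name(c)]


# Non-system columns of CORPUS_RAW, copied from the DESCRIBE output.
columns = ["NLCDESC", "NLC", "TIPLOC", "3ALPHA", "STANOX", "NLCDESC16", "UIC"]
print(invalid_avro_names(columns))  # ['3ALPHA']
```

Only 3ALPHA fails the check, which matches the single column named in the error. JSON places no such restriction on field names, which would explain why the same CSAS succeeds with the default serialisation.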