confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
114 stars 1.04k forks source link

KSQL 5.4.0-beta1 - Decimal support is broken within Struct #3874

Open robinroos opened 4 years ago

robinroos commented 4 years ago

I have cloned examples, and updated my local copy of docker-compose.yml from cp-all-in-one to use the 5.4.0-beta1 Confluent Platform images.

I am working with Avro IDL-generated schemas modelling FX Trades. My message Keys and also Values are AVRO "records".

I had issues with DECIMAL support, even outside KSQL, so I added Pi=3.141 to the top-level structure of the Trade message. Pi helped me identify that spring.kafka.properties.specific.avro.reader=true must be set when using SpecificAvro consumers.

KSQL seems to work well with the field Pi - I can see its value, I can SELECT it from the message, etc.

The Trade message also contains an embedded structure, called FXLeg. That structure contains quite a few DECIMALS. Those decimals, the ones inside STRUCT, are not supported by KSQL. Their values are incorrectly rendered (as if the String representation of the byte[] is being displayed, without conversion through the appropriate deserialiser), and I am unable to SELECT any column (regardless of type) from within the STRUCT.

As you will see below the Trade message contains two instances of the STRUCT, called "NearLeg" and "FarLeg", and the messages I am sending have FarLeg=null, just in case that is relevant. Everything below refers to NearLeg which is not null.

1 Version Info from KSQL CLI - confirms I have 5.4.0-beta1 CLI and Server

CLI v5.4.0-beta1, Server v5.4.0-beta1 located at http://0.0.0.0:8088

2 Creating the stream from the topic

ksql> create stream trades with (kafka_topic='trade', value_format = 'AVRO');

3 Describe the stream

ksql> describe trades;
Name                 : TRADES
 Field         | Type                                                                                                                                                   
------------------------
 ROWTIME       | BIGINT           (system)                                                                                                                              
 ROWKEY        | VARCHAR(STRING)  (system)                                                                                                                              
 TRADEREF      | VARCHAR(STRING)                                                                                                                                        
 PI            | DECIMAL                                                                                                                                                
 TRADETYPE     | VARCHAR(STRING)                                                                                                                                        
 BUYSELL       | VARCHAR(STRING)                                                                                                                                        
 COUNTERPARTY  | VARCHAR(STRING)                                                                                                                                        
 PAIR          | VARCHAR(STRING)                                                                                                                                        
 TRADEDATETIME | BIGINT                                                                                                                                                 
 TRADEDATE     | INTEGER                                                                                                                                                
 NEARLEG       | STRUCT<SETTLEDATE INTEGER, BASEAMOUNT DECIMAL, QUOTEDAMOUNT DECIMAL, SPOTPRICE DECIMAL, TRADERPOINTS DECIMAL, SALESPOINTS DECIMAL, ALLINPRICE DECIMAL> 
 FARLEG        | STRUCT<SETTLEDATE INTEGER, BASEAMOUNT DECIMAL, QUOTEDAMOUNT DECIMAL, SPOTPRICE DECIMAL, TRADERPOINTS DECIMAL, SALESPOINTS DECIMAL, ALLINPRICE DECIMAL> 
-------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

4 Pi is working - it lives in the top-level of the message

ksql> SELECT PI FROM TRADES;
+---------------------------
|PI                                                                                                                                                                                                                                                        |
+----------------------------
|3.141                                                                                                                                                                                                                                                     |
^CQuery terminated

5 Selecting NEARLEG shows corruption of DECIMAL values within that STRUCT

ksql> select NEARLEG from TRADES;
+----------------
|NEARLEG                                                                                                                                                                                                                                                   |
+-----------------
|{SETTLEDATE=18213, BASEAMOUNT=AlQL5AA=, QUOTEDAMOUNT=At/RwEA=, SPOTPRICE=SZT5oA==, TRADERPOINTS=AA==, SALESPOINTS=AA==, ALLINPRICE=SZT5oA==}                                                                                                              |
^CQuery terminated

6 Unable to select a DECIMAL field from the STRUCT

ksql> SELECT NEARLEG -> ALLINPRICE FROM TRADES;
Field 'ALLINPRICE' cannot be resolved.

7 Unable to select an INTEGER field from the STRUCT

ksql> SELECT NEARLEG->SETTLEDATE FROM TRADES;
Field 'SETTLEDATE' cannot be resolved.

8 The AVRO message, being repeated back to me by a Consumer (SpringBoot). Just the TradeRef changes with each subsequent message. Scroll to the far right to see that the Decimals within NearLeg are all good.

2019-11-16 10:17:19.359  INFO 3600 --- [ntainer#1-0-C-1] u.c.s.aggregator.service.TradeConsumer   : ### -> Consumed message -> {"TradeRef": "10007", "pi": 3.141, "TradeType": "FXSPOT", "BuySell": "BUY", "Counterparty": "RBS", "pair": "EURUSD", "TradeDateTime": 2019-11-16T10:17:19.353Z, "TradeDate": 2019-11-11, "NearLeg": {"SettleDate": 2019-11-13, "BaseAmount": 1000000.0000, "QuotedAmount": 1234500.0000, "SpotPrice": 1.234500000, "TraderPoints": 0E-9, "SalesPoints": 0E-9, "AllInPrice": 1.234500000}, "FarLeg": null}
robinroos commented 4 years ago

Attaching JSON Schema from Control Center, for Key and Value of topic Trade.

schema-trade-key-v1.txt schema-trade-value-v1.txt

robinroos commented 4 years ago

Attaching Avro IDL (fx.avdl). I trimmed this down to show just Trade (the Value record), CurrencyPairKey (the Key record), and necessary supporting types (BuySell, TradeType and FXLeg).

fx_avdl.txt

robinroos commented 4 years ago

Docker-compose.yml, originally taken from examples/cp-all-in-one, and updated to use 5.4.0-beta1 Confluent Platform images.

docker-compose_yml.txt

robinroos commented 4 years ago

I appreciate this is the KSQL repo, but I could not find an enterprise-control-center one. Can you tell me where to raise the following related issues?

1 Control Center, when launched through the 5.4.0-beta1 image, still reports its version as 3.5.1.

2 Control Center, which I believe to be 5.4.0-beta1, does not properly render DECIMAL types in the Messages pane for a Topic. This applies equally to decimals in the top-level of a message and decimals within structures.

image

image

image

robinroos commented 4 years ago

The Docker Images I am using, 5.4.0-beta1 from Docker Hub, are 2 months old. If Confluent Inc could coordinate the publishing of -beta2 images to Docker Hub then I could easily re-test the above, both KSQL and Control Center.

robinroos commented 4 years ago

I'd like to see the Control Center issue spun off to the appropriate forum in time for this to be addressed before final release of 5.4.0. Can we make that happen?

apurvam commented 4 years ago

cc @agavra this seems to be a bug with decimals. Can you triage?

agavra commented 4 years ago

Hello @robinroos - I tried to reproduce this locally and was only able to reproduce the problem partially (note the syntax that I'm using for struct creation is not yet committed):

ksql> CREATE STREAM decimals (col0 STRUCT<val DECIMAL(2,1)>) WITH (kafka_topic='decimals', partitions=1, value_format='AVRO');

 Message
----------------
 Stream created
----------------
ksql> INSERT INTO decimals (col0) VALUES ({VAL '2.1'});
ksql> SELECT COL0->val FROM DECIMALS EMIT CHANGES;
+----------------------------------------------------------------------------------------------+
|COL0__VAL                                                                                     |
+----------------------------------------------------------------------------------------------+
|2.1                                                                                           |
^CQuery terminated
ksql> SELECT COL0 FROM DECIMALS EMIT CHANGES;
+----------------------------------------------------------------------------------------------+
|COL0                                                                                          |
+----------------------------------------------------------------------------------------------+
|{VAL=FQ==}                                                                                    |

The first thing to note, is that there isn't corruption of the decimal data - that's just the way our CLI displays it (as byte strings) because that's the way they are stored in AVRO (a logical byte type). I agree that this isn't ideal (it's impossible to read!) but you can see in my example above that when it's a top-level field it displays fine, but when it's in a struct the CLI displays it as bytes.

Secondly, I'm not sure why you can't access the struct using ->. Since you can't select the int field in the NEARLEG struct either, I don't think this has anything to do with decimals. I'm still trying to figure out why this is coming up.

agavra commented 4 years ago

I've opened https://github.com/confluentinc/ksql/issues/4118 to track the display issue.

robinroos commented 4 years ago

Thanks for #4118.

I suspect that your local changes (in support of DECIMAL within STRUCT) have inadvertently fixed the inability to access even non-decimal fields within a struct which does contain at least one decimal type.

If you wish I can specifically re-test (using -beta1) the selection of an int field from within a struct which DOES NOT contain any decimal types, if that might help. My preference, however, would be to rerun the full test on -beta2 images from Docker Hub as soon as your initial work is stable.

Any news regarding expected -beta2 availability? https://github.com/confluentinc/cp-docker-images/issues/812

agavra commented 4 years ago

I think it's unlikely that my local changes affected struct access patterns (the code I wrote was to add syntax to create structs inline and is pretty isolated). Can you select from a struct that doesn't have a decimal in it?

w.r.t to -beta2 I don't have any timelines for the CP release, but we release ksqlDB docker images you can test against on a somewhat more regular basis. The most current image is ksqldb-server:0.6.0

boxsterman commented 4 years ago

I am facing a very similar - if not the same - issue. Running ksql 5.4.0 and using a toplevel record containing a DECIMAL value PI and a nested struct which also contains a DECIMAL value:

@namespace("ch.seibertec.example.decimaltypetest")
protocol DecimalTypeTest {
  record RecordWithOnlyDecimal {
    decimal(21,10) value;
  }
  record Toplevel {
    RecordWithOnlyDecimal value;
    decimal(21,10) pi;
  }
}

I can access and query both values, but only the top level one is displayed as decimal, the nested one is displayed as bytes. Here the nested decimal field has the value 1.1:

ksql> select * from TMP_test_dec_3 where VALUE->VALUE > 1 emit changes;
[ 1580106243484, "1", {
  "VALUE" : "Ao+mrgA="
}, 3.1400000000 ]

Is there an idea on when this issue could be addressed? Is there anything I could help with?

agavra commented 4 years ago

@boxsterman - the issue you are running into is fixed in master (see this issue: #4118)

big-andy-coates commented 4 years ago

@agavra / @robinroos is this issue now resolved?