bryanyang0528 / ksql-python

A python wrapper for the KSQL REST API.
MIT License
159 stars 64 forks source link

Optional parsing of query result into objects #78

Closed harlev closed 4 years ago

harlev commented 4 years ago

The current behavior of a query is to return a generator, which yields first the header with the schema, then each row of the results, all as strings without the field names. This optional addition, keeps this behavior, but adds an optional flag to get each row of the result as a dictionary structured based on the schema.
The user of the Query API can optionally provide return_objects=True as parameter to enable this new functionality.

This change should not break any existing behavior.

A good example for the new functionality is in the test test_ksql_parse_query_result
The schema here includes basic fields (INT, STRING etc) and the structured options of STRUCT, MAP & ARRAY.

In this test the events sent to the topic contain

{"order_id":3,"my_struct":{"a":1,"b":"bbb"}, "my_map":{"x":3, "y":4}, "my_array":[1,2,3], "total_amount":43,"customer_name":"Palo Alto"}`

The default results would be (without the new return_objects flag) a header string

[{"header":{"queryId":"none","schema":"`ORDER_ID` INTEGER, `MY_STRUCT` STRUCT<`A` INTEGER, `B` STRING>, `MY_MAP` MAP<STRING, INTEGER>, `MY_ARRAY` ARRAY<INTEGER>, `TOTAL_AMOUNT` DOUBLE, `CUSTOMER_NAME` STRING"}},

And then the default row of result string would be

{"row":{"columns":[3,{"A":1,"B":"bbb"},{"x":3,"y":4},[1,2,3],43.0,"Palo Alto"]}},

With this new update the result will be a more usefull dictionary with the actual field names

{'ORDER_ID': 3, 'MY_STRUCT': {'A': 1, 'B': 'bbb'}, 'MY_MAP': {'x': 3, 'y': 4}, 'MY_ARRAY': [1, 2, 3], 'TOTAL_AMOUNT': 43.0, 'CUSTOMER_NAME': 'Palo Alto'}

A second test test_ksql_parse_query_result_with_utils shows how the same functionality could be achieved by using the new utility functions directly on the query results to perform the parsing.

codecov[bot] commented 4 years ago

Codecov Report

Merging #78 into master will increase coverage by 0.59%. The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
+ Coverage   74.65%   75.25%   +0.59%     
==========================================
  Files           7        7              
  Lines         363      396      +33     
  Branches       48       54       +6     
==========================================
+ Hits          271      298      +27     
- Misses         77       80       +3     
- Partials       15       18       +3     
Impacted Files Coverage Δ
ksql/client.py 80.00% <50.00%> (-1.82%) :arrow_down:
ksql/utils.py 84.61% <83.87%> (-0.50%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8a729c1...0c80ddb. Read the comment docs.

bryanyang0528 commented 4 years ago

@harlev Thanks a lot. It's a very useful feature. Because I do not know a good way to test streaming responses in the unit test, since you keep the original behavior, I think we could pass this part of unit-tests first. Is this a good idea?

error message from TravisCI: https://travis-ci.org/github/bryanyang0528/ksql-python/jobs/720870491

harlev commented 4 years ago

@bryanyang0528 I have marked the failing test to be skipped. Same as the other tests of the same type, which are using streaming queries.

bryanyang0528 commented 4 years ago

@harlev Thanks for the great contribution.