IBMStreams / streamsx.json

Toolkit for working with JSON in SPL applications.
http://ibmstreams.github.io/streamsx.json/

Add support for mapping a json path to primitive attributes #80

Closed: erelsibm closed this issue 6 years ago

erelsibm commented 7 years ago

For example, consider the following JSON: {"name" : "John", "age" : 20, "address" : { "state" : "NY", "country" : "USA" } }

That should be parsed into the following flat tuple: stream<rstring name, int32 age, rstring state, rstring country> ParsedS = JSONToTuple(JsonS, JsonPaths) {}
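
One way to read this request is as a mapping from attribute names to JSON Pointer style paths, each resolved against the parsed document. The following is a minimal Python sketch of that idea, not the toolkit's implementation: the `resolve` and `flatten` helpers and the `paths` mapping are hypothetical names used only for illustration, and the path handling is simplified (objects only, no arrays, no escape sequences).

```python
import json

def resolve(doc, pointer):
    """Follow a "/a/b" style path through nested objects (dicts only)."""
    node = doc
    for key in pointer.strip("/").split("/"):
        node = node[key]
    return node

def flatten(json_str, paths):
    """Parse the JSON once, then map each attribute name to the value at its path."""
    doc = json.loads(json_str)
    return {attr: resolve(doc, pointer) for attr, pointer in paths.items()}

json_str = '{"name": "John", "age": 20, "address": {"state": "NY", "country": "USA"}}'

# Hypothetical attribute-to-path mapping for the flat tuple described above.
paths = {
    "name": "/name",
    "age": "/age",
    "state": "/address/state",
    "country": "/address/country",
}

flat = flatten(json_str, paths)
print(flat)
```

Here the nested `address` object is flattened into the top-level `state` and `country` attributes, which matches the tuple schema in the request.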

leongor commented 7 years ago

Actually, I developed two native functions for JSON parsing that I would like to submit as a PR: extractFromJSON and queryJSON.

  1. extractFromJSON tries to find the best fit for all tuple attributes, so it can easily flatten hierarchical JSON into a one-level tuple. It uses a SAX parser approach.
  2. queryJSON takes a JSON Pointer path and returns a single value. It uses a DOM parser approach.

ilanprager commented 7 years ago

We need to be able to submit multiple JSON paths at once, rather than build up the target tuple one field at a time.

leongor commented 7 years ago

Why not call queryJSON multiple times like this:

... output OutStream: x = queryJSON(jsonStr, "/root/a/x"), y = queryJSON(jsonStr, "/root/b/y") ...

schubon commented 7 years ago

Consider the following JSON: {"name" : "John", "age" : 25, "partner" : { "name" : "Mary", "age" : 21 }, "children" : [ { "name" : "Kathy", "age" : 5 }, { "name" : "Peter", "age" : 3 } ] }

What should the flat tuple look like?

erelsibm commented 7 years ago

@schulz2 We could think about a way to map arrays to a flattened schema, but for now we don't support arrays.

ilanprager commented 7 years ago

@leongor Calling this multiple times reparses the JSON string on every call, which is inefficient. Imagine a message with a few hundred fields from which we want to extract 20.

leongor commented 7 years ago

@ilanprager In the initial release, yes, but I'm going to add a 'parseJSON' function that should be called once before 'queryJSON'.

leongor commented 7 years ago

I've added a 'parseJSON' function, so now the JSON is parsed only once. Look at the sample here.
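
The parse-once, query-many pattern can be sketched in Python as follows. This only illustrates the idea and does not show the toolkit's actual signatures: the `ParsedJSON` class, its `query` method, and the `default` parameter are assumptions made for this example.

```python
import json

class ParsedJSON:
    """Holds a document that was parsed once, so later queries skip re-parsing."""

    def __init__(self, json_str):
        self.doc = json.loads(json_str)  # the only parse of the input string

    def query(self, pointer, default=None):
        """Return the value at a "/a/b" style path, or default if the path is missing."""
        node = self.doc
        for key in pointer.strip("/").split("/"):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

parsed = ParsedJSON('{"root": {"a": {"x": 1}, "b": {"y": "two"}}}')

# Each query walks the already-parsed structure instead of the raw string.
x = parsed.query("/root/a/x")
y = parsed.query("/root/b/y")
missing = parsed.query("/root/c/z", default=-1)
print(x, y, missing)
```

With a message of a few hundred fields, the cost of parsing is paid once, and each of the 20 extractions is only a short walk over the parsed structure.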

leongor commented 7 years ago

Merged - can be closed.