confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
102 stars 1.04k forks source link

JSON Stringify UDF #5616

Open MichaelDrogalis opened 4 years ago

MichaelDrogalis commented 4 years ago

Sometimes you want to work with a JSON string rather than the record itself or a struct in the record. This is useful if you are invoking a UDF that expects JSON or uses an underlying library that expects JSON. You could imagine doing something like this

SELECT some_udf(stringify(x, 'json')) FROM s emit changes;  

This is analogous to the xdmp:quote function in MarkLogic XQuery API

Similarly, you can see this with stringify in JavaScript.

big-andy-coates commented 4 years ago

Where x in this case would be what? Some SQL data type? That should be doable.

What other parameters are you expecting for the second param? i.e. beyond json?

Any supported format would need a string representation. So protobuf and avro are out. Delimited doesn't support structured types, so that doesn't make much sense. I guess we could support yaml or other such formats, but given we don't support those as input formats to ksqlDB maybe we only need a toJson or AS_JSON udf - what do you reckon?

MichaelDrogalis commented 4 years ago

Yeah, x would be a column in the schema. Not sure what else we'd support — my example wasn't intended to be the exact way we'd approach it.

cprasad1 commented 4 years ago

Can we provide Struct as an argument to a udf without specifying the schema? If not, then can I easily stringify the components of a Struct object since the only way I can think of doing this is by attaching a toJson() method to the Struct class but we inherit that from connect. Am I getting something wrong here?

MichaelDrogalis commented 4 years ago

I don't believe that you can skip the schema of a struct when you pass it to a UDF, but someone should check me.

vpapavas commented 4 years ago

No you cannot skip that. You must provide the schema of a Struct at compile time of the UDF, basically when you define it. Which means that this UDF cannot be applied to Structs. Is that ok @MichaelDrogalis ?

MichaelDrogalis commented 4 years ago

That seems rather self-defeating since most JSON are maps.

big-andy-coates commented 4 years ago

Surprised to hear we can't handle UDFs with STRUCT parameters, so had a quick play. Looks like they're not supported directly - which is another thing in our UDF framework that needs fixing (https://github.com/confluentinc/ksql/issues/6127) - however, there is a work around that would likely work for this UDF... you can use a template parameter to represent 'any' type, including structs.

  @Udf
  public <T> String iAcceptAnySqlType(@UdfParameter final T val) {
    return (val == null) ? "null" : "non-null";
  }
wlaforest commented 3 years ago

So in theory this toJSON or stringify function would enable you to select a specific field or just take the entire record record and send it as JSON. Or perhaps another alternative would be to pass a hashmap to the function. I'm working on another UDF for applying sigma rules and really what I want to do is send the whole record in. I'm guess this is still in the figuring out stage but was curious if we had done anything with this.

jnh5y commented 2 years ago

@Gerrrr does your recent work with JSON UDFs address this issue?

Gerrrr commented 2 years ago

I think so! to_json_string converts any ksqlDB data type to a JSON string.

wlaforest commented 2 years ago

Can it take the entire record as well?

jnh5y commented 2 years ago

@wlaforest Ah! I don't think ksqlDB has syntax to pass an entire record to a function... We can keep this issue for that then.

wlaforest commented 2 years ago

Yeah so to_json_string will solve a good chunk of the problem which is great news. But for situations where the schema is complicated or doesn't map well to SQL structures we are still stuck. Think about FHIR, or GeoJSON, or NIEM. I would like a function to be able to get the underlying bytes.