Open MichaelDrogalis opened 4 years ago
Where x
in this case would be what? Some SQL data type? That should be doable.
What other parameters are you expecting for the second param? i.e. beyond json
?
Any supported format would need a string representation. So protobuf and avro are out. Delimited doesn't support structured types, so that doesn't make much sense. I guess we could support yaml or other such formats, but given we don't support those as input formats to ksqlDB maybe we only need a toJson
or AS_JSON
udf - what do you reckon?
Yeah, x
would be a column in the schema. Not sure what else we'd support — my example wasn't intended to be the exact way we'd approach it.
Can we provide Struct
as an argument to a udf without specifying the schema? If not, then can I easily stringify the components of a Struct
object since the only way I can think of doing this is by attaching a toJson()
method to the Struct
class but we inherit that from connect. Am I getting something wrong here?
I don't believe that you can skip the schema of a struct when you pass it to a UDF, but someone should check me.
No you cannot skip that. You must provide the schema of a Struct
at compile time of the UDF, basically when you define it. Which means that this UDF cannot be applied to Structs
. Is that ok @MichaelDrogalis ?
That seems rather self-defeating since most JSON are maps.
Surprised to hear we can't handle UDFs with STRUCT parameters, so had a quick play. Looks like they're not supported directly - which is another thing in our UDF framework that needs fixing (https://github.com/confluentinc/ksql/issues/6127) - however, there is a work around that would likely work for this UDF... you can use a template parameter to represent 'any' type, including structs.
@Udf
public <T> String iAcceptAnySqlType(@UdfParameter final T val) {
return (val == null) ? "null" : "non-null";
}
So in theory this toJSON or stringify function would enable you to select a specific field or just take the entire record record and send it as JSON. Or perhaps another alternative would be to pass a hashmap to the function. I'm working on another UDF for applying sigma rules and really what I want to do is send the whole record in. I'm guess this is still in the figuring out stage but was curious if we had done anything with this.
@Gerrrr does your recent work with JSON UDFs address this issue?
I think so! to_json_string converts any ksqlDB data type to a JSON string.
Can it take the entire record as well?
@wlaforest Ah! I don't think ksqlDB has syntax to pass an entire record to a function... We can keep this issue for that then.
Yeah so to_json_string will solve a good chunk of the problem which is great news. But for situations where the schema is complicated or doesn't map well to SQL structures we are still stuck. Think about FHIR, or GeoJSON, or NIEM. I would like a function to be able to get the underlying bytes.
Sometimes you want to work with a JSON string rather than the record itself or a struct in the record. This is useful if you are invoking a UDF that expects JSON or uses an underlying library that expects JSON. You could imagine doing something like this
This is analogous to the
xdmp:quote
function in MarkLogic XQuery APISimilarly, you can see this with stringify in JavaScript.