RedHatInsights / expandjsonsmt

Kafka Connect SMT to expand JSON field
Apache License 2.0

Support for mixed types in arrays of objects #9

Closed jpavlick closed 3 years ago

jpavlick commented 3 years ago

Hello! I'm trying to use this SMT to transform some data I have in a Mongo Database into a Connect record. It works very well overall, but I have one field that looks like this:

[ 
  { "tagName" : "SourceFile", "tagValue" : "string", "tagGroup" : "none" },
  { "tagName" : "SourceFile", "tagValue" : 199, "tagGroup" : "none" },
  { "tagName" : "SourceFile", "tagValue" : [ "tag1", "tag2" ], "tagGroup" : "none" }
]

This fails with the following error:

org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type STRING: class java.lang.Double for field: "tagValue"
        at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:245)
        at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
        at org.apache.kafka.connect.data.Struct.put(Struct.java:203)
        at com.redhat.insights.expandjsonsmt.DataConverter.convertFieldValue(DataConverter.java:35)
       ...

It seems to set the schema type for tagValue based on whatever it sees first. For example, if the array were the first thing it saw, I would get an error like Invalid Java object for schema type ARRAY: class java.lang.Double....

Is this expected for how this SMT is currently designed? If not, is it something that it's possible to add support for? Thanks!

Josca commented 3 years ago

Hello @jpavlick, nice to see our tool is useful for you. I implemented that some time ago. If I remember correctly, yes, the schema is created dynamically from the first item, which is quite convenient in most cases, I think. I don't think there is any simple workaround to support such a "multi-type" field.
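The behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FirstItemInference`, `ToyType`, and `firstMismatch` are hypothetical names invented for illustration, and the code uses plain Java collections rather than the Kafka Connect `Schema`/`Struct` classes or the SMT's actual `DataConverter`. It shows why fixing the type from the first array element rejects a later element of a different type.

```java
import java.util.List;

// Toy model only: not the expandjsonsmt implementation and not the Kafka Connect API.
class FirstItemInference {

    // Simplified stand-ins for schema types (hypothetical).
    enum ToyType { STRING, NUMBER, ARRAY }

    // Maps a parsed JSON value to a toy type; throws for anything else (including null).
    static ToyType typeOf(Object value) {
        if (value instanceof String) return ToyType.STRING;
        if (value instanceof Number) return ToyType.NUMBER;
        if (value instanceof List) return ToyType.ARRAY;
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    // Infers the expected type from the first value, then returns the index of the
    // first later value whose type differs, or -1 if every value matches.
    static int firstMismatch(List<?> values) {
        if (values.isEmpty()) return -1;
        ToyType expected = typeOf(values.get(0));
        for (int i = 1; i < values.size(); i++) {
            if (typeOf(values.get(i)) != expected) return i;
        }
        return -1;
    }
}
```

For the tagValue sequence in the report ("string", then a number, then an array), `firstMismatch` would flag the second element, which corresponds to the point where the real SMT raises the DataException.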

But to be honest, I don't think it's a good idea to have such data in a database. I would rather define "tagValue" as an array in general, so you can store ["string"], ["199"] and ["tag1", "tag2"]. Then you will avoid this issue, and probably some others too. This is likely not the only place where you are going to process this data, and it's always much easier when you can expect the same type for each record. Regards.

jpavlick commented 3 years ago

Thanks for the reply, and sorry it took me a while to get back to you. We don't have access to the source data, so we can't change it at the source, and since it's valid JSON it seems like this SMT should perhaps be able to handle that use case. That said, I recognize why it's difficult to do in Java, so I decided to create my own SMT to handle this specific case and coerce the singletons into arrays of strings, as per your suggestion. That fixed our problem, so I'll go ahead and close this issue. Thanks for the response!
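The custom SMT itself was not posted in the thread. As a rough sketch of the coercion step only, assuming the field value has already been parsed into plain Java objects, something like the following could normalize every tagValue into a list of strings. `TagValueCoercion`, `toStringList`, and `render` are hypothetical names, and the wiring into a Kafka Connect `Transformation` is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of expandjsonsmt or the Kafka Connect API.
class TagValueCoercion {

    // Coerces a scalar, a list, or null into a list of strings:
    //   "string"          -> ["string"]
    //   199.0             -> ["199"]
    //   ["tag1", "tag2"]  -> ["tag1", "tag2"]
    //   null              -> []
    static List<String> toStringList(Object value) {
        if (value == null) return Collections.emptyList();
        if (value instanceof List) {
            List<String> out = new ArrayList<>();
            for (Object item : (List<?>) value) out.add(render(item));
            return out;
        }
        return Collections.singletonList(render(value));
    }

    // Renders one element as text. Whole-number doubles are printed without the
    // trailing ".0", since JSON numbers such as 199 may be parsed as 199.0.
    // A null element becomes the string "null".
    static String render(Object value) {
        if (value instanceof Double) {
            double d = (Double) value;
            if (d == Math.rint(d) && Math.abs(d) < 1e15) return String.valueOf((long) d);
        }
        return String.valueOf(value);
    }
}
```

With every record normalized this way, the field has a single consistent type (array of strings), so a schema inferred from the first item also fits all later items.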