DataReply / kafka-connect-mongodb

Is there a way to prevent the JSON structure that is pulled from mongo from being destroyed? #3

Open grindthemall opened 8 years ago

grindthemall commented 8 years ago

Ok, so I love the connector - it makes getting data from Mongo to Kafka easy. However, when the data gets pulled from Mongo, it loses its JSON structure. Is there a way to prevent this from happening?

Example: My JSON looks like this in Mongo:

{
   "_id": "56feaa424f1249736af0ba4f",
   "ts": "Thu Mar 31 09:00:00 PDT 2016",
   "type": "hrlyEvtLvls",
   "events": [{
      "TechDump": 4,
      "Reboot": 174,
      "FatalError": 0
   }]
}

However when I retrieve it from Kafka this is what it looks like in the payload object:

Document{{_id=56feaa424f1249736af0ba4f, ts=Thu Mar 31 09:00:00 PDT 2016, type=hrlyEvtLvls, events=Document{{TechDump=4, Reboot=174, FatalError=0}}}}

I don't have an easy way to reprocess the received data back into JSON, so is there some way I can prevent my JSON from being lost in the first place?
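
A minimal sketch of what seems to be going on (assuming the connector hands back an org.bson.Document from the standard MongoDB Java driver; the class and values here are illustrative, not taken from the connector's code):

import org.bson.Document;

public class ToStringVsToJson {
    public static void main(String[] args) {
        Document events = new Document("TechDump", 4)
                .append("Reboot", 174)
                .append("FatalError", 0);
        Document doc = new Document("_id", "56feaa424f1249736af0ba4f")
                .append("type", "hrlyEvtLvls")
                .append("events", events);

        // Java object notation, e.g. Document{{_id=..., events=Document{{...}}}}
        System.out.println(doc.toString());

        // Valid JSON, e.g. { "_id" : "...", "events" : { ... } }
        System.out.println(doc.toJson());
    }
}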

patelliandrea commented 8 years ago

Hi, we wrote this connector to work with the Confluent Schema Registry. Since the schema of the documents may change, preserving it could produce messages that are not compliant with the schema stored in the registry. I think I could write a version that maintains the schema, but then you could only use it with the JSON converter, or with the Schema Registry with compatibility set to NONE (unless the schema in Mongo never changes).
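
For reference, per-subject compatibility can be relaxed through the Schema Registry REST API, along these lines (assuming the registry runs at localhost:8081; the subject name is just an example):

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "NONE"}' \
  http://localhost:8081/config/mytopic-value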

grindthemall commented 8 years ago

Thanks for the quick reply.

If it's easy to maintain the schema of the original message, that would be best. Many of us using Mongo prefer things to stay in JSON format so we don't need to do ETL every step of the way.

Everything else I have running through Kafka is 100% JSON so we can consume it quickly.

Having this one (hopefully small) change would be huge for me (and I'd suspect many others).

grindthemall commented 8 years ago

Actually, if you could point me to the code where the conversion of the JSON from mongo is performed, I could probably fix it up.

patelliandrea commented 8 years ago

Hi, the conversion is here.

grindthemall commented 8 years ago

Ok, I see the section where the schema and payload are built. I'm not much of a Java person, so I'm having a hard time seeing where "Document=" gets inserted - I don't see any reference to "Document=" anywhere in the code.

As far as I'm aware, toString() simply ensures that the data being pulled is rendered entirely as a string.

patelliandrea commented 8 years ago

Hi, the document is flattened using

messageStruct.put("object", message.get("o").toString());

The problem is that, when using a schema, you need to know the schema of the object beforehand, because it is difficult to extract the schema from every Mongo document.
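
If keeping parseable JSON is all that's needed, one untested sketch of an alternative (assuming message.get("o") is an org.bson.Document, as the surrounding oplog handling suggests) would be to use the driver's toJson() instead:

// current behaviour: Java object notation, not parseable as JSON
messageStruct.put("object", message.get("o").toString());

// possible alternative: serialize the oplog "o" field as JSON,
// so consumers can parse the string back into a document
messageStruct.put("object", ((Document) message.get("o")).toJson());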

Ronniexie commented 8 years ago

@patelliandrea @grindthemall Do you know whether kafka-connect-mongodb uses the AvroConverter or the JsonConverter?

patelliandrea commented 8 years ago

@Ronniexie it uses whichever converter you set in the worker properties (distributed or standalone).
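
For example, in the worker properties (illustrative values, not taken from this repo's docs), Avro with the Schema Registry would look like:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081

while schemaless JSON would look like:

key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false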