SAP / kafka-connect-sap

Kafka Connect SAP is a set of connectors, using the Apache Kafka Connect framework, for reliably connecting Kafka with SAP systems.
Apache License 2.0

Unable to replicate from HANA table to Kafka topic if field name has "/" in it #126

Closed srkpers closed 2 years ago

srkpers commented 2 years ago

@elakito I am unable to replicate from a HANA table to a Kafka topic using the HANA source connector when the table has a "/" in a field name. I am not using Schema Registry, as I understand the Avro format does not allow "/" in field names. This is a standard SAP table, so there is not much scope to change the field name, as that would impact the application. Is there any workaround for this issue?

Caused by: org.apache.avro.SchemaParseException: Illegal initial character: /VSO/R_PKGRP

srkpers commented 2 years ago

I noticed the error mentions an Avro schema although I did not configure one. It looks like the Connect worker is applying it by default. I introduced the settings below in the connector config and was able to replicate the table in JSON format without Schema Registry.

        schemas.enable=false
        value.converter.schemas.enable=false
        key.converter.schemas.enable=false
        key.converter=org.apache.kafka.connect.json.JsonConverter
        value.converter=org.apache.kafka.connect.json.JsonConverter
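For reference, here is a sketch of the same overrides in JSON form, as they would appear in a connector config submitted to the Connect REST API. The connector name and class below are illustrative placeholders; only the converter keys matter for this workaround.

        {
          "name": "hana-source-json",
          "config": {
            "connector.class": "com.sap.kafka.connect.source.hana.HANASourceConnector",
            "key.converter": "org.apache.kafka.connect.json.JsonConverter",
            "key.converter.schemas.enable": "false",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false"
          }
        }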

elakito commented 2 years ago

@srkpers I think a field name including the slash character should work when using JsonConverter, because there is no field-name check in Kafka Connect's schema code (i.e., *schemas.enable=true should also work if the converter is Kafka Connect's JsonConverter, which is the default setting at the Connect worker layer). So I don't know why you saw that Avro error when you were not using Avro. Maybe your Connect worker was configured with a different converter setting?

So, in principle, it is possible to use field names that are invalid under Avro's name syntax rules, but I don't know whether it is okay to use such names in a JSON schema. We should probably add a name-conversion option so that it will work when using Avro schemas.
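To illustrate, here is a minimal sketch (not the connector's actual code) of what such a name conversion could look like, assuming every character that is illegal under Avro's name syntax ([A-Za-z_][A-Za-z0-9_]*) is simply replaced with an underscore. The class and method names are hypothetical.

        // Hypothetical sketch of a name-conversion option: map a HANA
        // column name to a legal Avro name by replacing every character
        // outside [A-Za-z_][A-Za-z0-9_]* with '_'.
        public class AvroNameEscaper {

            public static String toAvroName(String columnName) {
                StringBuilder sb = new StringBuilder(columnName.length());
                for (int i = 0; i < columnName.length(); i++) {
                    char c = columnName.charAt(i);
                    boolean legal = c == '_'
                            || (c >= 'A' && c <= 'Z')
                            || (c >= 'a' && c <= 'z')
                            || (i > 0 && c >= '0' && c <= '9'); // digits not allowed first
                    sb.append(legal ? c : '_');
                }
                return sb.toString();
            }

            public static void main(String[] args) {
                // The column from the error message above becomes a legal Avro name.
                System.out.println(toAvroName("/VSO/R_PKGRP")); // prints _VSO_R_PKGRP
            }
        }

Note that such a mapping is lossy (a column already containing '_' in the same position would collide with the escaped name), which is one reason an explicit per-column rename, as discussed below, can be the safer workaround.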

srkpers commented 2 years ago

@elakito It was producing an Avro schema because I had configured the Avro converter in the Connect cluster and the connector was defaulting to it. I then set the JSON converter at the connector level, and after that it worked. It would be nice to rename the field by stripping off the invalid character in the Avro case, so we can use an Avro schema as well; changing field names or table names in packaged software is much more difficult than a similar change in custom software.

elakito commented 2 years ago

@srkpers Ok, that explains what happened. Regarding the field-name handling: yes, I'll add an optional escape conversion so that source->sink over Avro preserves those column names.

elakito commented 2 years ago

@srkpers Another option would be to use Kafka Connect's ReplaceField SMT. This approach should work for you if you know which column names contain the slash character. For example, if your source table has a column ns/field2, you can configure this transform at the source connector:

        "transforms": "renameField",
        "transforms.renameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
        "transforms.renameField.renames": "ns/field2:ns_field2",

which will transform ns/field2 into ns_field2. The documentation for this transform is available at https://docs.confluent.io/platform/current/connect/transforms/replacefield.html. You can add the inverse transformation at the sink connector, as sketched below.
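A minimal sketch of the sink-side inverse rename (the transform alias restoreField is an arbitrary placeholder):

        "transforms": "restoreField",
        "transforms.restoreField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
        "transforms.restoreField.renames": "ns_field2:ns/field2",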

srkpers commented 2 years ago

@elakito Yes, the input on the SMT helped. I tested it with a table that had 4 such fields and it was able to rename them. Thank you. I will go ahead and close this issue.