confluentinc / kafka-connect-jdbc

Kafka Connect connector for JDBC-compatible databases

Issue with Oracle column type NUMBER set as "Incrementing Column" in Kafka Connect #31

Closed: ramkumarkb closed this issue 8 years ago

ramkumarkb commented 8 years ago

The kafka-connect-jdbc connector has an issue with Oracle tables whose primary key is of type NUMBER when that column is used as the incrementing column (in the connector's mode settings).

From our debugging, it appears the conversion chain in the kafka-connect-jdbc code is NUMBER => java.math.BigDecimal => NUMERIC => BYTES; essentially, SchemaBuilder.java sets it to Type.BYTES.

As a result, we see the error "Invalid type of Incrementing column: BYTES" when we connect to the Oracle database. This is a rather severe limitation for Kafka Connect when working with Oracle, as most primary key columns are of type NUMBER.
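
For context, a minimal sketch of the mapping described above, paraphrased rather than quoted from the connector: JDBC reports Oracle NUMBER as java.sql.Types.NUMERIC, and NUMERIC is mapped to Connect's Decimal logical type, which is physically encoded as BYTES.

import java.sql.Types;
import org.apache.kafka.connect.data.Decimal;
import org.apache.kafka.connect.data.SchemaBuilder;

// Sketch: NUMERIC/DECIMAL columns become the Decimal logical type,
// whose underlying schema type is BYTES (the BigDecimal's unscaled bytes).
static SchemaBuilder schemaBuilderFor(int sqlType, int scale) {
    switch (sqlType) {
        case Types.NUMERIC:
        case Types.DECIMAL:
            return Decimal.builder(scale); // logical Decimal, physical Type.BYTES
        default:
            throw new IllegalArgumentException("Unhandled SQL type: " + sqlType);
    }
}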

Steps to reproduce this issue:

create table TEST_KAFKA_CONNECT (
  KAKFA_PK NUMBER not null,
  UPDATED_AT timestamp not null
);

Connector settings:

mode=timestamp+incrementing
incrementing.column.name=KAKFA_PK
timestamp.column.name=UPDATED_AT

In kafka-connect-jdbc, use Oracle's JDBC driver (tested with Oracle 11.2.0.2.0; the corresponding Oracle JDBC driver is available here: http://www.oracle.com/technetwork/apps-tech/jdbc-112010-090769.html).

ewencp commented 8 years ago

@ramkumarkb This is a great point. Currently the code assumes we can convert the incrementing column's value into a Long, which basically means we only handle basic integer types. In the case of NUMBER, if you don't specify precision/scale this should be fine, since that makes it integral; but in general the data conversion we're using can't guarantee that.

I think this case can be handled, but we might need to make some assumptions to convert the data to a value we can use, e.g. that the scale can be ignored when converting the numeric value and that it will fit within a reasonable range (i.e. a long), so we can just use the unscaled NUMBER directly as the incrementing column's value. Would this work for your use case?
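
A minimal sketch of that conversion, assuming the scale is ignorable and the unscaled value fits in a long (a hypothetical helper, not the shipped connector code):

import java.math.BigDecimal;

// Ignore the scale and use the unscaled value as the incrementing offset.
static long incrementingOffsetFrom(BigDecimal value) {
    // longValueExact() throws ArithmeticException if the unscaled value
    // does not fit in a long, making the range assumption explicit.
    return value.unscaledValue().longValueExact();
}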

graywill commented 8 years ago

@ewencp I'm having the same issue with NUMBER types and I think this assumption would be fine in my environment.

gwenshap commented 8 years ago

So... probably heresy, but: in Sqoop we bypass all this by storing the IDs for incremental queries as strings. Since all we do with this value is construct a WHERE clause, this has no impact on data integrity and preserves accuracy.

It won't work for the actual data we write to Kafka (where we want to preserve the schema), but it can work for offsets in incremental queries. What do you think?

(https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L313)
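
A rough illustration of that Sqoop-style approach (hypothetical code, not taken from either project): the last-seen id stays a string end to end and is only ever spliced into the WHERE clause, so no numeric precision is lost.

// Build the incremental query without ever parsing the id into a numeric type.
static String buildIncrementalQuery(String table, String idColumn, String lastSeenId) {
    return "SELECT * FROM " + table +
           " WHERE " + idColumn + " > " + lastSeenId +
           " ORDER BY " + idColumn + " ASC";
}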

tdpauw commented 8 years ago

I have a unit test that reproduces the problem. Will try to submit a PR.

ggrossetie commented 8 years ago

I think the issue is broader. I'm using the following query with Confluent 3.0.0:

SELECT COUNT(*) FROM ....

With @tdpauw's fix, the result is:

{"COUNT(*)":8}

Without it, the result is:

{"COUNT(*)":"CA=="}

plaksina commented 8 years ago

This is a big problem for us; we have the same issue with decimal(23,0) in DB2. Is there any estimate of when it will be fixed?

arunshanmugham commented 8 years ago

I have the same problem. I am using Confluent 3.0.0. So how do we solve the problem of using a NUMBER primary key column as the incrementing column for Kafka Connect with an Oracle table?

plaksina commented 8 years ago

This cannot be done by design: the value is cast to a Java Long, so a NUMBER column cannot be used as the incrementing column.

arunshanmugham commented 8 years ago

So nobody is using kafka-connect-jdbc with Oracle tables that have a NUMBER primary key? Most of our Oracle tables use NUMBER primary keys. @ewencp states above that a NUMBER defined without scale or precision should be fine. I also tried NUMBER(38), which is what Oracle uses when a column is defined as INTEGER. Even that gives the same error.

wegged commented 8 years ago

I am having the same issue with the decimal data type in SQL Server on a vendor database (so, unfortunately, I cannot change the data type). Currently the PK is defined as decimal(10,0).

stewartbryson commented 8 years ago

Defining the Oracle data type as NUMBER without precision also produces this error.

arunshanmugham commented 8 years ago

I have managed to modify the code to handle the NUMBER column in Oracle.

stewartbryson commented 8 years ago

@arunshanmugham Are you opening a pull request for this?

arunshanmugham commented 8 years ago

No. I just downloaded it as a zip and changed it locally.

ghost commented 8 years ago

Was anyone else successful in devising a workaround, at least for cases where creating a new/modified table is an option? Mode timestamp IS working for the Kafka JDBC connector.

Here is a workaround I am "testing" for an Oracle db: it uses a modified timestamp that encodes the increment as fractional seconds. The caveat is that it is limited to 9 digits, which is sufficient for my use cases. Example timestamp with the id added:

SELECT TO_TIMESTAMP (TO_CHAR(SYSDATE,'DD-Mon-RR HH24:MI:SS.') || V_CHG_Id, 'DD-Mon-RR HH24:MI:SS.FF9') INTO V_CHG_TIMESTAMP FROM DUAL;

A new delta table of table changes, using this modified timestamp as the primary key, could be created (at least for an enterprise Oracle db) as follows:

a) create a materialized view and a materialized view log on the source table
b) create a row-level trigger on the materialized view table that inserts into the new delta table (it can even capture the DML type)
c) configure the Kafka JDBC source connector on the delta table (see the config sketch below)
d) refresh on demand from the Kafka connector, or perform a scheduled db MV refresh; the Kafka JDBC source connector will poll (the NLS_TIMESTAMP_FORMAT parameter may need to be set for the db session, depending on the default)

Result: the out-of-the-box Kafka Connect JDBC connector, with the query property defined and mode timestamp, works for an Oracle db.
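
A hedged sketch of the source-connector properties for step c), in the same style as the settings shown earlier in the thread (connector name, connection details, and table/column names are all hypothetical):

name=oracle-delta-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@//dbhost:1521/ORCL
mode=timestamp
query=SELECT * FROM CHG_DELTA_TABLE
timestamp.column.name=V_CHG_TIMESTAMP
topic.prefix=oracle-delta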

rrangine commented 7 years ago

@arunshanmugham Could you please share which part of the type-conversion code you changed to make the NUMBER type work? It looks like the issue still persists, as other non-incrementing columns of type NUMBER in (Oracle) tables are not deserialized correctly.

arunshanmugham commented 7 years ago

@rrangine In DataConverter, I moved case Types.NUMERIC up to share the same case clause as case Types.INTEGER, as below:

case Types.NUMERIC:
case Types.INTEGER: {
  if (optional) {
    builder.field(fieldName, Schema.OPTIONAL_INT32_SCHEMA);
  } else {
    builder.field(fieldName, Schema.INT32_SCHEMA);
  }
  break;
}

Previously, the NUMERIC case was grouped with DECIMAL.

Please note that it worked for my requirement. I have not tested all the scenarios, hence I have not committed it back.
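
Note that mapping NUMERIC to INT32 as in the patch above silently assumes every value fits in a 32-bit int; a guard like the following hypothetical helper (not part of the patch) would make that assumption explicit:

import java.math.BigDecimal;

static int toInt32Checked(BigDecimal value) {
    // intValueExact() throws ArithmeticException for values with a fractional
    // part or outside the 32-bit range, instead of truncating silently.
    return value.intValueExact();
}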

boristyukin commented 6 years ago

still an issue with Oracle type NUMBER (no precision)

boristyukin commented 6 years ago

I tried the Cast SMT (single message transform) with no luck as well.
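
For reference, such a Cast attempt would typically look like the following connector properties (the field name is hypothetical); this documents what was tried, not a working fix:

transforms=castPk
transforms.castPk.type=org.apache.kafka.connect.transforms.Cast$Value
transforms.castPk.spec=KAKFA_PK:int64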

praveengone commented 6 years ago

I am also facing a similar problem with an Oracle db and the JDBC source connector. Is there any workaround for this problem?

familywald commented 6 years ago

select to_char( myNumberField ) as myNumberField from myTable

Treat the incoming data as a string, then create a transformation stream processor to map the data into the object structure you want.

Cheers.
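
A minimal Kafka Streams sketch of that "transformation stream processor" idea, assuming (hypothetically) that the record value arriving from the connector is the stringified NUMBER and that the topic names are placeholders:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
// Read the stringified ids written by the connector...
KStream<String, String> source =
    builder.stream("jdbc-source-topic", Consumed.with(Serdes.String(), Serdes.String()));
// ...and map them back to a numeric type downstream.
source.mapValues(Long::parseLong)
      .to("typed-topic", Produced.with(Serdes.String(), Serdes.Long()));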

cheungtsl commented 6 years ago

We are using a system-generated GUID as the primary key in the Oracle table, so an integer/NUMBER/long datatype will not work for us.

dilipsundarraj1 commented 3 years ago

> select to_char( myNumberField ) as myNumberField from myTable
>
> Treat the incoming data as a string, then create a transformation stream processor to map the data into the object structure you want.
>
> Cheers.

This approach worked for me!