data-integrations / multi-table-plugins

Pipeline plugins that allow reading from multiple DB tables with the same source
0 stars 10 forks source link

[PLUGIN-1108] getSchema return Decimal or Long for BIGINT columns #50

Closed BaptBeuns closed 2 years ago

BaptBeuns commented 2 years ago

Context in: https://cdap.atlassian.net/browse/PLUGIN-1108

The goal is to map BIGINT columns to a Decimal if the column is unsigned, or Long if it's signed. Unit tests have been updated to verify backward compatibility. Unfortunately, Hypersql does not have the concept of signed/unsigned bigint, making it cumbersome to test for the Decimal case.

To know whether a sql type is unsigned, we're using a solution that's somewhat hacky because jdbc does not expose this information.

albertshau commented 2 years ago

hm actually this may not be the right thing in all cases. I believe a signed bigint should still be a long, but an unsigned bigint a decimal. Though I'm not sure if the jdbc interfaces expose that information?

BaptBeuns commented 2 years ago

Great point. I couldn't find anything about the JDBC interface exposing that information. But I'm thinking it would be safe to always convert to a decimal. Could you confirm that signed values are supported within the decimal type?

If you think signed bigints should still be mapped to a long and the jdbc interface does not expose the information, do you have a workaround in mind? I personally couldn't come up with anything.

Let me circle back on the tests in a few minutes. edit: let me actually circle back tomorrow.

albertshau commented 2 years ago

One of the problems with changing it to decimal always is that it would be a backwards incompatible change. Upgrading an existing pipeline with a signed bigint would cause it to break.

Does the getColumnTypeName() (https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getColumnTypeName-int-) method return that it's an unsigned bigint? It's somewhat hacky, but thinking we can check if the type name has 'unsigned' in it and the type is BIGINT, then make it decimal, otherwise make it a long.

BaptBeuns commented 2 years ago

Good idea, I think that works: image

After testing by deploying to our CDAP instance, it indeed gets us rid of the "not in union" error.

The tests don't need to be updated for the build to pass. I wanted to include a unit test to ensure that this does not break some day, but I need to find a good place for it. The method is static and I'm always hesitant to add unit tests for a static method. What can be done is updating the PipelineTest class (here) to include a BIGINT in the test DB, but it fails with java.sql.SQLSyntaxErrorException: unexpected token: UNSIGNED when I do it. I'll try to find a good way to do it unless you already have some idea!

BaptBeuns commented 2 years ago

This is ready for another look, please let me know what you think :)

BaptBeuns commented 2 years ago

Done @albertshau :) Thanks!