ClickHouse / clickhouse-java

ClickHouse Java Clients & JDBC Driver
https://clickhouse.com
Apache License 2.0
1.45k stars 534 forks source link

Problems with huge Int data types (Tableau) #771

Closed artshevchenko closed 2 years ago

artshevchenko commented 2 years ago

Hi! Our team is developing a Tableau connector to ClickHouse based on this JDBC driver. During testing, we encountered problems when working with data types UInt64, UInt256, Int128, Int256:

Is this a bug or a feature?

den-crane commented 2 years ago

It's the Tableau problem. It does not support numbers bigger than Int64 such as UInt64/Int128/UInt128/Int256/UInt256.

To overcome this issue, we should add to JDBC a new setting -- expose_long_numbers_as = float64/string.

In case of expose_long_numbers_as = float64 select toFloat64(18446744073709551615) -> 18446744073709552000

In case of expose_long_numbers_as = string 18446744073709551615 -> '18446744073709551615'

zhicwu commented 2 years ago

Thanks @artshevchenko for reporting the issue. Is the project analytikaplus/clickhouse-tableau-connector-jdbc? I'd strongly suggest you guys start to integrate with new JDBC driver for a couple of reasons:

  1. it's been completely rewritten with improved jdbc compliance, especially metadata part
  2. it's smaller(700KB vs 3MB) and faster(see benchmarks at #768) with less memory consumption

Back to the issue, JDBC supports custom type mapping(between database native type and jdbc standard type), but I'm not sure if tableau supports that. If not, I can enhance the driver to expose additional parameters for type mapping.

We're closing to release the new driver. Let me know your findings with the test build, and I'm happy to help.

yurifal commented 2 years ago

Hi @zhicwu , This is Yuri (the other guy from the A+ team).

Testing the 0.3.2 pre-release drivers with Tableau would be rather problematic, please take a look:

Screenshot 2021-12-07 at 01 00 38

Tried both _shaded and _all driver pack -- no luck.

zhicwu commented 2 years ago

Thanks @yurifal and sorry it didn't work for you.

It seems a ServiceLoader issue related to JPMS, which happens on a few clients(e.g. SQuirrel SQL). The workaround is to put the driver in class path explicitly. Let me download a Tableau Destop and see if I can fix the issue and provide a new build.

Update: it's not related to JPMS but the class loader passed to ServiceLoader. I'm now able to connect to ClickHouse on both SQuirrel SQL and Tableau Desktop. Will publish a new build tonight.

zhicwu commented 2 years ago

Tried both _shaded and _all driver pack -- no luck.

Please use _http or _all of v0.3.2-test2 to test. If you keep seeing error like HTTP/1.1 header parser received no bytes(similar as this one?), you may run zip -d <path-to-jdbc.jar> "META-INF/version*" to use HttpURLConnection which is slower but more stable.

yurifal commented 2 years ago

Better this time, can connect and read a schema.

The attempt to read from any table gives the error.

Screenshot 2021-12-09 at 02 08 56

Please find the attached jprotocolserver.log jprotocolserver.log

zhicwu commented 2 years ago

The attempt to read from any table gives the error.

Sorry, I'd not call that better :p Could you download the driver and try again? I re-published the test build using same version.

yurifal commented 2 years ago

@zhicwu No prob, I've connected & made it working using a "shaved" _http driver (HttpURLConnection).

Minor glitches do far (RIP http session_id ;-) btw, do you know if it's possible to establish a valid client session_id via Properties?

I'd like to SET join_use_nulls=1 (in the Initial SQL or wherever, but not make it server-wide) for a full support of Tableau Sets functionality.

zhicwu commented 2 years ago

@zhicwu No prob, I've connected & made it working using a "shaved" _http driver (HttpURLConnection).

Good to know. Sounds like the default implementation using HttpClient didn't work very well :<

Minor glitches do far (RIP http session_id ;-) btw, do you know if it's possible to establish a valid client session_id via Properties?

session_id will be automatically set when executing multi-statement query, but currently it cannot be set via connection URL or properties. Why do you need manually set session_id? Is that just for execute SET statement and/or creating temporary tables?

I'd like to SET join_use_nulls=1 (in the Initial SQL or wherever, but not make it server-wide) for a full support of Tableau Sets functionality.

All settings(except session_id and query_id I think) can be customized in below two ways:

yurifal commented 2 years ago

OK, I could live with the join_use_nulls=1 written to the designated Profile in the users.xml )

Back to the original issue (on huge int / uint), could you please try to add the following Aliases to the respective Data Types?

Int8 -- I1 Int16 -- I2 Int32 -- I4 Int64 -- I8

UInt8 -- UI1 UInt16 -- UI2 UInt32 -- UI4 UInt64 -- UI8

Thanks in advance.

zhicwu commented 2 years ago

Are all remote types defined at here should be added?

I'm not going to mess ClickHouseDataType by adding these aliases directly, because they do not exist in ClickHouse. However, I can add a system property(either -Dname=value in CLI, or a property file located at home directory, for example: ~/.clickhouse/data-type-aliases.properties) for customization, and it's not just for Tableau but also M language types etc. Make sense?

Moving forward, there'll be a clickhouse-jdbc-ext package, which can be used to simplify this(e.g. making join_use_nulls=1 as default, and all kinds of data type mappings for convenience).

yurifal commented 2 years ago
Are all remote types defined at here should be added?

I hope so. Frankly, I have no idea what is the most convenient place for them, up to you.

nvm1 commented 2 years ago

It's the Tableau problem. It does not support numbers bigger than Int64 such as UInt64/Int128/UInt128/Int256/UInt256.

To overcome this issue, we should add to JDBC a new setting -- expose_long_numbers_as = float64/string.

In case of expose_long_numbers_as = float64 select toFloat64(18446744073709551615) -> 18446744073709552000

In case of expose_long_numbers_as = string 18446744073709551615 -> '18446744073709551615'

Hi there,

https://github.com/ClickHouse/clickhouse-odbc/pull/356 - I see that in ODBC driver the same problem has been resolved with HugeIntAsString parameter. Though your solution provides more flexibity, it might be a good idea to align how drivers solve the issue. Or not... I'll be happy either way ;)

artshevchenko commented 2 years ago

Hi there!

To sum up:

  1. We need the expose_long_numbers_as=string parameter, since in most cases aggregates (like COUNTD) are considered for fields with large Integers or these fields are used as Dimensions (to specify the detail of visualizations). It is unlikely that anyone will sum up the values of such fields, knowing that the result most likely will not fit into the acceptable range of values.
  2. In theory, Tableau supports UInt64 (UI8 in dialect.tdd), but we can't use mapping since UInt64 is passed as java.lang.Long on the driver side, so the easiest way is to use the expose_long_numbers_as=string parameter when it is available
zhicwu commented 2 years ago

@artshevchenko & @nvm1, could you try clickhouse-jdbc-0.3.2-test3-http.jar? I'm about to release 0.3.2 tonight, which is almost same as 0.3.2-test3.

You can set connection property typeMappings to something like UInt64=java.lang.String(e.g. jdbc:ch://localhost:8123/default?typeMappings=UInt64%3Djava.lang.String), and then you'll get String instead of Long for all UInt64 values. My Tableau desktop was expired so I only tested on DBeaver: image

@yurifal suggested to add aliases like UI8 etc. but I think we probably no longer need that.

yurifal commented 2 years ago

@zhicwu wrote "... My Tableau desktop was expired... "

There's an old beta key (still valid as of 2021-12-31) TCJD-8646-8CA0-9845-1C12

FYI starting from October 2021, Tableau does no longer issue Desktop Creator keys. Customers have to activate their Desktops/Preps using their on-prem Servers with LBLM.

zhicwu commented 2 years ago

Thank you @yurifal. Just actived Tableau and ?typeMappings=UInt64%3Djava.lang.String gave me expected output :) image

yurifal commented 2 years ago

nice to see @zhicwu .

could you please share which version of the driver you're using with DBeawer for testing?

I've run the TDVT tests from Tableau Connector SDK and found a bunch of errors mainly with Dates / Datetimes (cast, operators & dateadd / datetrunc ones).

You can test for some of them manually (right in Tableau) by placing a Date / Datetime pill on a view as the Exact Date, or Week green one -- the calc is DATETRUNC('week', [your_date])

zhicwu commented 2 years ago

could you please share which version of the driver you're using with DBeawer for testing?

clickhouse-jdbc-0.3.2-test3-http.jar

I've run the TDVT tests from Tableau Connector SDK and found a bunch of errors mainly with Dates / Datetimes (cast, operators & dateadd / datetrunc ones).

Is there stack trace you can provide? To me, it seems working fine for DateTime but not Date. I had to map Date to String.

You can test for some of them manually (right in Tableau) by placing a Date / Datetime pill on a view as the Exact Date, or Week green one -- the calc is DATETRUNC('week', [your_date])

I added a calculation field using above expression against a DateTime field and it works.

zhicwu commented 2 years ago

I think the custom type mapping should address the issue - if not, please feel free to reopen the issue or create a new one.