dacort / metabase-athena-driver

An Amazon Athena driver for Metabase 0.32 and later
Apache License 2.0
225 stars 32 forks source link

The character 'ß' leads to a JDBC error when using it in a local SQL Query #106

Closed lukaswelsch closed 1 year ago

lukaswelsch commented 1 year ago

After upgrading the driver from v1.2.1 to v1.4.0 we experience problems with the character encoding for the character 'ß'.

This is a sample Query that creates the error:

SELECT * FROM (
    VALUES
        ('ß')
) AS t (name)

The error we are seeing is: [JDBC Driver]String index out of range: -2

From the troubleshooting - logs in the Metabase Admin settings I could observe this exception and it seems like an encoding problem.

:class java.lang.StringIndexOutOfBoundsException,
 :stacktrace
 ["java.base/java.lang.StringLatin1.charAt(Unknown Source)"
  "java.base/java.lang.String.charAt(Unknown Source)"

We have tried adding additional configurations, e.g. characterEncoding=utf-8. However, this did not solve the problem, probably because there is also no encoding-configuration mentioned in the SIMBA - Athena JDBC Guide.

Is there anything we could do to resolve this error?

dacort commented 1 year ago

Interesting - I wonder if this is due to something on the JDBC driver side or in my own code. I'm assuming the latter as otherwise would likely be a bigger issue for folks.

Will have to play around with this a bit. Does this happen when you're querying data with those encodings?

lukaswelsch commented 1 year ago

Thank you for looking into this issue!

I am able to query source data with the character 'ß'. It seems to only create an error, when the character 'ß' is used in a query.

Another example I have found is when the character is used in a comment then it also results in an error. Here is the example:

SELECT * 
--ß
FROM (
    VALUES
        ('a')
) AS t (name)

However, this error only occurs in non-trivial SQL-queries. For example, the error does not occur in a simple "SELECT .. FROM .. WHERE" statement. In queries with a "SELECT ... count(*) FROM ..." the error occurs.

dacort commented 1 year ago

What version of Metabase are you using? I just tried this with v0.43.1 and am not seeing the issue.

Granted, I get no results back at all and see a different error message in the console.

2022-09-02 22:12:40,475 ERROR middleware.catch-exceptions :: Error processing query: null
{:database_id 2,
 :started_at #t "2022-09-02T22:12:40.459123Z[GMT]",
 :json_query
 {:database 2,
  :native {:query "SELECT * FROM (\n    VALUES\n        ('ß')\n) AS t (name)", :template-tags {}},
  :type "native",
  :parameters [],
  :middleware {:js-int-to-string? true, :add-default-userland-constraints? true}},
 :status :failed,
 :class java.lang.NullPointerException,
 :stacktrace [nil],
 :card_id nil,
 :context :ad-hoc,
 :error nil,
 :row_count 0,
 :running_time 0,
 :data {:rows [], :cols []}}
lukaswelsch commented 1 year ago

We are also using Metabase Version v0.43.1.

I added the complete stack trace I am seeing in the Admin-console. From what I can observe, we don't get a NullPointerExcpetion, instead I see a "StringIndexOutOfBoundsException".

This is how the error is shown in Metabase: image

Stacktrace from Admin-console:

{:database_id 15, :started_at #t "2022-09-05T08:20:44.165730Z[GMT]", :via [{:status :failed, :class java.sql.SQLException, :error "[JDBC Driver]String index out of range: -2", :stacktrace ["java.base/java.lang.StringLatin1.charAt(Unknown Source)" "java.base/java.lang.String.charAt(Unknown Source)" "com.simba.athena.support.JDBCEscaper.Apply(Unknown Source)" "com.simba.athena.support.JDBCEscaper.Apply(Unknown Source)" "com.simba.athena.athena.dataengine.AJQueryExecutor.(Unknown Source)" "com.simba.athena.athena.dataengine.AJDataEngine.prepare(Unknown Source)" "com.simba.athena.athena.dataengine.AJDataEngine.prepare(Unknown Source)" "com.simba.athena.jdbc.common.SStatement.executeNoParams(Unknown Source)" "com.simba.athena.jdbc.common.BaseStatement.execute(Unknown Source)" "com.mchange.v2.c3p0.impl.NewProxyStatement.execute(NewProxyStatement.java:75)" "--> driver.sql_jdbc.execute$fn53975.invokeStatic(execute.clj:367)" "driver.sql_jdbc.execute$fn__53975.invoke(execute.clj:365)" "driver.sql_jdbc.execute$execute_statement_or_prepared_statementBANG.invokeStatic(execute.clj:375)" "driver.sql_jdbc.execute$execute_statement_or_prepared_statementBANG.invoke(execute.clj:372)" "driver.sql_jdbc.execute$execute_reducible_query$fn54055.invoke(execute.clj:500)" "driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:499)" "driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:486)" "driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:494)" "driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:486)" "driver.sql_jdbc$fn83551.invokeStatic(sql_jdbc.clj:54)" "driver.sql_jdbc$fn83551.invoke(sql_jdbc.clj:52)" "driver.athena$fn77399.invokeStatic(athena.clj:292)" "driver.athena$fn77399.invoke(athena.clj:290)" "query_processor.context$executef.invokeStatic(context.clj:59)" "query_processor.context$executef.invoke(context.clj:48)" "query_processor.context.default$default_runf.invokeStatic(default.clj:67)" "query_processor.context.default$default_runf.invoke(default.clj:65)" "query_processor.context$runf.invokeStatic(context.clj:45)" "query_processor.context$runf.invoke(context.clj:39)" "query_processor.reducible$identity_qp.invokeStatic(reducible.clj:12)" "query_processor.reducible$identity_qp.invoke(reducible.clj:9)" "query_processor.middleware.cache$maybe_return_cached_results$maybe_return_cached_results_STAR_51444.invoke(cache.clj:220)" "query_processor.middleware.permissions$check_query_permissions$fn47226.invoke(permissions.clj:109)" "query_processor.middleware.mbql_to_native$mbqlGT_native$fn__50385.invoke(mbql_to_native.clj:23)" "query_processor$fn52917$combined_post_process52922$combined_post_processSTAR52923.invoke(query_processor.clj:207)" "query_processor$fn52917$combined_pre_process52918$combined_pre_process_STAR_52919.invoke(query_processor.clj:204)" "query_processor.middleware.resolve_database_and_driver$resolve_database_and_driver$fn51365$fn51370.invoke(resolve_database_and_driver.clj:35)" "driver$do_with_driver.invokeStatic(driver.clj:75)" "driver$do_with_driver.invoke(driver.clj:71)" "query_processor.middleware.resolve_database_and_driver$resolve_database_and_driver$fn51365.invoke(resolve_database_and_driver.clj:34)" "query_processor.middleware.fetch_source_query$resolve_card_id_source_tables$fn47469.invoke(fetch_source_query.clj:281)" "query_processor.middleware.store$initialize_store$fn47657$fn47658.invoke(store.clj:11)" "query_processor.store$do_with_store.invokeStatic(store.clj:44)" "query_processor.store$do_with_store.invoke(store.clj:38)" "query_processor.middleware.store$initialize_store$fn47657.invoke(store.clj:10)" "query_processor.middleware.normalize_query$normalize$fn51637.invoke(normalize_query.clj:22)" "query_processor.middleware.constraints$add_default_userland_constraints$fn48727.invoke(constraints.clj:52)" "query_processor.middleware.process_userland_query$process_userland_query$fn51576.invoke(process_userland_query.clj:145)" "query_processor.middleware.catch_exceptions$catch_exceptions$fn__51948.invoke(catch_exceptions.clj:162)" "query_processor.reducible$async_qp$qpSTAR44196$thunk44198.invoke(reducible.clj:100)" "query_processor.reducible$async_qp$qpSTAR44196.invoke(reducible.clj:106)" "query_processor.reducible$async_qp$qp_STAR_44196.invoke(reducible.clj:91)" "query_processor.reducible$sync_qp$qp_STAR___44207.doInvoke(reducible.clj:126)" "query_processor$process_userland_query.invokeStatic(query_processor.clj:326)" "query_processor$process_userland_query.doInvoke(query_processor.clj:322)" "query_processor$fn52964$process_query_and_save_execution_BANG_52973$fn52976.invoke(query_processor.clj:337)" "query_processor$fn52964$process_query_and_save_execution_BANG_52973.invoke(query_processor.clj:330)" "query_processor$fn53008$process_query_and_save_with_max_results_constraintsBANG53017$fn53020.invoke(query_processor.clj:349)" "query_processor$fn53008$process_query_and_save_with_max_results_constraints_BANG_53017.invoke(query_processor.clj:342)" "api.dataset$run_query_async$fn66683.invoke(dataset.clj:68)" "query_processor.streaming$streaming_responseSTAR$fn39319$fn39320.invoke(streaming.clj:162)" "query_processor.streaming$streaming_responseSTAR$fn39319.invoke(streaming.clj:161)" "async.streaming_response$do_fSTAR.invokeStatic(streaming_response.clj:65)" "async.streaming_response$do_fSTAR.invoke(streaming_response.clj:63)" "async.streaming_response$do_f_async$task18948.invoke(streaming_response.clj:84)"], :state "HY000"} {:status :failed, :class clojure.lang.ExceptionInfo, :error "Error executing query: [JDBC Driver]String index out of range: -2", :stacktrace ["--> driver.sql_jdbc.execute$execute_reducible_query$fn54055.invoke(execute.clj:502)" "driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:499)" "driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:486)" "driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:494)" "driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:486)" "driver.sql_jdbc$fn83551.invokeStatic(sql_jdbc.clj:54)" "driver.sql_jdbc$fn83551.invoke(sql_jdbc.clj:52)" "driver.athena$fn77399.invokeStatic(athena.clj:292)" "driver.athena$fn77399.invoke(athena.clj:290)" "query_processor.context$executef.invokeStatic(context.clj:59)" "query_processor.context$executef.invoke(context.clj:48)" "query_processor.context.default$default_runf.invokeStatic(default.clj:67)" "query_processor.context.default$default_runf.invoke(default.clj:65)" "query_processor.context$runf.invokeStatic(context.clj:45)" "query_processor.context$runf.invoke(context.clj:39)" "query_processor.reducible$identity_qp.invokeStatic(reducible.clj:12)" "query_processor.reducible$identity_qp.invoke(reducible.clj:9)" "query_processor.middleware.cache$maybe_return_cached_results$maybe_return_cached_resultsSTAR51444.invoke(cache.clj:220)" "query_processor.middleware.permissions$check_query_permissions$fn47226.invoke(permissions.clj:109)" "query_processor.middleware.mbql_to_native$mbql__GT_native$fn50385.invoke(mbql_to_native.clj:23)" "query_processor$fn52917$combined_post_process52922$combined_post_process_STAR_52923.invoke(query_processor.clj:207)" "query_processor$fn52917$combined_pre_process52918$combined_pre_process_STAR_52919.invoke(query_processor.clj:204)" "query_processor.middleware.resolve_database_and_driver$resolve_database_and_driver$fn51365$fn51370.invoke(resolve_database_and_driver.clj:35)" "driver$do_with_driver.invokeStatic(driver.clj:75)" "driver$do_with_driver.invoke(driver.clj:71)" "query_processor.middleware.resolve_database_and_driver$resolve_database_and_driver$fn51365.invoke(resolve_database_and_driver.clj:34)" "query_processor.middleware.fetch_source_query$resolve_card_id_source_tables$fn47469.invoke(fetch_source_query.clj:281)" "query_processor.middleware.store$initialize_store$fn47657$fn47658.invoke(store.clj:11)" "query_processor.store$do_with_store.invokeStatic(store.clj:44)" "query_processor.store$do_with_store.invoke(store.clj:38)" "query_processor.middleware.store$initialize_store$fn47657.invoke(store.clj:10)" "query_processor.middleware.normalize_query$normalize$fn__51637.invoke(normalize_query.clj:22)" "query_processor.middleware.constraints$add_default_userland_constraints$fn48727.invoke(constraints.clj:52)" "query_processor.middleware.process_userland_query$process_userland_query$fn51576.invoke(process_userland_query.clj:145)" "query_processor.middleware.catch_exceptions$catch_exceptions$fn__51948.invoke(catch_exceptions.clj:162)" "query_processor.reducible$async_qp$qpSTAR44196$thunk44198.invoke(reducible.clj:100)" "query_processor.reducible$async_qp$qpSTAR44196.invoke(reducible.clj:106)" "query_processor.reducible$async_qp$qp_STAR_44196.invoke(reducible.clj:91)" "query_processor.reducible$sync_qp$qp_STAR___44207.doInvoke(reducible.clj:126)" "query_processor$process_userland_query.invokeStatic(query_processor.clj:326)" "query_processor$process_userland_query.doInvoke(query_processor.clj:322)" "query_processor$fn52964$process_query_and_save_execution_BANG_52973$fn52976.invoke(query_processor.clj:337)" "query_processor$fn52964$process_query_and_save_execution_BANG_52973.invoke(query_processor.clj:330)" "query_processor$fn53008$process_query_and_save_with_max_results_constraintsBANG53017$fn53020.invoke(query_processor.clj:349)" "query_processor$fn53008$process_query_and_save_with_max_results_constraints_BANG_53017.invoke(query_processor.clj:342)" "api.dataset$run_query_async$fn66683.invoke(dataset.clj:68)" "query_processor.streaming$streaming_responseSTAR$fn39319$fn39320.invoke(streaming.clj:162)" "query_processor.streaming$streaming_responseSTAR$fn39319.invoke(streaming.clj:161)" "async.streaming_response$do_fSTAR.invokeStatic(streaming_response.clj:65)" "async.streaming_response$do_fSTAR.invoke(streaming_response.clj:63)" "async.streaming_response$do_f_async$task__18948.invoke(streaming_response.clj:84)"], :error_type :invalid-query, :ex-data {:sql "-- Metabase:: userID: 453 queryType: native queryHash: 533dd952c5a8e96e3b56ddce7cb1c7e5780934fc4c642f48494268a52e6a4be3\nSELECT FROM (\n VALUES\n ('ß')\n) AS t (name)", :params nil, :type :invalid-query}}], :error_type :invalid-query, :json_query {:type "native", :native {:query "SELECT FROM (\n VALUES\n ('ß')\n) AS t (name)", :template-tags {}}, :database 15, :parameters [], :middleware {:js-int-to-string? true, :add-default-userland-constraints? true}}, :status :failed, :class java.lang.StringIndexOutOfBoundsException, :stacktrace ["java.base/java.lang.StringLatin1.charAt(Unknown Source)" "java.base/java.lang.String.charAt(Unknown Source)" "com.simba.athena.support.JDBCEscaper.Apply(Unknown Source)" "com.simba.athena.support.JDBCEscaper.Apply(Unknown Source)" "com.simba.athena.athena.dataengine.AJQueryExecutor.(Unknown Source)" "com.simba.athena.athena.dataengine.AJDataEngine.prepare(Unknown Source)" "com.simba.athena.athena.dataengine.AJDataEngine.prepare(Unknown Source)" "com.simba.athena.jdbc.common.SStatement.executeNoParams(Unknown Source)" "com.simba.athena.jdbc.common.BaseStatement.execute(Unknown Source)" "com.mchange.v2.c3p0.impl.NewProxyStatement.execute(NewProxyStatement.java:75)" "--> driver.sql_jdbc.execute$fn53975.invokeStatic(execute.clj:367)" "driver.sql_jdbc.execute$fn53975.invoke(execute.clj:365)" "driver.sql_jdbc.execute$execute_statement_or_prepared_statementBANG.invokeStatic(execute.clj:375)" "driver.sql_jdbc.execute$execute_statement_or_prepared_statementBANG.invoke(execute.clj:372)" "driver.sql_jdbc.execute$execute_reducible_query$fn__54055.invoke(execute.clj:500)" "driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:499)" "driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:486)" "driver.sql_jdbc.execute$execute_reducible_query.invokeStatic(execute.clj:494)" "driver.sql_jdbc.execute$execute_reducible_query.invoke(execute.clj:486)" "driver.sql_jdbc$fn83551.invokeStatic(sql_jdbc.clj:54)" "driver.sql_jdbc$fn83551.invoke(sql_jdbc.clj:52)" "driver.athena$fn77399.invokeStatic(athena.clj:292)" "driver.athena$fn77399.invoke(athena.clj:290)" "query_processor.context$executef.invokeStatic(context.clj:59)" "query_processor.context$executef.invoke(context.clj:48)" "query_processor.context.default$default_runf.invokeStatic(default.clj:67)" "query_processor.context.default$default_runf.invoke(default.clj:65)" "query_processor.context$runf.invokeStatic(context.clj:45)" "query_processor.context$runf.invoke(context.clj:39)" "query_processor.reducible$identity_qp.invokeStatic(reducible.clj:12)" "query_processor.reducible$identity_qp.invoke(reducible.clj:9)" "query_processor.middleware.cache$maybe_return_cached_results$maybe_return_cached_resultsSTAR51444.invoke(cache.clj:220)" "query_processor.middleware.permissions$check_query_permissions$fn47226.invoke(permissions.clj:109)" "query_processor.middleware.mbql_to_native$mbql__GT_native$fn50385.invoke(mbql_to_native.clj:23)" "query_processor$fn52917$combined_post_process52922$combined_post_process_STAR_52923.invoke(query_processor.clj:207)" "query_processor$fn52917$combined_pre_process52918$combined_pre_process_STAR_52919.invoke(query_processor.clj:204)" "query_processor.middleware.resolve_database_and_driver$resolve_database_and_driver$fn51365$fn51370.invoke(resolve_database_and_driver.clj:35)" "driver$do_with_driver.invokeStatic(driver.clj:75)" "driver$do_with_driver.invoke(driver.clj:71)" "query_processor.middleware.resolve_database_and_driver$resolve_database_and_driver$fn51365.invoke(resolve_database_and_driver.clj:34)" "query_processor.middleware.fetch_source_query$resolve_card_id_source_tables$fn47469.invoke(fetch_source_query.clj:281)" "query_processor.middleware.store$initialize_store$fn47657$fn47658.invoke(store.clj:11)" "query_processor.store$do_with_store.invokeStatic(store.clj:44)" "query_processor.store$do_with_store.invoke(store.clj:38)" "query_processor.middleware.store$initialize_store$fn47657.invoke(store.clj:10)" "query_processor.middleware.normalize_query$normalize$fn__51637.invoke(normalize_query.clj:22)" "query_processor.middleware.constraints$add_default_userland_constraints$fn48727.invoke(constraints.clj:52)" "query_processor.middleware.process_userland_query$process_userland_query$fn51576.invoke(process_userland_query.clj:145)" "query_processor.middleware.catch_exceptions$catch_exceptions$fn__51948.invoke(catch_exceptions.clj:162)" "query_processor.reducible$async_qp$qpSTAR44196$thunk44198.invoke(reducible.clj:100)" "query_processor.reducible$async_qp$qpSTAR44196.invoke(reducible.clj:106)" "query_processor.reducible$async_qp$qp_STAR_44196.invoke(reducible.clj:91)" "query_processor.reducible$sync_qp$qp_STAR___44207.doInvoke(reducible.clj:126)" "query_processor$process_userland_query.invokeStatic(query_processor.clj:326)" "query_processor$process_userland_query.doInvoke(query_processor.clj:322)" "query_processor$fn52964$process_query_and_save_execution_BANG_52973$fn52976.invoke(query_processor.clj:337)" "query_processor$fn52964$process_query_and_save_execution_BANG_52973.invoke(query_processor.clj:330)" "query_processor$fn53008$process_query_and_save_with_max_results_constraintsBANG53017$fn53020.invoke(query_processor.clj:349)" "query_processor$fn53008$process_query_and_save_with_max_results_constraints_BANG_53017.invoke(query_processor.clj:342)" "api.dataset$run_query_async$fn66683.invoke(dataset.clj:68)" "query_processor.streaming$streaming_responseSTAR$fn39319$fn39320.invoke(streaming.clj:162)" "query_processor.streaming$streaming_responseSTAR$fn__39319.invoke(streaming.clj:161)" "async.streaming_response$do_fSTAR.invokeStatic(streaming_response.clj:65)" "async.streaming_response$do_fSTAR.invoke(streaming_response.clj:63)" "async.streaming_response$do_f_async$task__18948.invoke(streaming_response.clj:84)"], :card_id nil, :context :ad-hoc, :error "[JDBC Driver]String index out of range: -2", :row_count 0, :running_time 0, :data {:rows [], :cols []}}

dacort commented 1 year ago

OK, I was able to reproduce this using the same query - not sure what I was doing before.

I've also been able to reproduce this using an entirely different client (DBeaver), so it would seem it's related to either the driver itself (2.0.25) or system configuration. I think this is addressed in a newer version of the driver - will check with 2.0.32.

dacort commented 1 year ago

Using DBeaver and Athena JDBC driver version 2.0.32, I made a little progress.

This query works now:

SELECT 'ß(F';

But these two still don't

--ß
SELECT *
FROM (
    VALUES
        ('a')
) AS t (name);

SELECT * FROM (
    VALUES
        ('ß')
) AS t (name);

@lukaswelsch Based on your initial issue/sql, I'm guessing you have queries like the latter? I will open an issue with the Athena team, but I haven't seen anybody with this specific issue. If you have an account manager, it'd be good to reach out to them to help get this prioritized.

I've also noticed that with fairly standard SELECT <FIELDS> FROM <DATABASE> queries, this issue doesn't seem to happen.

lukaswelsch commented 1 year ago

Yes, we have stored queries in our system, like the ones I have provided. That is also how we found the issue.

I think that you have not yet seen anybody with this issue is related to few languages using the 'ß' character. As far as I know German is the only language that uses it.

Thanks again for looking into this bug and for creating the issue with the Athena Team! We will inform our account manager and hopefully this will get prioritized.

lukaswelsch commented 1 year ago

Hello @dacort , we got an answer from the AWS support. They found the bug and told us, that the error with the German symbols is expected to be resolved by the end of the month.

dacort commented 1 year ago

Awesome, good to hear! Thanks for following up @lukaswelsch.

dacort commented 1 year ago

There's a new version (2.0.34) that just got released that fixes both this issue and #118

Release notes here

dacort commented 1 year ago

Closing this issue in favor of official Metabase issue - Athena is now an officially supported Metabase driver as of 0.45.

https://github.com/metabase/metabase/issues/26975

@lukaswelsch Let me know if this is a blocker - I could probably make one last release of this driver with the updated version.