ClickHouse / spark-clickhouse-connector

Spark ClickHouse Connector built on DataSourceV2 API
https://clickhouse.com/docs/en/integrations/apache-spark
Apache License 2.0

username with dot: unable to query data (auth error might be ambiguous) #306

Closed (paf91 closed 3 months ago)

paf91 commented 3 months ago

When we use a username containing a dot, like user.name, we receive an error while querying ClickHouse, even though the first step, use clickhouse;, seems to work.

ClickHouse version: 23.9.1.1854
Spark version: 3.3.4
ClickHouse Connector: compiled from the master branch (0.8.0, Scala 2.12)

from pyspark.sql import SparkSession

spark = SparkSession.builder\
    .config("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog")\
    .config("spark.sql.catalog.clickhouse.host", "private_server")\
    .config("spark.sql.catalog.clickhouse.protocol", "https")\
    .config("spark.port.maxRetries", "50")\
    .config("spark.sql.catalog.clickhouse.http_port", "8443")\
    .config("spark.sql.catalog.clickhouse.user", "user.name")\
    .config("spark.sql.catalog.clickhouse.password", "somepassword")\
    .config("spark.sql.catalog.clickhouse.option.ssl", "true")\
    .config("spark.sql.catalog.clickhouse.option.sslmode", "strict")\
    .config("spark.sql.catalog.clickhouse.option.sslrootcert", "/usr/local/share/ca-certificates/some_ca.crt")\
    .getOrCreate()

This is where it strangely doesn't fail:

spark.sql("use clickhouse;")

I am even able to look at the table structure:

test_table
DataFrame[test_column: string, testcol: string]

test_table.columns
['test_column', 'testcol']

This is where it actually fails:

test_table = spark.table(schema + "test_table")
test_table.select('test_column').take(5)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[20], line 1
----> 1 test_table.select('test_column').take(5)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py:868, in DataFrame.take(self, num)
    858 def take(self, num: int) -> List[Row]:
    859     """Returns the first ``num`` rows as a :class:`list` of :class:`Row`.
    860 
    861     .. versionadded:: 1.3.0
   (...)
    866     [Row(age=2, name='Alice'), Row(age=5, name='Bob')]
    867     """
--> 868     return self.limit(num).collect()

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py:817, in DataFrame.collect(self)
    807 """Returns all the records as a list of :class:`Row`.
    808 
    809 .. versionadded:: 1.3.0
   (...)
    814 [Row(age=2, name='Alice'), Row(age=5, name='Bob')]
    815 """
    816 with SCCallSiteSync(self._sc):
--> 817     sock_info = self._jdf.collectToPython()
    818 return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer())))

File /opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
    188 def deco(*a: Any, **kw: Any) -> Any:
    189     try:
--> 190         return f(*a, **kw)
    191     except Py4JJavaError as e:
    192         converted = convert_exception(e.java_exception)

File /opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o127.collectToPython.
: xenon.clickhouse.exception.CHServerException: [HTTP]user.name@private_server:8443}/default [516] Code: 516. DB::Exception: user.name: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 23.9.1.1854 (official build))
, server ClickHouseNode [uri=https://private_server:8443/default, options={sslmode=strict,sslrootcert=/usr/local/share/ca-certificates/some_ca.crt}]@1787654854
    at xenon.clickhouse.client.NodeClient.syncQuery(NodeClient.scala:141)
    at xenon.clickhouse.client.NodeClient.syncQueryAndCheck(NodeClient.scala:151)
    at xenon.clickhouse.client.NodeClient.syncQueryAndCheckOutputJSONEachRow(NodeClient.scala:68)
    at xenon.clickhouse.ClickHouseHelper.queryPartitionSpec(ClickHouseHelper.scala:263)
    at xenon.clickhouse.ClickHouseHelper.queryPartitionSpec$(ClickHouseHelper.scala:259)
    at xenon.clickhouse.read.ClickHouseBatchScan.queryPartitionSpec(ClickHouseRead.scala:128)
    at xenon.clickhouse.read.ClickHouseBatchScan.$anonfun$inputPartitions$3(ClickHouseRead.scala:146)
    at xenon.clickhouse.Utils$.tryWithResource(Utils.scala:171)
    at xenon.clickhouse.read.ClickHouseBatchScan.$anonfun$inputPartitions$1(ClickHouseRead.scala:145)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
    at xenon.clickhouse.read.ClickHouseBatchScan.inputPartitions$lzycompute(ClickHouseRead.scala:144)
    at xenon.clickhouse.read.ClickHouseBatchScan.inputPartitions(ClickHouseRead.scala:142)
    at xenon.clickhouse.read.ClickHouseBatchScan.outputPartitioning(ClickHouseRead.scala:189)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:42)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:35)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
    at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
    at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
    at org.apache.spark.sql.catalyst.plans.logical.GlobalLimit.mapChildren(basicLogicalOperators.scala:1258)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$.apply(V2ScanPartitioning.scala:35)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$.apply(V2ScanPartitioning.scala:34)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
    at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
    at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249)
    at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3896)
    at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3725)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Caused by: com.clickhouse.client.ClickHouseException: Code: 516. DB::Exception: user.name: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 23.9.1.1854 (official build))
, server ClickHouseNode [uri=https://private_server:8443/default, options={sslmode=strict,sslrootcert=/usr/local/share/ca-certificates/some_ca.crt}]@1787654854
    at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:169)
    at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.io.IOException: Code: 516. DB::Exception: user.name: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 23.9.1.1854 (official build))

    at com.clickhouse.client.http.ApacheHttpConnectionImpl.checkResponse(ApacheHttpConnectionImpl.java:220)
    at com.clickhouse.client.http.ApacheHttpConnectionImpl.post(ApacheHttpConnectionImpl.java:254)
    at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:118)
    at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161)
    at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273)
    ... 4 more
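
Judging by the trace, the exception surfaces from the connector's partition-planning step (queryPartitionSpec in ClickHouseHelper), not from catalog metadata calls, which is why USE and the schema lookup succeed while the actual read fails. A minimal sketch for testing the same credentials outside Spark, over the ClickHouse HTTP interface, can help rule out the dot being mangled on the way to the server (host, user, password, and cert path below are the placeholders from this report):

import requests

# HTTP Basic auth passes the username verbatim, dot included.
resp = requests.get(
    "https://private_server:8443/",
    params={"query": "SELECT currentUser()"},
    auth=("user.name", "somepassword"),
    verify="/usr/local/share/ca-certificates/some_ca.crt",
)
print(resp.status_code, resp.text.strip())

If this succeeds against the configured host but the Spark read still fails, the problem may sit on a different node than the one the session first connects to.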
paf91 commented 3 months ago

Looks like the issue was that the accounts with dots were not created on ALL nodes. That was the culprit.
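
For anyone hitting the same symptom: on a multi-node cluster the account must exist, with the same password, on every replica the connector may scan, not just the node the session first connects to. A hedged sketch that probes each node for the account (the hostnames are hypothetical; on a real cluster they can be taken from system.clusters):

import requests

NODES = ["ch-node-1", "ch-node-2", "ch-node-3"]  # placeholder hostnames

for host in NODES:
    resp = requests.get(
        f"https://{host}:8443/",
        params={"query": "SELECT currentUser()"},
        auth=("user.name", "somepassword"),
        verify="/usr/local/share/ca-certificates/some_ca.crt",
    )
    # An AUTHENTICATION_FAILED (Code: 516) body means the account is
    # missing or has a different password on that node.
    print(host, "OK" if resp.ok else resp.text.strip())

Creating the account once with ON CLUSTER DDL, e.g. CREATE USER `user.name` ON CLUSTER some_cluster IDENTIFIED BY '...' (cluster name hypothetical), keeps the replicas in sync and avoids this kind of drift.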