ClickHouse / spark-clickhouse-connector

Spark ClickHouse Connector built on DataSourceV2 API
https://clickhouse.com/docs/en/integrations/apache-spark
Apache License 2.0

NodeClient initialization fails intermittently when attempting to get the Scala version #358

Closed mikech-goodcover closed 3 weeks ago

mikech-goodcover commented 2 months ago

Describe the bug

NodeClient initialization intermittently fails when trying to pull package metadata.

It seems to be trying to extract the Scala version from the package version string, and that string sometimes arrives in an unexpected format.

Introduced here: https://github.com/ClickHouse/spark-clickhouse-connector/commit/6dd82f18f77166af50c79faa0c83f90255817922

Expected behaviour

Doesn't blow up.

Error log

2024-09-06T06:47:24.942658739Z java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
2024-09-06T06:47:24.942665879Z  at com.clickhouse.spark.client.NodeClient.userAgent$lzycompute(NodeClient.scala:51)
2024-09-06T06:47:24.942667819Z  at com.clickhouse.spark.client.NodeClient.userAgent(NodeClient.scala:45)
2024-09-06T06:47:24.942671599Z  at com.clickhouse.spark.client.NodeClient.<init>(NodeClient.scala:70)
2024-09-06T06:47:24.942675009Z  at com.clickhouse.spark.client.NodeClient$.apply(NodeClient.scala:38)
2024-09-06T06:47:24.942681709Z  at com.clickhouse.spark.ClickHouseCatalog.initialize(ClickHouseCatalog.scala:77)
...

Environment

ClickHouse server

mzitnik commented 2 months ago

Hi @mikech-goodcover, thanks for the feedback. Looking into it.

mzitnik commented 2 months ago

I was testing with our example code, and it works fine. Can you describe a way to reproduce?

dispalt commented 2 months ago

So I am guessing this has something to do with shading. It would be nice if we could wrap the whole thing in a try/catch and fall back to using the generic string.
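
A minimal sketch of that kind of fallback, not the connector's actual code. It assumes the version string normally looks like `<version>_<scalaVersion>` and that splitting on `_` is what yields the "Index 1 out of bounds for length 1" error; `UserAgentSketch`, `buildUserAgent`, `GenericUserAgent`, and the output format are all hypothetical names chosen for illustration.

```scala
import scala.util.Try

object UserAgentSketch {
  // Hypothetical generic fallback string; the connector's real default is not shown in this thread.
  val GenericUserAgent: String = "spark-clickhouse-connector"

  // Assumed input format: "<version>_<scalaVersion>", e.g. "0.8.0_2.12".
  // If the string is null or has no "_" (for example after shading or
  // repackaging changes the metadata), parsing fails and the generic string is used.
  def buildUserAgent(title: String, rawVersion: String): String =
    Try {
      val parts = rawVersion.split("_")
      // parts(1) throws ArrayIndexOutOfBoundsException when there is no "_"
      s"$title/${parts(0)} (scala:${parts(1)})"
    }.getOrElse(GenericUserAgent)
}
```

With this shape, a malformed or missing version degrades the user agent string instead of failing `NodeClient` initialization. `Try` only catches non-fatal exceptions, so errors such as `OutOfMemoryError` would still propagate.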

mzitnik commented 2 months ago

I planned to do this, but I would like to reproduce it. Can you describe how you are running it?