Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto
Apache License 2.0

Spark write to Synapse error: java.lang.NoClassDefFoundError: com/twitter/util/TimeoutException #372

Closed aayushsin closed 2 months ago

aayushsin commented 5 months ago

Describe the bug

Since moving to Spark 3.4, writing to Kusto (Azure Data Explorer) from a Spark stream fails with the exception below. The same write worked with Spark 3.3; the error appeared after the upgrade to 3.4 and the accompanying library upgrades.

java.lang.NoClassDefFoundError: com/twitter/util/TimeoutException

To Reproduce

df.write \
    .format("com.microsoft.kusto.spark.synapse.datasource") \
    .option("spark.synapse.linkedService", ) \
    .option("kustoDatabase", ) \
    .option("kustoTable", ) \
    .option("tableCreateOptions","CreateIfNotExist") \
    .mode("Append") \
    .save()

Expected behavior

The data should be written to the database.

Error

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:1396, in DataFrameWriter.save(self, path, format, mode, partitionBy, **options)
   1394     self.format(format)
   1395 if path is None:
-> 1396     self._jwrite.save()
   1397 else:
   1398     self._jwrite.save(path)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    167 def deco(*a: Any, **kw: Any) -> Any:
    168     try:
--> 169         return f(*a, **kw)
    170     except Py4JJavaError as e:
    171         converted = convert_exception(e.java_exception)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o35663.save.
: java.lang.NoClassDefFoundError: com/twitter/util/TimeoutException
    at com.microsoft.kusto.spark.synapse.utils.LSUtils.getLSCluster(SynapseLSRUtils.scala:57)
    at com.microsoft.kusto.spark.synapse.utils.LSUtils.getLSCluster$(SynapseLSRUtils.scala:55)
    at com.microsoft.kusto.spark.synapse.utils.SynapseLSUtils$.getLSCluster(SynapseLSRUtils.scala:68)
    at com.microsoft.kusto.spark.synapse.utils.KustoLSDataSourceUtils.$anonfun$convertLinkedServiceToKustoParameters$3(KustoSynapseDataSourceUtils.scala:116)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.microsoft.kusto.spark.synapse.utils.ComponentEventPublisherEx$.$anonfun$publishComponentEventFor$1(AppEventPublisher.scala:66)
    at scala.util.Try$.apply(Try.scala:213)
    at com.microsoft.spark.utils.CommonUtils$.executeFunction(CommonUtils.scala:55)
    at com.microsoft.spark.utils.CommonUtils$.getBlockTimeAndResult(CommonUtils.scala:36)
    at com.microsoft.kusto.spark.synapse.utils.ComponentEventPublisherEx$.publishComponentEventFor(AppEventPublisher.scala:65)
    at com.microsoft.kusto.spark.synapse.utils.EventPublisher.publishComponentEventFor(AppEventPublisher.scala:30)
    at com.microsoft.kusto.spark.synapse.utils.KustoLSDataSourceUtils.convertLinkedServiceToKustoParameters(KustoSynapseDataSourceUtils.scala:111)
    at com.microsoft.kusto.spark.synapse.utils.KustoLSDataSourceUtils.convertLinkedServiceToKustoParameters$(KustoSynapseDataSourceUtils.scala:84)
    at com.microsoft.kusto.spark.synapse.utils.KustoSynapseLSDataSourceUtils$.convertLinkedServiceToKustoParameters(KustoSynapseDataSourceUtils.scala:170)
    at com.microsoft.kusto.spark.synapse.datasource.BaseDefaultSource.$anonfun$createRelation$1(DefaultSource.scala:23)
    at com.microsoft.kusto.spark.synapse.utils.ComponentEventPublisherEx$.$anonfun$publishComponentEventFor$1(AppEventPublisher.scala:66)
    at scala.util.Try$.apply(Try.scala:213)
    at com.microsoft.spark.utils.CommonUtils$.executeFunction(CommonUtils.scala:55)
    at com.microsoft.spark.utils.CommonUtils$.getBlockTimeAndResult(CommonUtils.scala:36)
    at com.microsoft.kusto.spark.synapse.utils.ComponentEventPublisherEx$.publishComponentEventFor(AppEventPublisher.scala:65)
    at com.microsoft.kusto.spark.synapse.utils.EventPublisher.publishComponentEventFor(AppEventPublisher.scala:30)
    at com.microsoft.kusto.spark.synapse.datasource.BaseDefaultSource.createRelation(DefaultSource.scala:20)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:152)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:120)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:209)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:105)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:152)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:145)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:145)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:129)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:123)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:200)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:897)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:412)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:379)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
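
A `NoClassDefFoundError` like the one above means the JVM could not load, at run time, a class that the connector code refers to. One rough way to check from a notebook whether that class is visible to the driver is sketched below. It assumes a PySpark session named `spark` and goes through the private `spark._jvm` py4j gateway; the helper name `jvm_class_available` is invented for this illustration. The result only reflects the driver's context class loader, so treat it as a hint rather than proof.

```python
def jvm_class_available(jvm, class_name: str) -> bool:
    """Return True if the driver JVM's context class loader can load class_name.

    `jvm` is assumed to be a py4j JVM view such as `spark._jvm` (a private
    PySpark attribute). `class_name` uses dots, not slashes.
    """
    try:
        loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
        loader.loadClass(class_name)
        return True
    except Exception:
        # py4j reports a Java ClassNotFoundException as a Python exception;
        # this sketch treats any failure as "class not available".
        return False


# Hypothetical usage in a Synapse notebook:
# jvm_class_available(spark._jvm, "com.twitter.util.TimeoutException")
```

If the check returns `False` on a Spark 3.4 pool but `True` on a 3.3 pool, that is consistent with a dependency missing from the newer runtime, although it does not show which component was supposed to provide the class.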

pip list

Package                      Version
---------------------------- ------------------
absl-py                      2.1.0
adal                         1.2.7
adlfs                        2023.10.0
aiohttp                      3.9.3
aiosignal                    1.3.1
anyio                        3.7.1
applicationinsights          0.11.10
argcomplete                  3.2.3
argon2-cffi                  23.1.0
argon2-cffi-bindings         21.2.0
arrow                        1.3.0
asttokens                    2.4.1
astunparse                   1.6.3
async-timeout                4.0.3
attrs                        23.2.0
azure-ai-ml                  1.12.1
azure-common                 1.1.28
azure-core                   1.30.1
azure-data-tables            12.5.0
azure-datalake-store         0.0.51
azure-graphrbac              0.61.1
azure-identity               1.15.0
azure-keyvault-secrets       4.8.0
azure-kusto-data             4.3.1
azure-mgmt-authorization     4.0.0
azure-mgmt-containerregistry 10.3.0
azure-mgmt-core              1.4.0
azure-mgmt-keyvault          10.3.0
azure-mgmt-network           25.2.0
azure-mgmt-resource          23.0.1
azure-mgmt-storage           21.1.0
azure-storage-blob           12.19.0
azure-storage-file-datalake  12.14.0
azure-storage-file-share     12.15.0
azure-synapse-ml-predict     1.0.0
azureml-core                 1.55.0
azureml-dataprep             5.1.6
azureml-dataprep-native      41.0.0
azureml-dataprep-rslex       2.22.2
azureml-dataset-runtime      1.55.0
azureml-mlflow               1.55.0
azureml-opendatasets         1.55.0
azureml-synapse              0.0.1
azureml-telemetry            1.55.0
backcall                     0.2.0
backports.tempfile           1.0
backports.weakref            1.0.post1
bcrypt                       4.1.2
beautifulsoup4               4.12.2
bleach                       6.1.0
blinker                      1.7.0
Brotli                       1.1.0
cached-property              1.5.2
cachetools                   5.3.3
certifi                      2024.2.2
cffi                         1.16.0
charset-normalizer           3.3.2
click                        8.1.7
cloudpickle                  2.2.1
clr-loader                   0.2.6
colorama                     0.4.6
comm                         0.2.2
conda-package-handling       2.2.0
conda_package_streaming      0.9.0
configparser                 6.0.1
contextlib2                  21.6.0
contourpy                    1.2.0
control-script               1.0.3
cryptography                 41.0.7
cycler                       0.12.1
dash                         2.16.1
dash-core-components         2.0.0
dash-cytoscape               0.2.0
dash-html-components         2.0.0
dash-table                   5.0.0
databricks-cli               0.18.0
debugpy                      1.8.1
decorator                    5.1.1
defusedxml                   0.7.1
dill                         0.3.8
distlib                      0.3.8
docker                       7.0.0
entrypoints                  0.4
et-xmlfile                   1.1.0
exceptiongroup               1.2.0
executing                    2.0.1
fastjsonschema               2.19.1
filelock                     3.13.1
Flask                        3.0.2
flatbuffers                  24.3.7
fluent-logger                0.10.0
fonttools                    4.49.0
fqdn                         1.5.1
frozenlist                   1.4.1
fsspec                       2024.2.0
fsspec-wrapper               0.1.13
fusepy                       3.0.1
gast                         0.5.4
geographiclib                2.0
geopy                        2.4.1
gevent                       23.9.0.post1
gitdb                        4.0.11
GitPython                    3.1.42
gmpy2                        2.1.2
google-api-core              2.17.1
google-auth                  2.28.2
google-auth-oauthlib         1.2.0
google-pasta                 0.2.0
googleapis-common-protos     1.63.0
greenlet                     3.0.3
grpcio                       1.59.3
h5py                         3.10.0
html5lib                     1.1
humanfriendly                10.0
idna                         3.6
ijson                        3.2.3
imageio                      2.33.1
importlib_metadata           7.0.2
importlib_resources          6.3.0
impulse-python-handler       1.0.19.1.0.0
interpret                    0.5.0
interpret-core               0.5.0
ipykernel                    6.29.3
ipython                      8.14.0
ipywidgets                   8.0.7
isodate                      0.6.1
isoduration                  20.11.0
itsdangerous                 2.1.2
jedi                         0.19.1
jeepney                      0.8.0
Jinja2                       3.1.3
jmespath                     1.0.1
joblib                       1.3.2
jsonpickle                   3.0.3
jsonpointer                  2.4
jsonschema                   4.21.1
jsonschema-specifications    2023.12.1
jupyter_client               8.6.1
jupyter_core                 5.7.2
jupyter-events               0.9.1
jupyter_server               2.7.0
jupyter_server_terminals     0.5.3
jupyter-ui-poll              0.2.2
jupyterlab_pygments          0.3.0
jupyterlab_widgets           3.0.10
keras                        2.15.0
kiwisolver                   1.4.5
knack                        0.11.0
liac-arff                    2.5.0
library-metadata-cooker      0.0.7
lightgbm                     4.2.0
llvmlite                     0.42.0
lxml                         5.1.0
Markdown                     3.5.1
MarkupSafe                   2.1.5
marshmallow                  3.21.1
matplotlib                   3.8.2
matplotlib-inline            0.1.6
mistune                      3.0.2
mkl_fft                      1.3.8
mkl_random                   1.2.5
mkl-service                  2.4.1
ml-dtypes                    0.2.0
mlflow-skinny                2.9.2
mltable                      1.6.1
mpmath                       1.3.0
msal                         1.27.0
msal-extensions              1.1.0
msgpack                      1.0.8
msrest                       0.7.1
msrestazure                  0.6.4
multidict                    6.0.5
multiprocess                 0.70.16
munkres                      1.1.4
mypy                         1.4.1
mypy-extensions              1.0.0
nbclient                     0.10.0
nbconvert                    7.16.2
nbformat                     5.10.2
ndg-httpsclient              0.5.1
nest_asyncio                 1.6.0
networkx                     3.2.1
notebookutils                3.4.1-20240309.2
numba                        0.59.0
numpy                        1.23.5
oauthlib                     3.2.2
onnx                         1.15.0
opencensus                   0.11.4
opencensus-context           0.1.3
opencensus-ext-azure         1.1.13
openpyxl                     3.1.2
opt-einsum                   3.3.0
overrides                    7.7.0
packaging                    23.2
pandas                       1.5.3
pandasql                     0.7.3
pandocfilters                1.5.0
paramiko                     3.4.0
parso                        0.8.3
pathos                       0.3.2
pathspec                     0.12.1
patsy                        0.5.6
pexpect                      4.9.0
pickleshare                  0.7.5
pillow                       10.2.0
pip                          23.1.2
pkginfo                      1.10.0
pkgutil_resolve_name         1.3.10
platformdirs                 3.11.0
plotly                       5.18.0
ply                          3.11
portalocker                  2.8.2
powerbiclient                3.1.1
pox                          0.3.4
ppft                         1.7.6.8
prettytable                  3.9.0
prometheus_client            0.20.0
prompt-toolkit               3.0.42
protobuf                     4.24.4
psutil                       5.9.8
ptyprocess                   0.7.0
pure-eval                    0.2.2
py4j                         0.10.9.7
pyarrow                      14.0.2
pyasn1                       0.5.1
pyasn1-modules               0.3.0
pycairo                      1.26.0
pycosat                      0.6.6
pycparser                    2.21
pydash                       7.0.5
Pygments                     2.17.2
PyGObject                    3.48.1
PyJWT                        2.8.0
PyNaCl                       1.5.0
pyodbc                       5.0.1
pyOpenSSL                    23.2.0
pyparsing                    3.1.2
pyperclip                    1.8.2
PyQt5                        5.15.9
PyQt5-sip                    12.12.2
PySocks                      1.7.1
pyspark                      3.4.1.5.3.20230713
python-dateutil              2.9.0
python-json-logger           2.0.7
pythonnet                    3.0.3
pytz                         2023.4
pyu2f                        0.1.5
PyYAML                       6.0.1
pyzmq                        25.1.2
referencing                  0.33.0
regex                        2023.12.25
requests                     2.31.0
requests-oauthlib            1.4.0
retrying                     1.3.3
rfc3339-validator            0.1.4
rfc3986-validator            0.1.1
rpds-py                      0.18.0
rsa                          4.9
ruamel.yaml                  0.18.5
ruamel.yaml.clib             0.2.7
ruamel-yaml-conda            0.15.80
SALib                        1.4.8
scikit-learn                 1.3.2
scipy                        1.11.4
seaborn                      0.13.1
SecretStorage                3.3.3
Send2Trash                   1.8.2
setuptools                   69.2.0
shap                         0.44.0
sip                          6.7.12
six                          1.16.0
slicer                       0.0.7
smmap                        5.0.0
sniffio                      1.3.1
soupsieve                    2.5
SQLAlchemy                   2.0.28
sqlanalyticsconnectorpy      1.0.1
sqlparse                     0.4.4
stack-data                   0.6.2
statsmodels                  0.14.1
strictyaml                   1.7.3
sympy                        1.12
synapseml-cognitive          1.0.2
synapseml-core               1.0.2
synapseml-deep-learning      1.0.2
synapseml-internal           1.0.2.1.dev1
synapseml-lightgbm           1.0.2
synapseml-opencv             1.0.2
synapseml-vw                 1.0.2
tabulate                     0.9.0
tenacity                     8.2.3
tensorboard                  2.15.2
tensorboard-data-server      0.7.0
tensorflow                   2.15.0
tensorflow_estimator         2.15.0
termcolor                    2.4.0
terminado                    0.18.1
threadpoolctl                3.3.0
tinycss2                     1.2.1
toml                         0.10.2
tomli                        2.0.1
toolz                        0.12.1
torch                        2.0.1
tornado                      6.4
tqdm                         4.66.2
traitlets                    5.14.2
typed-ast                    1.5.5
types-python-dateutil        2.8.19.20240311
typing_extensions            4.10.0
typing-utils                 0.1.0
unicodedata2                 15.1.0
uri-template                 1.3.0
urllib3                      2.1.0
virtualenv                   20.23.1
wcwidth                      0.2.13
webcolors                    1.13
webencodings                 0.5.1
websocket-client             1.7.0
Werkzeug                     3.0.1
wheel                        0.42.0
widgetsnbextension           4.0.10
wrapt                        1.14.1
xgboost                      2.0.3
yarl                         1.9.4
zipp                         3.17.0
zope.event                   5.0
zope.interface               6.2
zstandard                    0.22.0
Note: you may need to restart the kernel to use updated packages.


ag-ramachandran commented 5 months ago

@aayushsin When you "upgraded" to Spark 3.4, did you save the linked service and publish your workspace? The connector revisions have not changed between versions in Synapse, so the most likely cause is that the linked service was not published.

aayushsin commented 5 months ago

I was using an existing Synapse linked service. The issue does not occur with Spark 3.3, only with Spark 3.4. I am guessing the issue exists because Spark 3.4 is in a preview state in Synapse. @ag-ramachandran Can you please check the integration of Spark 3.4 with an existing linked service?

ag-ramachandran commented 5 months ago

While I try to replicate, @aayushsin, please delete that linked service, create it again, publish the workspace, and run the notebook again.

ag-ramachandran commented 5 months ago

@aayushsin, I can replicate this. While I get a fix for the preview, you can use the following:

%%spark
val creds = mssparkutils.credentials.getConnectionStringOrCreds("sdktestcluster")
df.write
.format("com.microsoft.kusto.spark.synapse.datasource")
.option("accessToken", creds)
.option("kustoCluster", <kustoCluster>)
.option("kustoDatabase", )
.option("kustoTable", )
.option("tableCreateOptions","CreateIfNotExist")
.mode("Append")
.save()

ag-ramachandran commented 5 months ago

Let me know once you have tried the above, @aayushsin. I will try to get this into the 3.4 GA release in Synapse.
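
For PySpark notebooks, the same token-based write can be sketched in Python. This is a minimal sketch, not the connector's documented Python API: the helper name `write_with_access_token` is invented here, the option names are copied from the Scala snippet above, and the values passed in the usage comment are placeholders.

```python
def write_with_access_token(df, access_token, cluster, database, table):
    """Append df to a Kusto table using an access token, not a linked service.

    The option names mirror the Scala workaround in this thread; `df` is
    assumed to be a Spark DataFrame.
    """
    (
        df.write
        .format("com.microsoft.kusto.spark.synapse.datasource")
        .option("accessToken", access_token)
        .option("kustoCluster", cluster)
        .option("kustoDatabase", database)
        .option("kustoTable", table)
        .option("tableCreateOptions", "CreateIfNotExist")
        .mode("Append")
        .save()
    )


# Hypothetical usage in a Synapse notebook (placeholder values):
# creds = mssparkutils.credentials.getConnectionStringOrCreds("<linked service name>")
# write_with_access_token(df, creds, "<kustoCluster>", "<database>", "<table>")
```

Keeping the token lookup outside the helper means the same function works whichever `mssparkutils.credentials` call ends up supplying the token.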

ag-ramachandran commented 5 months ago

Hi @aayushsin, a new release of the connector has been created that fixes this issue. It will be rolled out in the coming weeks with the Synapse release schedule.

Krumelur commented 4 months ago

I'm still getting the error. Has the fix already been rolled out?

ag-ramachandran commented 4 months ago

Hello @Krumelur, unfortunately not. There has been an issue with this rollout. Please give us until next week and we will get the rollout done. In the meantime, the workaround is the following:

%%spark
val creds = mssparkutils.credentials.getConnectionStringOrCreds("sdktestcluster")
df.write
.format("com.microsoft.kusto.spark.synapse.datasource")
.option("accessToken", creds)
.option("kustoCluster", <kustoCluster>)
.option("kustoDatabase", )
.option("kustoTable", )
.option("tableCreateOptions","CreateIfNotExist")
.mode("Append")
.save()

Krumelur commented 4 months ago

Thanks. For others who might encounter the issue, here's a Python version of the workaround for reading Kusto data into a dataframe:

import json

from pyspark.sql import DataFrame


def createDataFrameFromKustoLinkedService(linked_service_name: str, query_string: str, database: str) -> DataFrame:
    # This should use the linked service directly, but that currently fails.
    # There's a bug that's supposed to be fixed already or is getting a fix soon.
    # See: https://github.com/Azure/azure-kusto-spark/issues/372
    #
    # For now, use the workaround below.
    #
    # kustoDf = spark.read \
    #     .format("com.microsoft.kusto.spark.synapse.datasource") \
    #     .option("spark.synapse.linkedService", linked_service_name) \
    #     .option("kustoDatabase", database) \
    #     .option("kustoQuery", query_string) \
    #     .load()

    # `spark` and `mssparkutils` are the globals provided by the Synapse notebook.
    # Read the cluster endpoint and credentials from the linked service properties.
    ls_props = json.loads(mssparkutils.credentials.getPropertiesAll(linked_service_name))

    clusterUrl: str = ls_props["Endpoint"]
    creds = ls_props["AuthKey"]

    # Pass the endpoint and credentials explicitly instead of the linked service name.
    kustoDf = spark.read \
        .format("com.microsoft.kusto.spark.synapse.datasource") \
        .option("accessToken", creds) \
        .option("kustoCluster", clusterUrl) \
        .option("kustoDatabase", database) \
        .option("kustoQuery", query_string) \
        .load()

    return kustoDf

xuewang commented 3 months ago

@ag-ramachandran could you provide an update on the fix deployment? We're also looking to upgrade our Spark pool to 3.4 but are currently blocked by the Kusto connector TimeoutException failure.

mmaitre314 commented 2 months ago

Is there a different workaround for linked services using managed identities? Hitting this error when calling getConnectionStringOrCreds()

Py4JJavaError: An error occurred while calling z:mssparkutils.credentials.getConnectionStringOrCreds. : 
    com.microsoft.azure.synapse.tokenlibrary.TokenServiceClientResponseStatusException: Token Service returned 'Client Error' (400), with message: 
    {"result":"DependencyError","errorId":"BadRequest","errorMessage":"[Code=CredentialTypeNotSupported, Target=kusto_sips_insights, Message=Failed to load LinkedService, Exception: Credential: default is of type UAMI and is not supported]. TraceId : 5de4fe7d-b528-48bf-8828-a147a8c83d18 | client-request-id : 3f08a5cd-4427-4ce0-a25d-333001584c44. Error Component : LSR"}
    at com.microsoft.azure.synapse.tokenlibrary.TokenServiceClient.invokePostApi(TokenServiceClient.scala:139)
    at com.microsoft.azure.synapse.tokenlibrary.TokenServiceClient.callLinkedServiceApi(TokenServiceClient.scala:168)
    at com.microsoft.azure.synapse.tokenlibrary.TokenLibraryInternal.tokenServiceCall$1(TokenLibrary.scala:112)

ag-ramachandran commented 2 months ago

Hi @mmaitre314 ,

Is it UserManagedIdentity or SystemManagedIdentity?

If it is UMI, that is not supported on the Synapse platform (this is not a connector issue).

P.S. The fix for the original issue has already been made; we are waiting for the rollout to happen.

mmaitre314 commented 2 months ago

I used an alternative workaround to get the access token. In case that helps others:

(
    spark.createDataFrame([('a', 1), ('b', 2), ('c', 3)], ['col1', 'col2'])
    .write
    .format("com.microsoft.kusto.spark.synapse.datasource")
    .option("accessToken", mssparkutils.credentials.getToken("AzureDataExplorer"))
    .option("kustoCluster", "https://<cluster>.<region>.kusto.windows.net")
    .option("kustoDatabase", "<database>")
    .option("kustoTable", "<table>")
    .option("tableCreateOptions","CreateIfNotExist")
    .mode("Append")
    .save()
)

ag-ramachandran commented 2 months ago

This is fixed now in 3.4 and will work with linked services as-is

Writes: [image]

Reads in single mode: [image]

Reads in distributed mode: [image]

ag-ramachandran commented 2 months ago

Creating a new spark pool should upgrade to the latest lib and work

jburb commented 2 months ago

@ag-ramachandran, what is the path to get this fix if we already upgraded pools to 3.4 and implemented the earlier suggested workaround?

ag-ramachandran commented 2 months ago

@jburb, the connector has been rolled out in the latest Synapse version.

However, how the update reaches an existing pool is specific to Synapse pools, so we may have to ask the Synapse team.

If your pools have not been restarted, could you try a restart? Alternatively, could you create a new pool and see if that works? (You would still be able to use all artifacts from your workspace, and parts like MI, even if a new pool is created.)
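
For pools where it is unclear whether the fixed connector is present yet, one option is to try the linked service first and fall back to the token-based workaround only when the original error appears. The sketch below assumes that the failure reaches Python as an exception whose text contains `NoClassDefFoundError`, as in the traceback earlier in this thread. The function name `write_to_kusto` and the `get_token` callback are invented for this illustration.

```python
def write_to_kusto(df, linked_service, database, table, get_token, cluster):
    """Write via the linked service; fall back to an access token if the
    connector fails with NoClassDefFoundError. Returns the path that was used.
    """
    def base_writer():
        # df.write yields a fresh writer on each access, so both attempts
        # start from the same clean configuration.
        return (
            df.write
            .format("com.microsoft.kusto.spark.synapse.datasource")
            .option("kustoDatabase", database)
            .option("kustoTable", table)
            .option("tableCreateOptions", "CreateIfNotExist")
            .mode("Append")
        )

    try:
        base_writer().option("spark.synapse.linkedService", linked_service).save()
        return "linked_service"
    except Exception as exc:
        # Only swallow the specific failure discussed in this issue.
        if "NoClassDefFoundError" not in str(exc):
            raise

    (
        base_writer()
        .option("accessToken", get_token())
        .option("kustoCluster", cluster)
        .save()
    )
    return "access_token"


# Hypothetical usage in a Synapse notebook (placeholder values):
# write_to_kusto(
#     df, "<linked service name>", "<database>", "<table>",
#     lambda: mssparkutils.credentials.getToken("AzureDataExplorer"),
#     "https://<cluster>.<region>.kusto.windows.net",
# )
```

Returning the path that was used makes it easy to log when the fallback is no longer needed, at which point the workaround can be removed.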