Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto

Importing the spark connector enables verbose logging #365

Closed · lgo-solytic closed this issue 5 months ago

lgo-solytic commented 7 months ago

We are running the Spark connector on Azure Kubernetes Service, processing a Delta stream. Logging to the console is extremely verbose, including what look like Delta-internal logs.
We have tried changing the log4j level configuration, but without success. Looking at the source code, we found the log4j.properties file and tried to override log4j.rootLogger=INFO, KustoConnector and set it to WARN, but without success. This behaviour significantly increases our costs for the Log Analytics workspace and also causes performance problems for our integration tests. Is there any simple way to change this behaviour?
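
For reference, the override we attempted looked like this (a sketch; the first line is the rootLogger override described above, while the second, package-scoped logger line is a hypothetical alternative based on the connector's namespace, not something we verified):

    # attempted override: lower the root logger from INFO to WARN
    log4j.rootLogger=WARN, KustoConnector
    # hypothetical package-scoped alternative
    log4j.logger.com.microsoft.kusto.spark=WARN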

makism commented 6 months ago

We have recently stumbled upon the same issue. We are running a PoC with ADX, and we noticed the following between our PRD and DEV log instances.

[screenshot: after_adx]

[screenshot: logs]

It seems that every single line (literally) from the logs is sent to App Insights.

Any comments and thoughts are more than welcome; please let me know if you need more details about our instance.

ag-ramachandran commented 6 months ago

Will have a look and see what needs to be done for a fix on this

ag-ramachandran commented 6 months ago

@makism, do you use Databricks / Spark HDInsight, or a different runtime?

makism commented 6 months ago

> @makism, do you use Databricks / Spark HDInsight, or a different runtime?

hm, no, we are running our Spark <> ADX jobs on AKS

ag-ramachandran commented 6 months ago

@makism @lgo-solytic

Will get to replicating and running this only later this week; I need time to set up a harness for a similar setup. The situation is unusual because most times we have to ask colleagues using the connector to turn on verbose logging.

In PySpark, is this something you can set? The example here uses INFO; could you change it to WARN and try?

    # access the Kusto connector's logging utility through the Py4J JVM gateway
    sc = self.spark.sparkContext
    sc._jvm.com.microsoft.kusto.spark.utils.KustoDataSourceUtils.setLoggingLevel("INFO")
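
For a Scala job, the equivalent would be a direct invocation of the same setter (a sketch):

    // Scala: call the connector's logging utility directly
    import com.microsoft.kusto.spark.utils.KustoDataSourceUtils
    KustoDataSourceUtils.setLoggingLevel("WARN")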

lgo-solytic commented 6 months ago

@ag-ramachandran The problem was importing the connector library as a whole jar, because of https://github.com/Azure/azure-kusto-spark/issues/338 (presumably the jar's bundled log4j.properties took precedence over our own logging configuration). After updating to the newest version and importing it the recommended way in build.sbt:

    libraryDependencies ++= Seq(
      "com.microsoft.azure.kusto" %% "kusto-spark_3.0" % "5.0.6"
    )

the problem disappeared. We had tried adding com.microsoft.kusto.spark.utils.KustoDataSourceUtils.setLoggingLevel("WARN") in Scala before, with the jar file imported (latest version), but that did not help.
Our problem is solved. Thanks for your help.