Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto
Apache License 2.0

Count query fails if a large query ends in a comment #267

Open oglehb opened 1 year ago

oglehb commented 1 year ago

Describe the bug
When the final line of a Kusto query ends with a comment, the count query fails if the tabular expression result contains more than truncationmaxrecords rows (500000 by default).

To Reproduce

# Connection settings (values redacted)
conn = {
    "kustoCluster": xxx,
    "kustoDatabase": xxx,
    "kustoAadAppId": xxx,
    "kustoAadAppSecret": xxx,
    "kustoAadAuthorityID": xxx,
}
# Client request properties; truncation limit left at the default 500000
crp = '{"Options": {"truncationmaxrecords": 500000}, "Parameters": {}}'
# Query whose trailing comment triggers the failure
kql = 'range _ from 1 to 500001 step 1 // this comment causes issue'

df = (spark.read
        .format("com.microsoft.kusto.spark.synapse.datasource")
        .options(**conn)
        .option("kustoQuery", kql)
        .option("clientRequestPropertiesJson", crp)
        .load())

df.limit(0)

Expected behavior
All else the same, I expected the query range _ from 1 to 500001 step 1 // this comment causes issue to behave identically to the query range _ from 1 to 500001 step 1 (when truncationmaxrecords is 500000 or higher, as in this case).
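The likely mechanism (a hedged reading, since the connector source is linked below rather than quoted here): if the connector appends its row-limit operator on the same line as the user's query text, a trailing // comment swallows the appended operator, so Kusto never sees it. A minimal Python sketch of this string-level behavior, using two hypothetical helpers that mimic the same-line append and the newline-separated fix:

```python
def limit_query_same_line(query: str, limit: int) -> str:
    # Hypothetical sketch of the suspected current behavior:
    # the take operator is appended on the same line as the query.
    return query.strip() + f" | take {limit}"


def limit_query_newline(query: str, limit: int) -> str:
    # Sketch of the proposed fix: a newline terminates any trailing
    # // comment, so the appended operator is always parsed by Kusto.
    return query.strip() + f"\n| take {limit}"


kql = "range _ from 1 to 500001 step 1 // this comment causes issue"

broken = limit_query_same_line(kql, 500000)
fixed = limit_query_newline(kql, 500000)

# In `broken`, "| take 500000" lands inside the // comment, so the server
# ignores it and the result can exceed truncationmaxrecords.
# In `fixed`, the operator sits on its own line, outside the comment.
print(broken)
print(fixed)
```

This is only an illustration of the concatenation problem, not the connector's actual code path.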



Additional context
The cause may be this code: https://github.com/Azure/azure-kusto-spark/blob/83cf263a7e56365a52f895cf774fb24b72e9e0fc/connector/src/main/scala/com/microsoft/kusto/spark/utils/KustoQueryUtils.scala#L11-L13 Perhaps changing limitQuery as follows would solve the issue:

def limitQuery(query: String, limit: Int): String = {
    val trimmedQuery = query.trim
    // Append the limit on a new line so a trailing // comment
    // in the user's query cannot swallow the take operator.
    trimmedQuery + s"\n| take $limit"
}

(I do not know Scala)

ag-ramachandran commented 1 year ago

Hello @oglehb, will have a look at this right away and potentially fix this issue.