YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark
https://yotpoltd.github.io/metorikku/
MIT License

How to use Apache Spark Connector? #473

Closed: Rap70r closed this issue 2 years ago

Rap70r commented 2 years ago

Hello,

Is it possible to use the Apache Spark Connector for SQL Server with Metorikku?

https://github.com/microsoft/sql-spark-connector

I'm trying to create a custom UDF that I can pass a dataframe to, so it can load that dataframe into a SQL Server table. This is what I have so far: a custom code object with a run function that will live in the custom jar:

object SomeObject {

  def run(ss: org.apache.spark.sql.SparkSession, metricName: String, dataFrameName: String, params: Option[Map[String, String]]): Unit = {

    val server_name = "jdbc:sqlserver://{SERVER_ADDR}"
    val database_name = "database_name"
    val url = server_name + ";" + "databaseName=" + database_name + ";"

    val table_name = "table_name"
    val username = "username"
    val password = "password"

    // df_name_here is the placeholder I don't know how to fill in:
    // how do I get hold of the dataframe for dataFrameName at this point?
    df_name_here.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("overwrite")
      .option("url", url)
      .option("dbtable", table_name)
      .option("user", username)
      .option("password", password)
      .save()
  }
}

Note: the reason I'm using custom code is so that I can use the "com.microsoft.sqlserver.jdbc.spark" format.

Can you please help me figure out how to get the dataframe inside the function, so I can replace df_name_here with it?
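
My best guess is something along these lines, assuming (and this is purely an assumption on my part) that Metorikku registers each step's output as a temp view under its dataframe name, so it could be looked up from the session with ss.table; all the connection values are still placeholders:

object SomeObject {

  def run(ss: org.apache.spark.sql.SparkSession, metricName: String, dataFrameName: String, params: Option[Map[String, String]]): Unit = {
    // Assumption: the step's output is registered as a temp view under dataFrameName,
    // so it can be looked up from the SparkSession by name.
    val df = ss.table(dataFrameName)

    // Placeholder connection details; these could also be read from params.
    val url = "jdbc:sqlserver://{SERVER_ADDR};databaseName=database_name;"

    df.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("overwrite")
      .option("url", url)
      .option("dbtable", "table_name")
      .option("user", "username")
      .option("password", "password")
      .save()
  }
}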

Alternatively, could I use the standard JDBC output and specify format("com.microsoft.sqlserver.jdbc.spark")?

I'm not sure if that's possible, since it takes the value from the driver: https://github.com/YotpoLtd/metorikku/blob/master/src/main/scala/com/yotpo/metorikku/output/writers/jdbc/JDBCOutputWriter.scala#L42
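
For comparison, my understanding of the Spark write API (a sketch only, not Metorikku-specific): with the built-in JDBC data source the SQL Server driver class is just an option on top of format("jdbc"), whereas the Microsoft connector is a separate data source format, so overriding the driver alone presumably wouldn't switch formats:

// Sketch only: contrasting the two write paths (df and url are placeholders).
def writeWithPlainJdbc(df: org.apache.spark.sql.DataFrame, url: String): Unit =
  df.write
    .format("jdbc") // built-in Spark JDBC data source
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") // driver class is just an option here
    .option("url", url)
    .option("dbtable", "table_name")
    .save()

def writeWithConnector(df: org.apache.spark.sql.DataFrame, url: String): Unit =
  df.write
    .format("com.microsoft.sqlserver.jdbc.spark") // the connector is a different data source format
    .option("url", url)
    .option("dbtable", "table_name")
    .save()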

Thank you