Azure / spark-cdm-connector

MIT License

Using spark cdm connector without a manifest file #114

Closed RithwikChhugani closed 1 year ago

RithwikChhugani commented 1 year ago

Hi there. Following up on my comments in my previous issue (https://github.com/Azure/spark-cdm-connector/issues/113#issuecomment-1331694561): I don't have a manifest file because I am working with model.json, so I don't understand why I would get an error related to the manifestPath argument.

kecheung commented 1 year ago

@RithwikChhugani Please check the docs again. Your usage is wrong... You should have a CDM data source and it should come with a manifest file.

  1. There is no such thing as a "cdmModel" option. You can search the code.
  2. You are missing the required arguments as mentioned.

Your code:

val df = spark.read.format("com.microsoft.cdm")
  .option("storage", ".dfs.core.windows.net")
  .option("cdmModel", "https://....../model.json")
  .option("entity", "account")
  .load()

nielsvdc commented 1 year ago

You can just add the path to the model.json file in the manifestPath option. That's how we're using it.

val df = spark.read.format("com.microsoft.cdm")
  .option("storage", storage_url)
  .option("manifestPath", "path_to_model.json")
  .option("entity", entity_name)
  .load()
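
If it helps, the same idea can be wrapped in a small helper so that a missing argument fails early with a readable message. This is only a sketch: `CdmReadOptions` and `build` are made-up names, not part of the connector, and it only covers the three options mentioned in this thread (any authentication options your setup needs are left out).

```scala
// Sketch only: CdmReadOptions is a hypothetical helper, not a connector API.
object CdmReadOptions {
  // Collects the options used in this thread. The model.json path is passed
  // under the "manifestPath" key, since there is no "cdmModel" option.
  def build(storage: String, modelJsonPath: String, entity: String): Map[String, String] = {
    val opts = Map(
      "storage"      -> storage,
      "manifestPath" -> modelJsonPath,
      "entity"       -> entity
    )
    // Reject null or blank values before the options ever reach Spark.
    opts.foreach { case (key, value) =>
      require(value != null && value.trim.nonEmpty, s"Missing required option: $key")
    }
    opts
  }
}

// Hypothetical usage, assuming an existing SparkSession named `spark`
// and the same placeholder values as in the snippet above:
// val opts = CdmReadOptions.build(storage_url, "path_to_model.json", entity_name)
// val df   = spark.read.format("com.microsoft.cdm").options(opts).load()
```

The map is handed to `DataFrameReader.options`, which is equivalent to chaining the individual `.option(...)` calls.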

RithwikChhugani commented 1 year ago

Thanks @kecheung and @nielsvdc. The above suggestions resolved the error. Appreciate your support.