Azure / spark-cdm-connector

MIT License
76 stars 33 forks

[Issue] Can't find model.json file #164

Open SheoLee opened 4 months ago

SheoLee commented 4 months ago

Did you read the pinned issues and search the error message?

No, but I will read and search it now before creating an issue.

Summary of issue

I am using a Synapse Spark pool (Spark 3.3) to run the Python code sample with credential passthrough:

    df = (spark.read.format("com.microsoft.cdm")
          .option("storage", "mystorage.dfs.core.windows.net")
          .option("manifestPath", "my-folder-path/model.json")
          .option("entity", "Person")
          .load())

It failed with the attached error stack. My question: is "-" allowed in folder names? I suspect it might be the root cause, but I need confirmation.

Error stack trace

          4 df = spark.read.format("com.microsoft.cdm").option("storage", storage)\
    ----> 5     .option("manifestPath", manifest_path).option("entity", entity_name).load()

    File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:184, in DataFrameReader.load(self, path, format, schema, **options)
        182     return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
        183 else:
    --> 184     return self._df(self._jreader.load())

    File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
       1315 command = proto.CALL_COMMAND_NAME +\
       1316     self.command_header +\
       1317     args_command +\
       1318     proto.END_COMMAND_PART
       1320 answer = self.gateway_client.send_command(command)
    -> 1321 return_value = get_return_value(
       1322     answer, self.gateway_client, self.target_id, self.name)
       1324 for temp_arg in temp_args:
       1325     temp_arg._detach()

    File /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py:196, in capture_sql_exception.<locals>.deco(*a, **kw)
        192 converted = convert_exception(e.java_exception)
        193 if not isinstance(converted, UnknownException):
        194     # Hide where the exception came from that shows a non-Pythonic
        195     # JVM exception message.
    --> 196     raise converted from None
        197 else:
        198     raise

    AnalysisException: Manifest doesn't exist: model.json
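
One thing that may be worth checking before blaming the hyphen: as far as I understand the connector documentation, `manifestPath` is expected in the form `<container>/{<folderPath>/}<manifestFileName>`, so the first segment would be read as the storage container rather than a folder. Hyphens are valid in Azure container names, so the "-" alone seems unlikely to be the cause. The sketch below only illustrates how a path would split under that assumed format; `split_manifest_path` and `ManifestLocation` are hypothetical helpers, not part of the connector.

```python
from typing import NamedTuple


class ManifestLocation(NamedTuple):
    container: str   # first path segment (assumed to be the container)
    folder: str      # zero or more intermediate segments, "" if none
    file_name: str   # last segment, e.g. model.json


def split_manifest_path(manifest_path: str) -> ManifestLocation:
    """Split a path of the assumed form
    <container>/{<folderPath>/}<manifestFileName> into its parts.

    Hypothetical helper for illustration; it does not call the connector.
    """
    # Drop leading/trailing slashes and empty segments such as "a//b".
    parts = [p for p in manifest_path.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected at least '<container>/<manifestFileName>'")
    return ManifestLocation(parts[0], "/".join(parts[1:-1]), parts[-1])


# The value used in this issue:
print(split_manifest_path("my-folder-path/model.json"))
```

Under that assumption, `my-folder-path` would be taken as the container and `model.json` would be looked up at the container root. If `my-folder-path` is actually a folder inside a container, prefixing the container name to `manifestPath` might be worth trying.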

Platform name

Azure Synapse

Spark version

3.3

CDM jar version

1.19.7

What is the format of the data you are trying to read/write?

.cdm