Azure / spark-cdm-connector


[Issue] Reading error "AnalysisException: Manifest doesn't exist: model.json" #139

Open clairezhuang opened 1 year ago

clairezhuang commented 1 year ago

Did you read the pinned issues and search the error message?

Yes, but I didn't find the answer.

Summary of issue

We have several tables to be ingested using a notebook, and they are read in parallel. On every run some of the tables fail, and which tables fail differs from run to run.

A rerun works, but the failure comes back on the next run. There was no problem before; the failures started a few days ago without any modification on our side. The issue occurs when reading in parallel using the same manifestPath; there are no parallel write operations.
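Since a rerun succeeds, one possible stopgap is to retry the failing read a few times with a short delay. The following is only a minimal sketch under that assumption, not a fix for the underlying cause. `read_with_retry` and `load_entity` are hypothetical names that do not come from the connector, and the attempt count and delays are arbitrary.

```python
import time


def read_with_retry(read_fn, attempts=3, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call read_fn(); on a retryable error, wait and try again.

    Hypothetical helper: the delay doubles after each failed attempt,
    and the last error is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return read_fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


# Usage sketch (assumption: load_entity(entity) wraps the
# spark.read.format("com.microsoft.cdm") ... .load() call from the notebook):
#
#   df = read_with_retry(lambda: load_entity(entity), attempts=3, base_delay=2.0)
```

In a notebook, `retry_on` could be narrowed to the `AnalysisException` type so that unrelated errors are not retried.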

Cluster DBR version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)

The error message shows "AnalysisException: Manifest doesn't exist: model.json":

AnalysisException                         Traceback (most recent call last)

in
----> 1 df = (spark.read.format("com.microsoft.cdm")
      2   .option("storage", storagePath)
      3   .option("manifestPath", sourceFileSystem + "/model.json")
      4   .option("entity", entity)
      5   .option("appId", appId)

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    208             return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    209         else:
--> 210             return self._df(self._jreader.load())
    211
    212     def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    121                 # Hide where the exception came from that shows a non-Pythonic
    122                 # JVM exception message.
--> 123                 raise converted from None
    124             else:
    125                 raise

### Error stack trace

_No response_

### Platform name

Azure Databricks

### Spark version

3.1.2

### CDM jar version

1.19.2

### What is the format of the data you are trying to read/write?

.csv