Closed: aditi-kumari-singh closed this issue 1 year ago
It doesn't sound like this question is about spark-xml itself, right? You can configure your SAX parser by setting attributes on the parser factory object.
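For example, when validating against an XSD with the JDK's built-in validator, the limit can be raised on the `SchemaFactory` before the schema is compiled. This is a minimal sketch assuming the standard JAXP APIs outside of spark-xml; the property URI is the JDK's documented processing-limit name, and the file path is illustrative:

```scala
import javax.xml.XMLConstants
import javax.xml.validation.SchemaFactory
import java.io.File

val factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
// JAXP processing-limit property; a value of "0" removes the maxOccurs cap.
factory.setProperty("http://www.oracle.com/xml/jaxp/properties/maxOccurLimit", "0")
// Compiling a schema that uses large maxOccurs values should now succeed.
val schema = factory.newSchema(new File("/path/to/your.xsd"))
```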
Actually, the error comes when it is used with the spark-xml library.
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.0.1.4 executor driver): java.util.concurrent.ExecutionException: org.xml.sax.SAXParseException; systemId: file:/local_disk0/spark-0e00459b-fab1-47ed-bf54-658a2466adc3/userFiles-f8eafb2e-843f-481a-9cc5-d74a7934083c/auth.079.001.02_xxxxx_1.1.0.xsd; lineNumber: 5846; columnNumber: 99; Current configuration of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000.
```
Can you say more about how this arises? I ask because you mention XSDs. Also, is there more to the stack trace in the logs? What is maxOccurs in your XSD?
Unable to parse nested XML using PySpark and XSDs; it returns the error below: org.xml.sax.SAXParseException: Current configuration of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000.
In the Java APIs the fix is to set jdk.xml.maxOccurLimit=0. Where can we set this in Databricks?
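One place this can be set on Databricks (a suggestion based on standard Spark configuration, not something confirmed in this thread) is the cluster's Spark config, passing the system property as a JVM option to both the driver and the executors:

```
spark.driver.extraJavaOptions -Djdk.xml.maxOccurLimit=0
spark.executor.extraJavaOptions -Djdk.xml.maxOccurLimit=0
```

Setting it as a JVM startup option rather than calling System.setProperty from a notebook is the safer route, since the property needs to be in place on whichever JVM (driver or executor) ends up compiling the schema.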