kanika167 opened this issue 4 years ago
By default, Cobrix retains the root GROUP by putting all columns under the corresponding struct field. You can use a different schema retention policy to get your columns at the root level: .option("schema_retention_policy", "collapse_root")
Also, judging by the sample output, the data doesn't seem to have been decoded properly. Use .option("debug", "true") to investigate what is being decoded.
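The debug option is the reliable way to see what Cobrix actually reads. To illustrate why decoding can go wrong, here is a plain-Scala sketch (not Cobrix code). It hex-dumps a field and decodes the same bytes as ASCII and as EBCDIC. Cobrix assumes EBCDIC unless told otherwise. The sketch uses the JDK's `IBM037` code page as a stand-in for an EBCDIC table, and `DecodeCheck` is a hypothetical name. ASCII letters and digits map to unrelated characters under EBCDIC, which can then show up as blanks.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DecodeCheck {
  // Assumption: the JDK's IBM037 (US/Canada EBCDIC) stands in for the EBCDIC
  // table a COBOL reader applies; Cobrix's own default table may differ.
  val Ebcdic: Charset = Charset.forName("IBM037")

  // Two uppercase hex digits per byte, similar in spirit to a hex debug column.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString

  def asAscii(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.US_ASCII)
  def asEbcdic(bytes: Array[Byte]): String = new String(bytes, Ebcdic)

  def main(args: Array[String]): Unit = {
    // Bytes as an ASCII text editor would store them in a .txt file.
    val field = "ABC123".getBytes(StandardCharsets.US_ASCII)
    println(s"hex:    ${hex(field)}")      // 414243313233
    println(s"ascii:  ${asAscii(field)}")  // ABC123
    println(s"ebcdic: ${asEbcdic(field)}") // not ABC123
  }
}
```

If the hex values look like ASCII (letters in the 41 to 5A range, digits 30 to 39), the file is probably not EBCDIC.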
I am trying to run the following command in spark-shell:
val df = spark.read.format("za.co.absa.cobrix.spark.cobol.source").option("copybook", "test.cob").load("/user/data")
I have passed the required jars (spark-cobol, cobol-parser, scodec-bits, scodec-core) and antlr4-runtime-4.8-1. Without antlr4-runtime I was getting a NoClassDefFoundError for org/antlr/v4/runtime/CharStreams.
But now I am getting the following exception:
java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
For security reasons I can't share the actual copybook and data files. Spark version: 2.2.0 Cloudera4. Also, where can I find the documentation for this API?
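The InvalidClassException above usually means that the ANTLR runtime loaded at run time is older than the ANTLR version that generated the copybook parser. Spark ships its own antlr4-runtime, and that copy can shadow the jar passed on the command line. An uber jar that bundles or shades ANTLR avoids the conflict. The following is a minimal, hedged sketch for checking which jar a class is really loaded from. `JarLocator` is a hypothetical helper, not part of Cobrix or Spark. In spark-shell, paste it with `:paste` and call `JarLocator.locationOf("org.antlr.v4.runtime.atn.ATN")`.

```scala
object JarLocator {
  // Where was this class loaded from? Returns the jar/directory URL, or None if the
  // class is missing or the JVM reports no code source (e.g. JDK bootstrap classes).
  def locationOf(className: String): Option[String] =
    try {
      val loader = Thread.currentThread().getContextClassLoader
      val cls = Class.forName(className, false, loader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException => None
    }

  def main(args: Array[String]): Unit = {
    // Default to the ANTLR class named in the exception; pass another name to check it instead.
    val name = args.headOption.getOrElse("org.antlr.v4.runtime.atn.ATN")
    println(s"$name -> ${locationOf(name).getOrElse("not found or no code source")}")
  }
}
```

If the result points to an antlr4-runtime jar from the Spark distribution rather than the antlr4-runtime-4.8-1 jar that was passed in, the version conflict is the likely cause.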
Never mind this issue; using an uber jar fixed it.
But I am using option("schema_retention_option","collapse_root") and it didn't make any difference to the schema structure.
The option is named schema_retention_policy, not schema_retention_option. The option("schema_retention_policy", "collapse_root") should make a difference. Try comparing the output of df.printSchema with and without it.
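The following is a conceptual plain-Scala model of what the two retention policies do to the schema. It is not Cobrix code, and `SchemaRetentionModel` is a hypothetical name. With the original policy, the root group `XXXXXX` stays as a single struct column, which matches the single XXXXXX column in the output shown in this thread. With `collapse_root`, its children become the top-level columns. The rendering only roughly imitates `df.printSchema` and omits the nullability flags.

```scala
object SchemaRetentionModel {
  // A toy schema tree: a field is either a leaf with a type name or a group of children.
  sealed trait Field { def name: String }
  final case class Leaf(name: String, dataType: String) extends Field
  final case class Group(name: String, children: List[Field]) extends Field

  // Original-style retention: the root group stays as one top-level struct column.
  def keepOriginal(root: Group): List[Field] = List(root)

  // collapse_root-style retention: the root group's children become top-level columns.
  def collapseRoot(root: Group): List[Field] = root.children

  // Indent nested fields roughly the way printSchema does.
  def render(fields: List[Field], depth: Int = 0): List[String] =
    fields.flatMap {
      case Leaf(n, t)     => List(s"${" |   " * depth} |-- $n: $t")
      case Group(n, kids) => s"${" |   " * depth} |-- $n: struct" :: render(kids, depth + 1)
    }

  def main(args: Array[String]): Unit = {
    val copybook = Group("XXXXXX", List(
      Leaf("AAAAA", "string"), Leaf("BAAAA", "string"),
      Leaf("CAAAA", "string"), Leaf("DAAAA", "string")))
    println(("root" :: render(keepOriginal(copybook))).mkString("\n"))
    println(("root" :: render(collapseRoot(copybook))).mkString("\n"))
  }
}
```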
Background [Optional]
I have a copybook like this:
       01  XXXXXX.
           04  AAAAA  PIC X(10).
           04  BAAAA  PIC X(4).
           04  CAAAA  PIC X(4).
           04  DAAAA  PIC XX.
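Every field in this copybook is alphanumeric (PIC X), so each record should be exactly 10 + 4 + 4 + 2 = 20 bytes. The following is a minimal plain-Scala sketch of that arithmetic, showing where each field starts. It handles only `PIC X(n)` and repeated-`X` clauses. `CopybookLayout` and `picXWidth` are hypothetical names, not Cobrix APIs, and numeric or COMP fields would need different width rules.

```scala
object CopybookLayout {
  private val RepeatedX = """X\((\d+)\)""".r

  // Width in bytes of an alphanumeric PIC clause such as "X(10)" or "XX".
  def picXWidth(pic: String): Int = pic.trim match {
    case RepeatedX(n)                          => n.toInt
    case s if s.nonEmpty && s.forall(_ == 'X') => s.length
    case other => throw new IllegalArgumentException(s"unsupported PIC: $other")
  }

  // (name, offset, width) for each field, laid out back to back.
  def layout(fields: Seq[(String, String)]): Seq[(String, Int, Int)] = {
    val widths = fields.map { case (_, pic) => picXWidth(pic) }
    val offsets = widths.scanLeft(0)(_ + _) // running start positions
    fields.zip(offsets).zip(widths).map { case (((name, _), off), w) => (name, off, w) }
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq("AAAAA" -> "X(10)", "BAAAA" -> "X(4)", "CAAAA" -> "X(4)", "DAAAA" -> "XX")
    val cols = layout(fields)
    cols.foreach { case (n, off, w) => println(f"$n%-6s offset=$off%2d width=$w%2d") }
    println(s"record length = ${cols.map(_._3).sum} bytes")
  }
}
```

Comparing this length with the actual size of the data file (which should be a multiple of 20 if records are stored back to back) is a quick consistency check.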
I have a data file (.txt) containing fixed-length records. When I read it into a DataFrame, I get only one column, named XXXXXX, whose rows contain the actual fields. Even there, the values are either null or blank:
XXXXXX
[,,,null,]
[,,,null,]
[,,,null,]
[,,,null,]
[,,,null,]
Question
What am I doing wrong above?
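Because the data is a .txt file, one possible cause is worth ruling out. If each record is followed by a line ending, a reader that expects back-to-back fixed-length records will drift by one byte (LF) or two bytes (CRLF) per record, and the fields will stop lining up with the copybook. The following is a plain-Scala sketch, assuming the 20-byte record length implied by the copybook's PIC clauses. It compares strict fixed-length chunking with line-based splitting. `RecordDrift` is a hypothetical name. Cobrix has reader options for ASCII input, such as setting `encoding` to `ascii`. Which text-file options are available depends on the Cobrix version, so check its README.

```scala
import java.nio.charset.StandardCharsets

object RecordDrift {
  // What a strict fixed-length reader sees: consecutive chunks of recordLength bytes
  // (a trailing partial chunk is dropped).
  def fixedChunks(data: Array[Byte], recordLength: Int): List[String] =
    data.grouped(recordLength)
      .filter(_.length == recordLength)
      .map(chunk => new String(chunk, StandardCharsets.US_ASCII))
      .toList

  // What a line-oriented reader sees: records split on LF or CRLF.
  def textLines(data: Array[Byte]): List[String] =
    new String(data, StandardCharsets.US_ASCII).split("\\r?\\n").toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Two hypothetical 20-byte records, each saved with a trailing newline.
    val rec1 = "A" * 10 + "BBBB" + "CCCC" + "DD"
    val rec2 = "a" * 10 + "bbbb" + "cccc" + "dd"
    val bytes = (rec1 + "\n" + rec2 + "\n").getBytes(StandardCharsets.US_ASCII)
    // The second fixed chunk starts with the newline, so every field is shifted by one byte.
    println(fixedChunks(bytes, 20).map(_.replace("\n", "\\n")))
    println(textLines(bytes))
  }
}
```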