akhilgarg140290 opened this issue 3 years ago (status: Open)
Hi, currently Cobrix supports only copybooks that have a root group field. We plan to improve this at some point, but for now you can add a root record field to the copybook as a workaround. Use the 'collapse_root' retention policy so that the root record is removed from the resulting Spark DataFrame.
The updated copybook looks like this (only one line is added at the top):
01 RECORD.
******************************************************************00001019
*T * C A R D P R O C E S S I N G S Y S T E M *00002019
* S * *00003019
* Y * V E R I F I - I S S U R E R - R E Q U E S T - F I L E *00004019
* S* C O P Y B O O K *00005019
******************************************************************00006019
  05 CDRN-ISSUER-REQUEST-FILE PIC X(1773). 00010002
  05 REDEFINES CDRN-ISSUER-REQUEST-FILE. 000100
    10 HEADER-REC. 000100
      15 :T:-HDR-REQ-CONSTANT PIC X(24).
      15 :T:-HDR-REQ-FILE-VERSION PIC X(06).
      15 :T:-HDR-REQ-REC-INV-COUNT PIC X(10).
      15 FILLER PIC X(1733).
  05 REDEFINES CDRN-ISSUER-REQUEST-FILE. 000100
    10 REQUEST-FILE. 00020002
      15 :T:-CDRN-ISSUER-ID PIC 9(06). 00020002
      15 :T:-CDRN-RECORD-TYPE PIC X(06). 00050002
      15 :T:-CDRN-TRAN-MERC-NAME PIC X(101). 00060002
      15 :T:-CDRN-TRAN-MERC-CITY PIC X(13). 00070002
      15 :T:-CDRN-ACQ-INST-ID-CODE PIC X(11). 00080002
      15 :T:-CDRN-ACC-IDENT-CODE PIC X(15). 00090002
      15 :T:-CDRN-ACC-TERM-ID PIC X(08). 00100002
      15 :T:-CDRN-CARD-MERC-NAME PIC X(25). 00110002
      15 :T:-CDRN-CARD-MERC-CITY PIC X(13). 00120002
      15 :T:-CDRN-CARD-CTRY-CODE PIC X(03). 00130002
      15 :T:-CDRN-MERC-TYPE-MCC PIC 9(04). 00140002
      15 :T:-CDRN-POS-ENTRY-MODE PIC 9(02). 00150002
      15 :T:-CDRN-POS-COND-MODE PIC 9(02). 00160002
      15 :T:-CDRN-FIRST-NAME PIC X(225). 00170002
      15 :T:-CDRN-LAST-NAME PIC X(225). 00180002
      15 :T:-CDRN-BILL-STR-ADDR1 PIC X(225). 00200002
      15 :T:-CDRN-BILL-STR-ADDR2 PIC X(225). 00200002
      15 :T:-CDRN-BILL-CITY PIC X(225). 00200002
      15 :T:-CDRN-BILL-STATE PIC X(02). 00200002
      15 :T:-CDRN-BILL-ZIP PIC X(10). 00200002
      15 :T:-CDRN-PAYMENT-TYPE PIC X(02). 00200002
      15 :T:-CDRN-ACCT-NUMBER PIC X(16). 00210002
      15 :T:-CDRN-EXPR-DATE PIC 9(04). 00220002
      15 :T:-CDRN-AUTH-CODE PIC X(06). 00230009
      15 :T:-CDRN-CENTRAL-PROC-DATE PIC X(11).
      15 :T:-CDRN-TRANS-AMOUNT PIC S9(11)V9(2).
      15 :T:-CDRN-CURRENCY-CODE PIC X(3).
      15 :T:-CDRN-ARN PIC 9(23).
      15 :T:-CDRN-MERCH-ORD-ID PIC X(50).
      15 :T:-CDRN-TRANS-TYPE PIC 9(2).
      15 :T:-CDRN-DISPUTE-DATE PIC X(11).
      15 :T:-CDRN-DISPUTE-AMOUNT PIC S9(11)V9(2).
      15 :T:-CDRN-REASON-CODE PIC X(04).
      15 :T:-CDRN-CASE-NUMBER PIC X(24).
      15 :T:-CDRN-REQUEST-NOTES PIC X(225).
      15 :T:-CDRN-REC-GEN-TIMESTP PIC X(20).
  05 REDEFINES CDRN-ISSUER-REQUEST-FILE. 000100
    10 TRAILER-REC. 000100
      15 :T:-TLR-CONSTANT PIC X(24).
      15 :T:-TLR-FILE-VERSION PIC X(06).
      15 :T:-TLR-REC-INV-COUNT PIC X(10).
      15 FILLER PIC X(1733).
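If several copybooks need the same fix, the one-line change can also be applied programmatically before the text is handed to Spark. A minimal Python sketch (the function name and default root name are my own, not part of Cobrix; seven leading spaces keep columns 1-6 free, following standard copybook reference format):

```python
def add_root_group(copybook_text, root_name="RECORD"):
    """Prepend a synthetic level-01 root group so the existing
    level-05 items become its children (sketch; names hypothetical).
    The leading spaces keep columns 1-6 free, as in standard
    copybook layout where code starts in column 8."""
    return "       01 {}.\n{}".format(root_name, copybook_text)
```

The modified string can then be passed directly via the `copybook_contents` option together with the 'collapse_root' policy, so the synthetic root never appears in the output schema.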
The 'collapse_root' retention policy can be specified like this:
spark
.read
.format("cobol")
.option("copybook_contents", copybook)
.option("schema_retention_policy", "collapse_root")
.load(dataPath)
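The stack trace below goes through Py4J, which suggests the reader is being called from PySpark; the same options there would look like this sketch (assuming an existing `SparkSession` named `spark`, with `copybook` and `data_path` as placeholder variables):

```python
def read_with_collapsed_root(spark, copybook, data_path):
    # Same Cobrix options as in the Scala snippet above; "collapse_root"
    # strips the synthetic 01-level record from the resulting schema.
    return (
        spark.read.format("cobol")
        .option("copybook_contents", copybook)
        .option("schema_retention_policy", "collapse_root")
        .load(data_path)
    )
```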
Question
I am getting the error below while parsing a copybook with a binary mainframe data file. I am attaching the copybook.
Py4JJavaError: An error occurred while calling o200.load.
: java.lang.ClassCastException: za.co.absa.cobrix.cobol.parser.ast.Primitive cannot be cast to za.co.absa.cobrix.cobol.parser.ast.Group
	at za.co.absa.cobrix.cobol.parser.Copybook$$anonfun$3.apply(Copybook.scala:229)
	at za.co.absa.cobrix.cobol.parser.Copybook$$anonfun$3.apply(Copybook.scala:225)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at za.co.absa.cobrix.cobol.parser.Copybook.generateRecordLayoutPositions(Copybook.scala:225)
	at za.co.absa.cobrix.spark.cobol.schema.CobolSchema.sparkSchema$lzycompute(CobolSchema.scala:55)
	at za.co.absa.cobrix.spark.cobol.schema.CobolSchema.sparkSchema(CobolSchema.scala:54)
	at za.co.absa.cobrix.spark.cobol.schema.CobolSchema.getSparkSchema(CobolSchema.scala:94)
	at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.getSparkSchema(FixedLenNestedReader.scala:53)
	at za.co.absa.cobrix.spark.cobol.source.CobolRelation.schema(CobolRelation.scala:79)
	at org.apache.spark.sql.execution.datasources.LogicalRelation$.apply(LogicalRelation.scala:77)
	at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:424)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Attachment: IOGCRFCB.TXT