AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0
138 stars, 77 forks

za.co.absa.cobrix.cobol.parser.ast.Primitive cannot be cast to za.co.absa.cobrix.cobol.parser.ast.Group #398

Open akhilgarg140290 opened 3 years ago

akhilgarg140290 commented 3 years ago

Question

I am getting the error below while parsing a binary mainframe data file with a copybook. The copybook is attached.

    Py4JJavaError: An error occurred while calling o200.load.
    : java.lang.ClassCastException: za.co.absa.cobrix.cobol.parser.ast.Primitive cannot be cast to za.co.absa.cobrix.cobol.parser.ast.Group
        at za.co.absa.cobrix.cobol.parser.Copybook$$anonfun$3.apply(Copybook.scala:229)
        at za.co.absa.cobrix.cobol.parser.Copybook$$anonfun$3.apply(Copybook.scala:225)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at za.co.absa.cobrix.cobol.parser.Copybook.generateRecordLayoutPositions(Copybook.scala:225)
        at za.co.absa.cobrix.spark.cobol.schema.CobolSchema.sparkSchema$lzycompute(CobolSchema.scala:55)
        at za.co.absa.cobrix.spark.cobol.schema.CobolSchema.sparkSchema(CobolSchema.scala:54)
        at za.co.absa.cobrix.spark.cobol.schema.CobolSchema.getSparkSchema(CobolSchema.scala:94)
        at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.getSparkSchema(FixedLenNestedReader.scala:53)
        at za.co.absa.cobrix.spark.cobol.source.CobolRelation.schema(CobolRelation.scala:79)
        at org.apache.spark.sql.execution.datasources.LogicalRelation$.apply(LogicalRelation.scala:77)
        at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:424)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

IOGCRFCB.TXT

yruslan commented 3 years ago

Hi, Cobrix currently supports only copybooks that have a root group field. We will improve this at some point, but for now you can add a root record field to the copybook as a workaround. Use the 'collapse_root' schema retention policy so that the root record is removed from the Spark dataframe.

The updated copybook looks like this (only one line is added at the top):

           01  RECORD.
     ******************************************************************00001019
      *T   *       C A R D   P R O C E S S I N G   S Y S T E M         *00002019
      * S  *                                                           *00003019
      *  Y *   V E R I F I - I S S U R E R  - R E Q U E S T - F I L E  *00004019
      *   S*                      C O P Y B O O K                      *00005019
      ******************************************************************00006019
           05  CDRN-ISSUER-REQUEST-FILE        PIC X(1773).             00010002
           05  REDEFINES CDRN-ISSUER-REQUEST-FILE.                      000100
             10  HEADER-REC.                                              000100
               15  :T:-HDR-REQ-CONSTANT          PIC X(24).
               15  :T:-HDR-REQ-FILE-VERSION      PIC X(06).
               15  :T:-HDR-REQ-REC-INV-COUNT     PIC X(10).
               15  FILLER                        PIC X(1733).
           05  REDEFINES CDRN-ISSUER-REQUEST-FILE.                      000100
             10  REQUEST-FILE.                                          00020002
               15  :T:-CDRN-ISSUER-ID           PIC 9(06).              00020002
               15  :T:-CDRN-RECORD-TYPE         PIC X(06).              00050002
               15  :T:-CDRN-TRAN-MERC-NAME      PIC X(101).             00060002
               15  :T:-CDRN-TRAN-MERC-CITY      PIC X(13).              00070002
               15  :T:-CDRN-ACQ-INST-ID-CODE    PIC X(11).              00080002
               15  :T:-CDRN-ACC-IDENT-CODE      PIC X(15).              00090002
               15  :T:-CDRN-ACC-TERM-ID         PIC X(08).              00100002
               15  :T:-CDRN-CARD-MERC-NAME      PIC X(25).              00110002
               15  :T:-CDRN-CARD-MERC-CITY      PIC X(13).              00120002
               15  :T:-CDRN-CARD-CTRY-CODE      PIC X(03).              00130002
               15  :T:-CDRN-MERC-TYPE-MCC       PIC 9(04).              00140002
               15  :T:-CDRN-POS-ENTRY-MODE      PIC 9(02).              00150002
               15  :T:-CDRN-POS-COND-MODE       PIC 9(02).              00160002
               15  :T:-CDRN-FIRST-NAME          PIC X(225).             00170002
               15  :T:-CDRN-LAST-NAME           PIC X(225).             00180002
               15  :T:-CDRN-BILL-STR-ADDR1      PIC X(225).             00200002
               15  :T:-CDRN-BILL-STR-ADDR2      PIC X(225).             00200002
               15  :T:-CDRN-BILL-CITY           PIC X(225).             00200002
               15  :T:-CDRN-BILL-STATE          PIC X(02).              00200002
               15  :T:-CDRN-BILL-ZIP            PIC X(10).              00200002
               15  :T:-CDRN-PAYMENT-TYPE        PIC X(02).              00200002
               15  :T:-CDRN-ACCT-NUMBER         PIC X(16).              00210002
               15  :T:-CDRN-EXPR-DATE           PIC 9(04).              00220002
               15  :T:-CDRN-AUTH-CODE           PIC X(06).              00230009
               15  :T:-CDRN-CENTRAL-PROC-DATE   PIC X(11).
               15  :T:-CDRN-TRANS-AMOUNT        PIC S9(11)V9(2).
               15  :T:-CDRN-CURRENCY-CODE       PIC X(3).
               15  :T:-CDRN-ARN                 PIC 9(23).
               15  :T:-CDRN-MERCH-ORD-ID        PIC X(50).
               15  :T:-CDRN-TRANS-TYPE          PIC 9(2).
               15  :T:-CDRN-DISPUTE-DATE        PIC X(11).
               15  :T:-CDRN-DISPUTE-AMOUNT      PIC S9(11)V9(2).
               15  :T:-CDRN-REASON-CODE         PIC X(04).
               15  :T:-CDRN-CASE-NUMBER         PIC X(24).
               15  :T:-CDRN-REQUEST-NOTES       PIC X(225).
               15  :T:-CDRN-REC-GEN-TIMESTP     PIC X(20).
           05  REDEFINES CDRN-ISSUER-REQUEST-FILE.                      000100
             10  TRAILER-REC.                                             000100
               15  :T:-TLR-CONSTANT              PIC X(24).
               15  :T:-TLR-FILE-VERSION          PIC X(06).
               15  :T:-TLR-REC-INV-COUNT         PIC X(10).
               15  FILLER                        PIC X(1733).
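
If the copybook is loaded as a string (for example, from PySpark), the one-line change above could also be applied in code. The following is only a rough sketch: `first_level` and `add_root_record` are hypothetical helper names, not part of Cobrix, and the sketch assumes the standard fixed-format layout (columns 1-6 for sequence numbers, column 7 for the comment indicator, nothing meaningful past column 72). It only handles the case from this issue, where the first field has a level greater than 01.

```python
from typing import Optional


def first_level(copybook: str) -> Optional[int]:
    """Return the level number of the first non-comment statement, if any."""
    for line in copybook.splitlines():
        # Column 7 (index 6) holds '*' or '/' on comment lines.
        if len(line) > 6 and line[6] in "*/":
            continue
        # Ignore the sequence area (columns 1-6) and anything past column 72.
        statement = line[6:72].strip()
        if not statement:
            continue
        token = statement.split()[0]
        return int(token) if token.isdigit() else None
    return None


def add_root_record(copybook: str, root_name: str = "RECORD") -> str:
    """Prepend a '01 <root_name>.' group when the copybook starts below level 01."""
    level = first_level(copybook)
    if level is None or level == 1:
        # Nothing recognizable, or a level-01 entry already exists: leave as-is.
        return copybook
    return f"{' ' * 11}01  {root_name}.\n{copybook}"
```

Calling `add_root_record(copybook)` on the original copybook from this issue should produce the same text as the manual edit; calling it a second time leaves the result unchanged.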

The collapse root strategy can be specified like this:

    spark
      .read
      .format("cobol")
      .option("copybook_contents", copybook)
      .option("schema_retention_policy", "collapse_root")
      .load(dataPath)
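
Since the error in the question came through Py4J, here is roughly what the same read might look like from PySpark. This is a sketch under assumptions: `read_with_collapsed_root` is a hypothetical wrapper name, `spark` is an existing `SparkSession` with the Cobrix data source available on its classpath, and `copybook` holds the updated copybook text. The format name and the two options are the ones shown above.

```python
def read_with_collapsed_root(spark, copybook, data_path):
    """Read a mainframe file with Cobrix and drop the artificial root record.

    spark     -- an existing SparkSession (assumed to have Cobrix available)
    copybook  -- copybook text that already contains a root group field
    data_path -- location of the binary data file
    """
    return (
        spark.read
        .format("cobol")
        .option("copybook_contents", copybook)
        # Remove the added root record from the resulting dataframe schema.
        .option("schema_retention_policy", "collapse_root")
        .load(data_path)
    )
```

The parentheses around the chain are needed in Python so that the method calls can continue across lines.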