AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

Can't read multiple main headers defined in single copybook #670

Open Kavya1552 opened 4 months ago

Kavya1552 commented 4 months ago

Background [Optional]

I'm not able to read multiple main headers (01-level records) defined in a single copybook. As far as I can tell, Cobrix handles only one main header per copybook, not several.

In my project I have a requirement to read multiple copybooks as one main copybook using Cobrix. Example:

    TRANSACTION.CPY

    01 CUSTOMERS
       05 PURCHASES
          FIRST_NAME PIC X (04)
          LASTNAME_NAME PIC X (04)
    01 ORGANIZATION
       05 DEPARTMENTS
          ORG_NAME PIC X (04)
          VENDOR_NAME PIC X (09)
    01 MEDICARE
       05 BILLS
          TREATMENT_TYPE PIC X (04)
          LOCATION_NAME PIC X (09)
    01 MEMBERSHIP
       05 PARTNERS
          PARTNER_TYPE PIC X (04)
          PARTNER_NAME PIC X (09)

Question

Does Cobrix have a limitation when reading multiple record layouts that are defined in a single copybook? Can this be achieved with Cobrix?

Kavya1552 commented 4 months ago

@yruslan can you please provide your thoughts on this. Thanks!

yruslan commented 4 months ago

Yes, Cobrix can be given more than one copybook using:

.option("copybooks", "/path/to/copybook1,/path/to/copybook2,/path/to/copybook3")

This creates a merged layout in which the root GROUP of each copybook redefines the root GROUPs of the others.
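In a Spark job, the option might be used roughly as follows. This is only a sketch: it assumes the `spark-cobol` data source is on the classpath, and the paths, the application name, and the `local[*]` master are placeholders rather than anything taken from this issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadWithMultipleCopybooks {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the option out.
    val spark = SparkSession.builder()
      .appName("cobrix-multiple-copybooks")
      .master("local[*]")
      .getOrCreate()

    // Placeholder paths, one copybook per record type.
    val copybookPaths = Seq(
      "/path/to/copybook1",
      "/path/to/copybook2",
      "/path/to/copybook3"
    )

    val df: DataFrame = spark.read
      .format("cobol")
      // The option takes a single comma-separated string of paths.
      .option("copybooks", copybookPaths.mkString(","))
      .load("/path/to/data") // placeholder data location

    df.printSchema()
    spark.stop()
  }
}
```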

For example, if the copybooks are:

    val copyBookContents1: String =
      """        01  RECORD-COPYBOOK-1.
        |           05  GROUP-1.
        |              06  FIELD-1            PIC X(10).
        |              06  FILLER             PIC X(5).
        |              06  GROUP-2.
        |                 10  NESTED-FIELD-1  PIC 9(10).
        |                 10  FILLER          PIC 9(5).
        |""".stripMargin
    val copyBookContents2: String =
      """        01  RECORD-COPYBOOK-2A.
        |           05  GROUP-1.
        |              06  FIELD-1            PIC X(20).
        |              06  FILLER             PIC X(10).
        |              06  GROUP-2.
        |                 10  NESTED-FIELD-1  PIC 9(20).
        |                 10  FILLER          PIC 9(10).
        |        01  RECORD-COPYBOOK-2B REDEFINES RECORD-COPYBOOK-2A.
        |           05  GROUP-1.
        |              06  FIELD-1            PIC X(20).
        |              06  FILLER             PIC X(10).
        |              06  GROUP-2.
        |                 10  NESTED-FIELD-1  PIC 9(20).
        |                 10  FILLER          PIC 9(10).
        |""".stripMargin
    val copyBookContents3: String =
      """        01  RECORD-COPYBOOK-3.
        |           05  GROUP-1.
        |              06  FIELD-1            PIC X(30).
        |              06  FILLER             PIC X(15).
        |              06  GROUP-2.
        |                 10  NESTED-FIELD-1  PIC 9(30).
        |                 10  FILLER          PIC 9(15).
        |""".stripMargin

The merged copybook layout will be:

-------- FIELD LEVEL/NAME --------- --ATTRIBS--    FLD  START     END  LENGTH

1 RECORD_COPYBOOK_1                    r              1      1     90     90
  5 GROUP_1                                           2      1     30     30
    6 FIELD_1                                         3      1     10     10
    6 FILLER                                          4     11     15      5
    6 GROUP_2                                         5     16     30     15
      10 NESTED_FIELD_1                               6     16     25     10
      10 FILLER                                       7     26     30      5
1 RECORD_COPYBOOK_2A                   rR             8      1     90     90
  5 GROUP_1                                           9      1     60     60
    6 FIELD_1                                        10      1     20     20
    6 FILLER                                         11     21     30     10
    6 GROUP_2                                        12     31     60     30
      10 NESTED_FIELD_1                              13     31     50     20
      10 FILLER                                      14     51     60     10
1 RECORD_COPYBOOK_2B                   rR            15      1     90     90
  5 GROUP_1                                          16      1     60     60
    6 FIELD_1                                        17      1     20     20
    6 FILLER                                         18     21     30     10
    6 GROUP_2                                        19     31     60     30
      10 NESTED_FIELD_1                              20     31     50     20
      10 FILLER                                      21     51     60     10
1 RECORD_COPYBOOK_3                    R             22      1     90     90
  5 GROUP_1                                          23      1     90     90
    6 FIELD_1                                        24      1     30     30
    6 FILLER                                         25     31     45     15
    6 GROUP_2                                        26     46     90     45
      10 NESTED_FIELD_1                              27     46     75     30
      10 FILLER                                      28     76     90     15
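The sizes in this table can be checked by hand: each root record's own fields add up to 30, 60, 60 and 90 bytes, yet every root record is reported as spanning bytes 1 to 90, which matches the size of the largest one. The plain-Scala sketch below only reproduces that arithmetic from the PIC clauses above; it does not call Cobrix and is not a description of how Cobrix computes the layout internally.

```scala
object MergedLayoutSizes {
  // Field lengths in bytes, read off the PIC clauses of each root record
  // (FIELD-1, FILLER, NESTED-FIELD-1, FILLER).
  val recordFieldLengths: Seq[(String, Seq[Int])] = Seq(
    "RECORD_COPYBOOK_1"  -> Seq(10, 5, 10, 5),
    "RECORD_COPYBOOK_2A" -> Seq(20, 10, 20, 10),
    "RECORD_COPYBOOK_2B" -> Seq(20, 10, 20, 10),
    "RECORD_COPYBOOK_3"  -> Seq(30, 15, 30, 15)
  )

  // Size of each record taken on its own.
  val ownSizes: Map[String, Int] =
    recordFieldLengths.map { case (name, lengths) => name -> lengths.sum }.toMap

  // Records that redefine each other share the same bytes,
  // so the merged record is as long as the largest of them.
  val mergedRecordSize: Int = ownSizes.values.max

  def main(args: Array[String]): Unit = {
    ownSizes.toSeq.sortBy(_._1).foreach { case (name, size) =>
      println(f"$name%-20s own size = $size%3d")
    }
    println(s"merged record size = $mergedRecordSize")
  }
}
```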

I usually prefer to combine copybooks manually so I have more control over which fields are shared by every segment and which differ for each segment.
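As a rough sketch of what combining by hand could look like for the TRANSACTION.CPY example from the question, each additional 01-level record can be declared with REDEFINES so that all record types overlay the same bytes. The level number `10` on the elementary fields and the hyphenated field names are assumptions, since the snippet in the question shows neither; the PIC sizes are taken from the question.

```scala
object ManualCopybook {
  // Hand-combined copybook. Assumed: level 10 for the elementary fields
  // and hyphens instead of underscores in the names.
  val combined: String =
    """        01  CUSTOMERS.
      |           05  PURCHASES.
      |              10  FIRST-NAME        PIC X(04).
      |              10  LASTNAME-NAME     PIC X(04).
      |        01  ORGANIZATION REDEFINES CUSTOMERS.
      |           05  DEPARTMENTS.
      |              10  ORG-NAME          PIC X(04).
      |              10  VENDOR-NAME       PIC X(09).
      |        01  MEDICARE REDEFINES CUSTOMERS.
      |           05  BILLS.
      |              10  TREATMENT-TYPE    PIC X(04).
      |              10  LOCATION-NAME     PIC X(09).
      |        01  MEMBERSHIP REDEFINES CUSTOMERS.
      |           05  PARTNERS.
      |              10  PARTNER-TYPE      PIC X(04).
      |              10  PARTNER-NAME      PIC X(09).
      |""".stripMargin

  def main(args: Array[String]): Unit = println(combined)
}
```

A string like this could be passed to the reader through the `copybook_contents` option instead of `copybooks`. How each record is matched to one of the redefined groups depends on the data itself (for example on a record-type field), which the question does not describe.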

Kavya1552 commented 4 months ago

@yruslan Thanks for your reply. In my case there are multiple record types defined in a single copybook. Do you think `.option("copybooks", "/path/to/copybook1,/path/to/copybook2,/path/to/copybook3")` would work here?

yruslan commented 4 months ago

The "copybooks" option is specifically designed for the situation of multiple record types, and it works for our use cases.