AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

Reading EBCDIC file with multiple structure #663

Open MJames1030 opened 8 months ago

MJames1030 commented 8 months ago

Hi,

I'm using the Cobrix library on Databricks to read an EBCDIC file. I now have a copybook with multiple structures. When I read the EBCDIC file, all of the data is parsed using the first structure.

Here are my options: spark.read.format("cobol").option("record_length", "100").option("pedantic", "true").option("ebcdic_code_page", "cp1047").option("drop_value_fillers", "false").option("drop_group_fillers", "false")

I have also tried the option is_record_sequence = true.

Here is an extract of the copybook.

      RECORD STRUCTURE OF COM-SET 03
       01  REC-TYPE1.
           03  T1-RECTYPE      PIC X(02).
           03  T1-MODYEAR      PIC X(04).
           03  T1-CODEMOD      PIC X(06).
           03  T1-LANGUE       PIC X(02).
           03  T1-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(13).
           03  T1-CODEMODDESC  PIC X(40).
           03  T1-FILLER       PIC X(02).
           03  T1-MARQUE       PIC X(01).
           03  T1-PROCCOD      PIC X(01).
           03  T1-STARTDAT     PIC X(07).
           03  T1-EXPIRDAT     PIC X(07).
           03  T1-MODDATE      PIC X(07).
           03  T1-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).

       01  REC-TYPE2.
           03  T2-RECTYPE      PIC X(02).
           03  T2-MODYEAR      PIC X(04).
           03  T2-CLASS        PIC X(02).
           03  T2-LANGUE       PIC X(02).
           03  T2-PRNUMB       PIC X(03).
           03  T2-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(14).
           03  T2-PRNRDESC     PIC X(40).
           03  FILLER          PIC X(02).
           03  T2-MARQUE       PIC X(01).
           03  T2-PROCCOD      PIC X(01).
           03  T2-STARTDAT     PIC X(07).
           03  T2-EXPIRDAT     PIC X(07).
           03  T2-MODDATE      PIC X(07).
           03  T2-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).

       01  REC-TYPE3.
           03  T3-RECTYPE      PIC X(02).
           03  T3-MODYEAR      PIC X(04).
           03  T3-CLASS        PIC X(02).
           03  T3-LANGUE       PIC X(02).
           03  T3-PACKAGE      PIC X(03).
           03  T3-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(14).
           03  T3-PACKDESC     PIC X(40).
           03  FILLER          PIC X(02).
           03  T3-BRAND        PIC X(01).
           03  T3-PROCCOD      PIC X(01).
           03  T3-STARTDAT     PIC X(07).
           03  T3-EXPIRDAT     PIC X(07).
           03  T3-MODDATE      PIC X(07).
           03  T3-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).

There are 15 structures like this.

Do you know how I can solve this?

Thank you in advance, Jamal

yruslan commented 8 months ago

If the first record looks good but the rest of the records do not, the most likely cause is that the copybook is not aligned with the record size.
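
One way to check that alignment is to add up the field sizes of each 01-level structure and compare the totals with the record length passed to the reader. The following is only a rough sketch in Python: `record_sizes` is a hypothetical helper (not part of Cobrix), and it assumes every field is a plain `PIC X(n)` in a single-byte code page, as in the extract posted above. Other picture types would need their own size rules.

```python
import re

# Matches alphanumeric clauses such as "PIC X(40)". Other PIC types
# (numeric, COMP, COMP-3, ...) are deliberately not handled in this sketch.
PIC_X = re.compile(r"PIC\s+X\((\d+)\)")
# Matches a top-level record header such as "01  REC-TYPE1."
LEVEL_01 = re.compile(r"^\s*01\s+([A-Z0-9-]+)\s*\.")


def record_sizes(copybook_text):
    """Return {record name: total bytes} for each 01-level structure."""
    sizes = {}
    current = None
    for line in copybook_text.splitlines():
        header = LEVEL_01.match(line)
        if header:
            current = header.group(1)
            sizes[current] = 0
            continue
        field = PIC_X.search(line)
        if field and current is not None:
            sizes[current] += int(field.group(1))
    return sizes


# First structure from the copybook extract in the question.
sample = """
       01  REC-TYPE1.
           03  T1-RECTYPE      PIC X(02).
           03  T1-MODYEAR      PIC X(04).
           03  T1-CODEMOD      PIC X(06).
           03  T1-LANGUE       PIC X(02).
           03  T1-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(13).
           03  T1-CODEMODDESC  PIC X(40).
           03  T1-FILLER       PIC X(02).
           03  T1-MARQUE       PIC X(01).
           03  T1-PROCCOD      PIC X(01).
           03  T1-STARTDAT     PIC X(07).
           03  T1-EXPIRDAT     PIC X(07).
           03  T1-MODDATE      PIC X(07).
           03  T1-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).
"""

print(record_sizes(sample))  # {'REC-TYPE1': 100}
```

Running the same helper over the whole copybook would show whether every structure adds up to the same size as the configured record length.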

Please try the latest version, 2.6.11, since it fixes a bug that caused Cobrix to ignore .option("record_length", "100") in certain circumstances.

If that doesn't help, you can use .option("debug", "true") to debug and determine the correct record size.
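
As a minimal sketch of what that could look like from PySpark, the helper below repeats the options from the question and adds the debug option. `read_with_debug` and the paths in the usage comment are hypothetical names chosen for illustration; the option names are the ones mentioned in this thread, plus `copybook` for the copybook location.

```python
def read_with_debug(spark, copybook_path, data_path, record_length=100):
    """Read a fixed-length EBCDIC file with the Cobrix debug option enabled.

    `spark` is an existing SparkSession with the Cobrix package available.
    """
    return (
        spark.read.format("cobol")
        .option("copybook", copybook_path)
        .option("record_length", str(record_length))
        .option("ebcdic_code_page", "cp1047")
        .option("debug", "true")
        .load(data_path)
    )


# Hypothetical usage on a cluster where Cobrix is installed:
# df = read_with_debug(spark, "/path/to/copybook.cpy", "/path/to/data.dat")
# df.show(5, truncate=False)
```

Comparing the debug output of the first few rows against the expected field values should make it easier to see at which offset the records start to drift.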