AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

Unable to read properly Variable Length RDW file using Cobrix #534

Open Loganhex2021 opened 1 year ago

Loganhex2021 commented 1 year ago

Using the Cobrix library from Azure Databricks, I am unable to read one of our mainframe files (CP037). The same file can be read with Record Editor, which shows the data correctly.

Used version: 2.2.2
File type: Variable Length File
Actual records in the file: 102
Records read by Cobrix: 4 (junk data)

I have tried all the available options, and Cobrix still does not read the file properly.

@yruslan - Do you have any idea why Cobrix cannot read a file that Record Editor reads without problems?

The first field in the copybook, right after the 01-level name, is a COMP-3 field. Could that be causing the issue?

01 HEADER.
   05 Name-Num PIC S9(7) COMP-3.

Could you please provide any suggestions?
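
For reference when comparing raw bytes against the copybook, here is a minimal Python sketch of how a packed-decimal (COMP-3) field such as `PIC S9(7) COMP-3` is usually laid out: 7 digits plus a sign nibble, so 4 bytes. The function name `decode_comp3` is hypothetical and is not part of Cobrix; the sketch assumes the common sign nibbles (C, F, A, E for positive, D and B for negative).

```python
def decode_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into a signed integer.

    Hypothetical helper for illustration only; not a Cobrix API.
    """
    if not raw:
        raise ValueError("empty field")

    digits = []
    # Every byte except the last holds two decimal digits, one per nibble.
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)

    # The last byte holds the final digit (high nibble) and the sign (low nibble).
    last = raw[-1]
    digits.append(last >> 4)
    sign = last & 0x0F

    # Digit nibbles must be 0-9 and the sign nibble must be A-F.
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"not valid packed decimal: {raw.hex()}")

    value = int("".join(str(d) for d in digits))
    # Assumption: D and B mark negative values, anything else is positive.
    return -value if sign in (0x0B, 0x0D) else value


# A 4-byte S9(7) COMP-3 value: digits 1234567 followed by sign nibble C.
print(decode_comp3(bytes.fromhex("1234567c")))
```

If the first 4 bytes of a record do not decode this way (for example, they are EBCDIC spaces, `40 40 40 40`), that suggests the field does not start where the copybook expects it to, rather than a problem with COMP-3 itself.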

yruslan commented 1 year ago

Hi @Loganhex2021, you can use

.option("debug", "true")

to see the hex of the raw values and find the source of the issue. Most likely the copybook does not exactly match the data, perhaps because the data has additional headers that are not part of the copybook.
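
As a complementary check outside Spark, the record count can be verified by walking the RDW headers of a local copy of the file. The following is a rough Python sketch under stated assumptions, not Cobrix's own parsing logic: it assumes each record starts with a 4-byte RDW whose first two bytes hold a big-endian length, and the hypothetical flag `length_includes_header` controls whether that length counts the 4 header bytes. Both assumptions should be checked against the actual file.

```python
import struct


def walk_rdw(data: bytes, length_includes_header: bool = True):
    """Return (payload_offset, payload_length) for each RDW-prefixed record.

    Hypothetical diagnostic helper; not part of Cobrix.
    Assumes a 4-byte RDW with a big-endian length in its first two bytes.
    """
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {pos}")

        # Bytes 0-1 of the RDW: record length; bytes 2-3 are ignored here.
        (length,) = struct.unpack_from(">H", data, pos)
        payload = length - 4 if length_includes_header else length

        # A non-positive or overlong payload means the layout assumption is wrong
        # or there are extra bytes (such as a file header) before the records.
        if payload <= 0 or pos + 4 + payload > len(data):
            raise ValueError(
                f"implausible RDW {data[pos:pos + 4].hex()} at offset {pos}"
            )

        records.append((pos + 4, payload))
        pos += 4 + payload
    return records


# Two synthetic records whose RDW length includes the 4 header bytes.
sample = b"\x00\x08\x00\x00ABCD" + b"\x00\x06\x00\x00XY"
print(len(walk_rdw(sample)))
```

If the walk yields the expected 102 records under one combination of assumptions, the reader's RDW settings likely need to match that combination; the Cobrix README documents options such as `is_rdw_big_endian` and `rdw_adjustment` for this, though their availability in a given version should be confirmed. If the walk fails at offset 0, the file probably begins with extra header bytes that the copybook does not describe.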