AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

The field is a leaf element and cannot contain nested fields. #187

Open mpisu opened 5 years ago

mpisu commented 5 years ago

       01 W1RECORD.
          05 W1TEST.
             06 W1TEST1 PIC 9(002).
             06 filed1 PIC 9(002).
             06 filed2 PIC 9(003).
             06 filed3 PIC 9(008).
          05 filed4 PIC 9(002).
          05 filed5 PIC X(023).
          05 filed6 PIC 9(002).
          05 filed7 PIC 9(003).
          05 filed8 PIC S9(013) COMP-3 .
          05 filed9 PIC 9(001).
          05 filed10 PIC 9(002).
          05 filed11 PIC 9(002).
          05 filed12.
             06 filed13 PIC 9(005) COMP-3 .
             06 filed14 PIC S9(13) COMP-3 .
             06 filed15 PIC 9(005) COMP-3 .
             06 filed16 PIC S9(13) COMP-3 .
          05 filed17.
             06 filed18 PIC 9(005) COMP-3 .
             06 filed19 PIC S9(13) COMP-3 .
             06 filed110 PIC 9(005) COMP-3 .
             06 filed111 PIC S9(13) COMP-3 .
          05 filed112 PIC 9(005) COMP-3 .
          05 filed113 PIC 9(005) COMP-3 .
          05 filed114 PIC 9(5) COMP-3 .
          05 filed1115 PIC 9(2).
          05 filed1116 PIC 9(012) COMP-3 .
          05 filed1117.
             06 filed1118 PIC 9(003) COMP-3 .
             06 filed1119 PIC 9(008).
          05 filed1 PIC X(001).
          05 filed1 PIC X(001).
          05 filed1 PIC 9(1).

User class threw exception: za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 29, field filed1116: The field is a leaf element and cannot contain nested fields.
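
For context, the message presumably means that a field with a PIC clause (a leaf) is followed by a field with a higher level number, which would make the leaf a parent. Below is a minimal sketch of such a check in plain Python. It is not Cobrix code: `find_leaf_with_children`, the regular expression, and the sample copybook are hypothetical and only illustrate the rule. It assumes one field per line.

```python
import re

# Hypothetical helper (not part of Cobrix): matches "<level> <name> <rest>".
FIELD_RE = re.compile(r"^\s*(\d{2})\s+([A-Za-z0-9_-]+)(.*)$")
PIC_RE = re.compile(r"\bPIC(?:TURE)?\b", re.IGNORECASE)


def find_leaf_with_children(copybook):
    """Return (line_number, field_name) for every field that has a PIC
    clause and is directly followed by a field with a higher level number."""
    problems = []
    prev = None  # (line_number, level, name, has_pic)
    for line_no, line in enumerate(copybook.splitlines(), start=1):
        match = FIELD_RE.match(line)
        if not match:
            continue
        level = int(match.group(1))
        if level in (66, 77, 88):
            # Special level numbers do not describe nesting; skip them.
            continue
        has_pic = PIC_RE.search(match.group(3)) is not None
        if prev is not None and prev[3] and level > prev[1]:
            problems.append((prev[0], prev[2]))
        prev = (line_no, level, match.group(2), has_pic)
    return problems


# Made-up sample: AMOUNT has a PIC clause but is followed by a level-06 field.
sample = """\
       01 REC.
          05 AMOUNT   PIC 9(012) COMP-3.
             06 PART1 PIC 9(003) COMP-3.
          05 FLAG     PIC X(001).
"""

print(find_leaf_with_children(sample))
```

Running a check like this on the exact file passed to the parser can show whether the level numbers in that file differ from the ones pasted in a comment.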

yruslan commented 5 years ago

Tried parsing this copybook in the current spark-cobol version 1.0.1. Got:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `filed1`;

Which is indeed the case: filed1 is declared more than once in the copybook.

Could you please attach your copybook as a file, to make sure padding issues do not interfere? Also, please post in a comment the code you use to parse/read your file.

mpisu commented 5 years ago

RECORD.txt thanks !!!!!! :)

mpisu commented 5 years ago

sorry

yruslan commented 5 years ago

The file you've sent has syntax errors, but none of them is the one you've specified. Which version of spark-cobol are you using?

Syntax error in the copybook at line 3: Invalid input '(' at position 3:27

PICs cannot contain spaces. 9 (002) is therefore incorrect. The correct PIC would be 9(002).

Syntax error in the copybook at line 7: Invalid input '4' at position 7:23

Column names cannot contain spaces. archiviato 4 is therefore incorrect. The correct column name would be archiviato4.
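
The two kinds of mistake described above can be spotted before parsing. Here is a rough sketch in plain Python, assuming one field per line. `lint_copybook`, the keyword list, and the sample are made up for illustration and are not part of spark-cobol; the keyword list is deliberately incomplete.

```python
import re

# Hypothetical, incomplete list of clause keywords that may follow a field name.
CLAUSE_KEYWORDS = {"PIC", "PICTURE", "REDEFINES", "OCCURS", "USAGE", "VALUE"}

# "PIC", a picture string, whitespace, then "(": e.g. "PIC 9 (002)".
SPACED_PIC_RE = re.compile(r"\bPIC(?:TURE)?\s+\S+\s+\(", re.IGNORECASE)


def lint_copybook(copybook):
    """Return (line_number, message) pairs for two simple mistakes:
    a space inside a PIC clause and a space inside a field name."""
    issues = []
    for line_no, line in enumerate(copybook.splitlines(), start=1):
        if SPACED_PIC_RE.search(line):
            issues.append((line_no, "space inside PIC clause"))
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0].isdigit():
            # The token right after the name should be a clause keyword.
            third = tokens[2].rstrip(".").upper()
            if third and third not in CLAUSE_KEYWORDS:
                issues.append((line_no, "field name appears to contain a space"))
    return issues


# Made-up sample with one mistake of each kind (lines 3 and 4).
sample = """\
01 REC.
   05 GRP.
      06 CODE PIC 9 (002).
   05 ARCHIVE 4 PIC 9(002).
   05 AMOUNT PIC S9(013) COMP-3 .
"""

for issue in lint_copybook(sample):
    print(issue)
```

This is only a heuristic; the parser's own error messages remain the authority on what is valid.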

mpisu commented 5 years ago

Version: spark-cobol 1.0.1. In my file there is no space in the PIC; it does not contain PIC 9 (002).

mpisu commented 5 years ago

File.txt

The file is correct; there are no spaces.

yruslan commented 5 years ago

Which version of spark-cobol are you using?

mpisu commented 5 years ago

Version spark-cobol 1.0.1

yruslan commented 5 years ago

I get a different error when trying to parse the latest copybook you have attached:

Found duplicate column(s) in the data schema: `filed1`

This is expected since you have several fields named 'filed1'. After removing the duplicates, the copybook parses properly.

Could you please ensure that the copybook you are trying to parse and the one you have attached are exactly the same?
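
To find repeated names quickly, a small script can count field names in the copybook. The sketch below is plain Python and does not use spark-cobol. `duplicate_field_names` is a hypothetical helper, and it assumes one field per line. It counts names across the whole copybook regardless of nesting, so it may be stricter than the check that produced the error above.

```python
import re
from collections import Counter

# Hypothetical helper (not part of spark-cobol): matches "<level> <name>".
FIELD_RE = re.compile(r"^\s*(\d{2})\s+([A-Za-z0-9_-]+)")


def duplicate_field_names(copybook):
    """Return {name: count} for field names that occur more than once.
    Names are compared exactly as written; FILLER fields are ignored."""
    names = []
    for line in copybook.splitlines():
        match = FIELD_RE.match(line)
        if match and match.group(2).upper() != "FILLER":
            names.append(match.group(2))
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Shortened excerpt modelled on the copybook in this thread.
sample = """\
       01 W1RECORD.
          05 W1TEST.
             06 filed1 PIC 9(002).
             06 filed2 PIC 9(003).
          05 filed1 PIC X(001).
          05 filed1 PIC 9(1).
"""

print(duplicate_field_names(sample))
```

Renaming the reported fields so that each name is unique should be enough to get past the duplicate column error.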