AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

`seg_id0` is duplicated for the root segment for big files when multiple files are loaded #710

Closed: yruslan closed this issue 3 days ago

yruslan commented 4 days ago

Describe the bug

seg_id0 should never be duplicated for the root segment.

However, when multiple big files are loaded, duplicate `seg_id0` values appear for root segment records.

Code snippet that caused the issue

This may happen only when a record length field is used:

  .option("record_format", "F")
  .option("record_length_field", "REC_LENGTH + 17")
  .option("segment_field", "SEGMENT-ID")

Expected behavior

seg_id0 should never be duplicated for the root segment.
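
One way to state this expectation as a check is sketched below in plain Scala collections, without Spark. The `Rec` case class, the `SegIdCheck` object, and the sample values are hypothetical and only illustrate the invariant; they are not part of Cobrix. The sketch assumes that child records may carry the `seg_id0` of their parent, so only records of the root segment are compared.

```scala
// Hypothetical record shape for illustration only; not a Cobrix type.
final case class Rec(segId0: String, segmentId: String)

object SegIdCheck {
  /** Returns the seg_id0 values that appear on more than one root-segment record. */
  def duplicatedRootIds(records: Seq[Rec], rootSegmentId: String): Set[String] =
    records
      .filter(_.segmentId == rootSegmentId) // keep only root-segment records
      .groupBy(_.segId0)                    // bucket root records by generated id
      .collect { case (id, rs) if rs.size > 1 => id }
      .toSet
}
```

With made-up data such as `Seq(Rec("a", "R"), Rec("a", "C"), Rec("b", "R"), Rec("b", "R"))` and `"R"` as the root segment id, the function reports `"b"` as duplicated and ignores `"a"`, which is shared only between a root record and its child. On a loaded DataFrame the same idea could be expressed by filtering on the segment field and grouping by the generated id column.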

Context

Copybook (if possible)

--
