Closed (sree018 closed this issue 1 year ago)
Spark does not have a varchar() type or an integer(6) data type, only string and integer, so the expected output you specified is not possible.
However, it could be possible to retain metadata after schema flattening. How do you flatten the schema?
SparkUtils.flattenSchema(df, useShortFieldNames = false)
I've tested if retaining the metadata is possible, and it is.
This PR makes SparkUtils.flattenSchema() retain metadata: https://github.com/AbsaOSS/cobrix/pull/635
It is already merged into master. Please test it if you can, and let me know if it works for you.
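Once metadata survives flattening, RDBMS-style type names could be derived from it. The following is a minimal sketch, not part of Cobrix: the names RdbmsSchemaSketch, rdbmsType and describe are hypothetical, and it assumes string lengths are stored under the "maxLength" metadata key, as in the getLong("maxLength") call quoted in this thread. Fields without that key fall back to Spark's own SQL type name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StringType, StructField}

// Hypothetical helper, not part of Cobrix or Spark.
object RdbmsSchemaSketch {
  // Maps one field to an RDBMS-style type string.
  // Assumption: string fields carry their length under the "maxLength" metadata key.
  def rdbmsType(field: StructField): String = field.dataType match {
    case StringType if field.metadata.contains("maxLength") =>
      s"VARCHAR(${field.metadata.getLong("maxLength")})"
    case other =>
      // Fall back to Spark's SQL name for the type, e.g. INT or DECIMAL(15,2).
      other.sql
  }

  // Produces one "name type" line per top-level column of an already flattened DataFrame.
  def describe(df: DataFrame): Seq[String] =
    df.schema.fields.toSeq.map(f => s"${f.name} ${rdbmsType(f)}")
}
```

With the flattening call from this thread, it might be used as RdbmsSchemaSketch.describe(SparkUtils.flattenSchema(df, useShortFieldNames = false)), which needs a build that contains the PR above.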
@yruslan The new feature is working. Thanks for the feature!
Awesome! Thanks for letting me know
Background
Currently, copybook metadata is exposed as a Spark schema, but we need an RDBMS-level schema (types with lengths).
Example [Optional]
'''
01 MASTER-RECORD.
   02 RDT-TLF-MTHD-NM PIC X(08).
   02 RDT-ADJ-ORGN-TRAN-DT PIC 9(06).
   02 FILLER PIC X(03).
   02 RDT-ADDL-DATA-GROUP.
      05 RDT-ADDL-DATA OCCURS 0 TO 2 TIMES DEPENDING ON RDT-ADDL-SEGS-NO.
         10 RDT-ADDL-SEG-KEY.
            15 RDT-ADDL-SEG-KEY-PROD PIC X(02).
            15 RDT-ADDL-SEG-KEY-TYPE PIC S9(15)V99 COMP-3.
'''
Current schema:
root
 |-- RDT-TLF-MTHD-NM String
 |-- RDT-ADJ-ORGN-TRAN-DT integer
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD String
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE DECIMAL (15,2)
Expected output:
 |-- RDT-TLF-MTHD-NM VARCHAR(08)
 |-- RDT-ADJ-ORGN-TRAN-DT integer (06)
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD VARCHAR(08)
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE DECIMAL (15,2)
We are able to get element lengths only for parent-level fields, and only before flattening:
df.schema.fields(0).metadata.getLong("maxLength")
Is there any option to get the expected schema?
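For fields below the parent level, one possible approach before flattening is to walk the nested schema recursively instead of indexing only the top-level fields. This is a rough sketch under assumptions: NestedMetadataSketch and maxLengths are hypothetical names, and it only finds fields that actually carry the "maxLength" key used above. It descends into structs and into arrays of structs, which is how OCCURS groups typically appear in a Spark schema.

```scala
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical helper, not part of Cobrix or Spark.
object NestedMetadataSketch {
  // Recursively collects (dotted field path -> maxLength) for every field
  // whose metadata contains the "maxLength" key.
  def maxLengths(schema: StructType, prefix: String = ""): Seq[(String, Long)] =
    schema.fields.toSeq.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      val own: Seq[(String, Long)] =
        if (f.metadata.contains("maxLength")) Seq(path -> f.metadata.getLong("maxLength"))
        else Seq.empty
      val nested: Seq[(String, Long)] = f.dataType match {
        case s: StructType               => maxLengths(s, path) // nested group
        case ArrayType(s: StructType, _) => maxLengths(s, path) // array of groups
        case _                           => Seq.empty
      }
      own ++ nested
    }
}
```

Calling NestedMetadataSketch.maxLengths(df.schema) returns the lengths keyed by full path, so nested elements are reachable without flattening first.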