Blank fields are not recognized.

dlbfriend commented 2 months ago

Background [Optional]

Hello. When I parse a data file with Cobrix, the data file and the copybook do not match. If a field is null or empty, the data file does not retain the space for it, so the following columns are shifted backward. The copybook has a field that holds the record length, and I use it together with the fixed-length record option. Starting at position 40, the parsed values are incorrect. Any suggestions for handling null values that do not retain their space?

Question

If the data file contains no space or null character (in hex) for an empty field, can I still parse it with Cobrix?

yruslan commented 2 months ago

You can add .option("debug", "true") so that a debug field is added for each field of your data. This way you can see which values were wrongly converted, and can make the copybook match the data. null is usually used when the value cannot be parsed according to the specified field definition (e.g. wrong COMP-3 number, etc).
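To read the hex strings that the debug fields contain, it can help to translate them back into characters by hand. The following is a minimal sketch, not part of Cobrix: `DebugHex` and its methods are hypothetical names, and the mapping covers only the EBCDIC space, uppercase letters, and digits. Every other byte is shown as `.`, which makes binary data that has slipped into a text field easy to spot.

```scala
object DebugHex {
  // Parse a hex string such as "C3C5F1" into unsigned byte values.
  def hexToBytes(hex: String): List[Int] = {
    require(hex.length % 2 == 0, "hex string must have an even number of digits")
    hex.grouped(2).map(pair => Integer.parseInt(pair, 16)).toList
  }

  // Minimal EBCDIC mapping (assumption: only space, A-Z and 0-9 matter here).
  // Anything else is rendered as '.' so that non-text bytes stand out.
  def ebcdicChar(b: Int): Char = b match {
    case 0x40                        => ' '
    case x if x >= 0xC1 && x <= 0xC9 => ('A' + (x - 0xC1)).toChar
    case x if x >= 0xD1 && x <= 0xD9 => ('J' + (x - 0xD1)).toChar
    case x if x >= 0xE2 && x <= 0xE9 => ('S' + (x - 0xE2)).toChar
    case x if x >= 0xF0 && x <= 0xF9 => ('0' + (x - 0xF0)).toChar
    case _                           => '.'
  }

  // Translate a debug hex string into readable text.
  def ebcdicToText(hex: String): String =
    hexToBytes(hex).map(ebcdicChar).mkString
}
```

For example, `DebugHex.ebcdicToText("C3C5F1")` gives `CE1`. A text field whose debug value decodes mostly to dots is a hint that the field offsets no longer line up with the data.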

dlbfriend commented 2 months ago

I use the options below:

    val df = spark.read
      .format("cobol")
      .option("copybook", copybookPath)
      .option("record_format", "F")
      .option("record_length_field", "L-LENGTH")
      .option("debug", true)
      .load(inputFilePath)

L-LENGTH is a length field.

    "L_TR_XCHG_IND": "N",
    "L_TR_XCHG_IND_debug": "D5",
    "L_TR_DEP_DATE": 0,
    "L_TR_DEP_DATE_debug": "000000000F"
},

"L_TR_XCHG_DATA": {
    "L_TR_XCHG_CURRNCY": "CE1",
    "L_TR_XCHG_CURRNCY_debug": "C3C5F1",
    "L_TR_XCHG_AMT_DEC": "",
    "L_TR_XCHG_AMT_DEC_debug": "40",
    "L_TR_XCHG_AMT_debug": "020200610F0235959C",
    "L_TR_XCHG_RATE_debug": "F2F0F04040404040",
    "L_TR_XCHG_OVRD": "",
    "L_TR_XCHG_OVRD_debug": "40"
},
"L_TR_CAPTURE_DATA": {
    "L_TR_ORIGIN": "",
    "L_TR_ORIGIN_debug": "40020210",
    "L_TR_OPER": "| 1",
    "L_TR_OPER_debug": "624F001F0000F100",
    "L_TR_SYS_DATE_debug": "00002C001C",
    "L_TR_ENTRY_TIME_debug": "02020061",
    "L_TR_BATCH_debug": "0F0000",
    "L_TR_SEQ_debug": "000004",
    "L_TR_COMMENT": "INT01 S",
    "L_TR_COMMENT_debug": "59249C00000CC9D5E3F0F140E20000",

L_TR_XCHG_DATA {} should actually contain no data. After parsing the data file, the values are shifted backward. There is no delimiter between L_TR_DEP_DATE_debug and L_TR_XCHG_CURRNCY_debug, and the positions of the null values differ from record to record. Is there a way to automatically identify null or empty values in the data file?
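One way to check whether a numeric debug value lines up with its field is to test it as a packed decimal. The sketch below assumes the fields in question are COMP-3; the copybook is not shown in this thread, so that is a guess. `PackedDecimal.decode` is a hypothetical helper, not a Cobrix API. It expects digit nibbles followed by one trailing sign nibble and returns `None` for anything else.

```scala
object PackedDecimal {
  // Decode the hex dump of a COMP-3 (packed decimal) field.
  // Returns None when the bytes are not a well-formed packed number.
  def decode(hex: String): Option[BigInt] = {
    // One Int per hex digit; Character.digit yields -1 for non-hex characters.
    val nibbles = hex.toUpperCase.map(c => Character.digit(c, 16)).toList
    if (nibbles.isEmpty || nibbles.length % 2 != 0 || nibbles.contains(-1)) None
    else {
      val digits = nibbles.init // every nibble except the last must be 0-9
      val sign = nibbles.last   // the last nibble carries the sign
      if (digits.exists(_ > 9)) None
      else {
        val magnitude = digits.foldLeft(BigInt(0))((acc, d) => acc * 10 + d)
        sign match {
          case 0xC | 0xF | 0xA | 0xE => Some(magnitude)  // positive or unsigned
          case 0xD | 0xB             => Some(-magnitude) // negative
          case _                     => None             // a digit where the sign should be
        }
      }
    }
  }
}
```

With the values from the output above, `PackedDecimal.decode("000000000F")` gives `Some(0)`, which agrees with the parsed L_TR_DEP_DATE. `PackedDecimal.decode("020200610F0235959C")` gives `None` because a sign nibble (`F`) appears in the middle of the value, which is consistent with L_TR_XCHG_AMT having a debug entry but no parsed value.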

yruslan commented 2 months ago

It seems L-LENGTH does not fully reflect record length. Maybe some parts of the record are not counted towards the record size. This is why the first record seems to be parsed properly, but the second one is corrupted.

You can adjust the size by specifying arithmetic expressions in the record length field definition. For example:

// Adjust the record length by adding 4 bytes
.option("record_length_field", "L_LENGTH + 4")

or

// Adjust the record length by subtracting 4 bytes
.option("record_length_field", "L_LENGTH - 4")

Note: It is very important to use '_' in the field name here. Otherwise Cobrix might confuse the dash with the minus sign (for example, L - LENGTH - 4).
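To illustrate why a length field that leaves out part of the record lets the first record parse while later ones break, here is a toy splitter. It is a sketch under stated assumptions and not how Cobrix is implemented: `RecordSplitter` is a hypothetical name, the length is assumed to be a 2-byte big-endian binary value at the start of each record, and `adjustment` plays the role of the `+ 4` or `- 4` in the expressions above.

```scala
object RecordSplitter {
  // Split a byte stream into records using a leading 2-byte length field
  // (hypothetical layout) plus a fixed adjustment in bytes.
  def split(data: Array[Byte], adjustment: Int): List[Array[Byte]] = {
    @scala.annotation.tailrec
    def loop(offset: Int, acc: List[Array[Byte]]): List[Array[Byte]] =
      if (offset + 2 > data.length) acc.reverse
      else {
        // Read the declared length as an unsigned big-endian 16-bit value.
        val declared = ((data(offset) & 0xFF) << 8) | (data(offset + 1) & 0xFF)
        val size = declared + adjustment
        // Stop when the computed size cannot describe a record in the remaining data.
        if (size < 2 || offset + size > data.length) acc.reverse
        else loop(offset + size, data.slice(offset, offset + size) :: acc)
      }
    loop(0, Nil)
  }
}

// Two 6-byte records whose length field (4) does not count its own 2 bytes.
val sample = Array[Byte](0, 4, 65, 66, 67, 68, 0, 4, 69, 70, 71, 72)
val correct = RecordSplitter.split(sample, 2) // both records are found
val shifted = RecordSplitter.split(sample, 0) // only a truncated first record is found
```

In this toy data the unadjusted split reads payload bytes of the first record as the length of the second one, so everything after the first record is misaligned. The right adjustment for a real file depends on what its length field actually counts.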

AbsaOSS / cobrix

Blank fields are not recognized. #707