Open D3v3sh5ingh opened 10 months ago
Hi @D3v3sh5ingh, what's your high level offset layout?
For example:
0 - 19 Headers (to be ignored)
20 - 23 BDW
24 - 27 RDW
28 - 99 Payload
100 - 193 RDW
...
32000 Payload
32093 Footer (to be ignored)
Hi @yruslan, my high-level layout looks like this:
BDW { RDW 45 bytes, RDW 1000 bytes, RDW 1000 bytes, RDW 1000 bytes, ... }
BDW { RDW 1000 bytes, ... }
...
BDW { RDW 1000 bytes, ..., RDW 45 bytes }
The 45-byte header and trailer are inside the BDW blocks, as shown above. We want to remove the 45-byte header and trailer from the file.
file_start_offset and file_end_offset work at the level of the file, e.g. in cases like:
HEADER {45 bytes} BDW { RDW 1000 bytes, RDW 1000 bytes, RDW 1000 bytes, RDW 1000 bytes, ... }
Since your 45-byte header is part of the record payload, you can't remove it using these options. What you can do is add the header as a redefined segment in your copybook, and then filter it out after you get the dataframe.
The copybook will look like this:
    01 RECORD.
       05 HEADER.
          10 CONTENT PIC X(45).
       05 PAYLOAD REDEFINES HEADER.
          ... your payload goes at level 10 here
Hi, this is a sample output for my file. The 45 bytes that I want to skip are only at the start and at the end of the file, not in each record. If I don't use file_start_offset and file_end_offset, I am able to get the above dataframe as output, but I get two extra records (header and trailer). If I use these options with 45 bytes, I get an error (length of BDW block is too big).
Options 'file_start_offset' and 'file_end_offset' only drop bytes at the beginning or the end of files, not from the payload. This is the expected behavior.
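To illustrate why trimming 45 bytes at the file level leads to a BDW error in this layout, here is a minimal sketch in plain Python. It is not Cobrix code. It assumes 4-byte big-endian BDW and RDW headers whose length fields include the header itself (real files may use a different convention), and the helper names make_record, make_block, and read_records are hypothetical.

```python
import struct


def make_record(payload: bytes) -> bytes:
    """Prefix a payload with an RDW: 2-byte big-endian length (RDW included) + 2 zero bytes."""
    return struct.pack(">HH", len(payload) + 4, 0) + payload


def make_block(payloads) -> bytes:
    """Wrap records in a block with a BDW: 2-byte big-endian length (BDW included) + 2 zero bytes."""
    body = b"".join(make_record(p) for p in payloads)
    return struct.pack(">HH", len(body) + 4, 0) + body


def read_records(data: bytes, start_offset: int = 0, end_offset: int = 0):
    """Trim bytes at the file level, then walk BDW blocks and the RDW records inside them."""
    end = len(data) - end_offset
    pos = start_offset
    records = []
    while pos < end:
        if pos + 4 > end:
            raise ValueError(f"truncated BDW at offset {pos}")
        block_len = struct.unpack_from(">H", data, pos)[0]
        block_end = pos + block_len
        if block_len < 4 or block_end > end:
            raise ValueError(f"BDW length {block_len} at offset {pos} is invalid or too big")
        pos += 4
        while pos < block_end:
            if pos + 4 > block_end:
                raise ValueError(f"truncated RDW at offset {pos}")
            rec_len = struct.unpack_from(">H", data, pos)[0]
            if rec_len < 4 or pos + rec_len > block_end:
                raise ValueError(f"RDW length {rec_len} at offset {pos} is invalid")
            records.append(data[pos + 4:pos + rec_len])
            pos += rec_len
    return records


# Layout 1: header and trailer sit outside the blocks, so file-level trimming works.
outside = b"H" * 45 + make_block([b"A" * 10, b"B" * 10]) + b"T" * 45
print(len(read_records(outside, 45, 45)))  # only the two data records remain

# Layout 2: header and trailer are records inside the first and last blocks.
inside = make_block([b"H" * 45, b"A" * 10]) + make_block([b"B" * 10, b"T" * 45])
print(len(read_records(inside)))  # header and trailer show up as extra records

try:
    # Skipping 45 bytes lands in the middle of the header payload,
    # so payload bytes get interpreted as a BDW length.
    read_records(inside, 45, 45)
except ValueError as err:
    print("error:", err)
```

In the second layout the reader starts inside the header record, so the bytes it takes for a BDW are payload bytes and the resulting block length is far larger than the file.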
There are no options that allow dropping bytes from inside records, so possible solutions are:
df.filter(col("COL1").isNotNull)
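As a rough illustration of why a not-null filter can drop the header and trailer rows, here is a plain-Python analogue that does not use Spark. It assumes that COL1 is a numeric field at the start of each data record and that it decodes to null for the header and trailer; the decode_row helper, the field position, and the sample payloads are all hypothetical.

```python
def decode_row(payload: bytes) -> dict:
    """Hypothetical decoder: COL1 is a 4-digit number in the first 4 bytes, else None."""
    digits = payload[:4].decode("ascii", errors="replace")
    col1 = int(digits) if digits.isdigit() else None
    return {"COL1": col1, "raw": payload}


# Made-up payloads: a 45-byte header, two 1000-byte data records, a 45-byte trailer.
payloads = [
    b"HDR".ljust(45),
    b"0001" + b"x" * 996,
    b"0002" + b"y" * 996,
    b"TRL".ljust(45),
]

rows = [decode_row(p) for p in payloads]

# Same idea as filtering on a not-null COL1: keep only rows where COL1 decoded.
data_rows = [row for row in rows if row["COL1"] is not None]

print([row["COL1"] for row in data_rows])  # [1, 2]
```

This only works if the chosen column is reliably null for the header and trailer and never null for real data records, so the column has to be picked with the actual data in mind.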
Hi @yruslan
Issue : 643
The file_start_offset and file_end_offset options are not working for VB files and throw the same error as reported in issue 643. I have a file with both RDW and BDW (record format VB). The file also has a header and a footer. I want to skip the first few bytes of the header and the last few bytes of the footer, so I am using the file_start_offset and file_end_offset options, but I get an error similar to the one in issue 643.