AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

segment filter not working in 2.6.2 version #560

Closed saikumare-a closed 1 year ago

saikumare-a commented 1 year ago

Background

Hi, we are receiving a multisegment ASCII file, and we would like to filter the data for a particular segment based on a column.

As per the documentation, we tried using option("segment_filter").

Even after applying this filter, we observe that no filtering of the data happens. Can you help us check this?

saikumare-a commented 1 year ago

Hi @yruslan,

I just tested this in 2.6.1 and it works fine, but it does not work in 2.6.2. Can you look into this?

yruslan commented 1 year ago

Hi, what's your full spark.read code snippet?

saikumare-a commented 1 year ago

Hi @yruslan,

Below are the options used:

final_options = {'copybook': '', 'generate_record_id': 'false', 'drop_value_fillers': 'false', 'drop_group_fillers': 'false', 'pedantic': 'true', 'encoding': 'ascii', 'variable_size_occurs': 'true', 'record_format': 'D', 'segment_field': 'BASE_RCRD_ID', 'segment_filter': 'ABC'}

df=spark.read.format("cobol").options(**final_options).load()

yruslan commented 1 year ago

Yeah, I can see why it is happening. As a workaround, you can filter your data frame with .filter(col("BASE_RCRD_ID") === "ABC") for now.
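The workaround amounts to loading everything and then filtering the resulting DataFrame on the segment column yourself. In PySpark (which the reporter's snippet uses) the equivalent of the Scala `===` expression would be `df.filter(col("BASE_RCRD_ID") == "ABC")`. A minimal sketch of that filtering logic in plain Python, with dicts standing in for DataFrame rows (the field name and value are taken from the thread; the rows themselves are made up for illustration):

```python
# Sketch of the workaround: since option('segment_filter') is ignored in
# 2.6.2, filter the loaded rows by the segment field after the load.
# In PySpark this would be: df.filter(col("BASE_RCRD_ID") == "ABC")
# Here rows are modeled as plain dicts for illustration only.

rows = [
    {"BASE_RCRD_ID": "ABC", "PAYLOAD": "first ABC record"},
    {"BASE_RCRD_ID": "XYZ", "PAYLOAD": "record from another segment"},
    {"BASE_RCRD_ID": "ABC", "PAYLOAD": "second ABC record"},
]

def filter_segment(rows, segment_field, segment_value):
    """Keep only rows whose segment field matches the requested segment."""
    return [r for r in rows if r[segment_field] == segment_value]

abc_rows = filter_segment(rows, "BASE_RCRD_ID", "ABC")
print(len(abc_rows))  # 2
```

Once the upstream fix lands, the `segment_filter` option should make this post-load filter unnecessary.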

saikumare-a commented 1 year ago

Hi @yruslan ,

Is this issue fixed, or is there a timeline for when it could be fixed?

saikumare-a commented 1 year ago

Hi @yruslan,

Any luck with looking into this? Thanks in advance!

yruslan commented 1 year ago

Not yet. Please, use the workaround for now.

yruslan commented 1 year ago

This should be fixed in 2.6.3, released yesterday.