Hi, it seems the documentation is outdated; I will update it soon. For record_format="D", in most cases you can use
df.withColumn("file_name", input_file_name())
instead of .option("with_input_file_name_col", "file_name").
This is a fairly recent change.
Tested on 2.6.2, and I'm getting a blank value instead of the actual file name.
Try 2.6.5.
Works for me:
import org.apache.spark.sql.functions.input_file_name

spark
  .read
  .format("cobol")
  .option("copybook_contents", copybook)
  .option("pedantic", "true")
  .option("record_format", "D")
  .option("schema_retention_policy", "collapse_root")
  .option("ascii_charset", "ISO-8859-1")
  .option("generate_record_id", false)
  .option("variable_size_occurs", true)
  .option("drop_value_fillers", false)
  .option("drop_group_fillers", false)
  .option("debug", "string")
  .option("filler_naming_policy", "previous_field_name")
  .load(tmpFileName)
  .withColumn("f", input_file_name())
  .show()
+-----+-------+--------------------+
|    A|A_debug|                   f|
+-----+-------+--------------------+
|12.34|   1234|file:/var/folders...|
| null|       |file:/var/folders...|
+-----+-------+--------------------+
We are using 2.6.2 across the platform, and testing or upgrading to 2.6.5 is a big effort for us.
Is it possible for you to check on 2.6.2?
Yes, 2.6.2 produced blanks for me as well.
I think upgrading to 2.6.5 is the only option in this case since the issue has been fixed there.
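Since this thread reports blank file names with 2.6.2 and correct ones with 2.6.5, a platform that cannot upgrade everywhere at once might guard on the Cobrix version before relying on input_file_name(). A minimal sketch, assuming you can obtain the version as a string from your own build or cluster configuration; the names parse_version and file_name_col_is_reliable are hypothetical. Versions between 2.6.2 and 2.6.5 were not tested here, so the sketch conservatively treats 2.6.5 as the threshold.

```python
# Hypothetical guard. Per this thread: 2.6.2 gave blank file names,
# 2.6.5 gave correct ones. Intermediate versions were not tested.
KNOWN_GOOD = (2, 6, 5)


def parse_version(version: str) -> tuple:
    """Turn '2.6.5' or '2.6.5-SNAPSHOT' into (2, 6, 5)."""
    core = version.split("-", 1)[0]  # drop qualifiers such as -SNAPSHOT
    return tuple(int(part) for part in core.split("."))


def file_name_col_is_reliable(cobrix_version: str) -> bool:
    """True if the version is at least the one confirmed working in this thread."""
    return parse_version(cobrix_version) >= KNOWN_GOOD
```

Tuple comparison handles the ordering element by element, so "2.7.0" and "3.0.0" both count as newer than 2.6.5.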
Sure, thanks for checking on 2.6.2 and confirming. We can close the question.
Thanks for your support as always.
Hi Team,
I'm getting the error below while trying to get the file name. The code I used is below; please correct me if I missed anything.
final_options: {
    'copybook': '<copybook>',
    'generate_record_id': 'false',
    'drop_value_fillers': 'false',
    'drop_group_fillers': 'false',
    'pedantic': 'true',
    'debug': 'string',
    'filler_naming_policy': 'previous_field_name',
    'with_input_file_name_col': 'file_name',
    'encoding': 'ascii',
    'record_format': 'D',
    'ascii_charset': 'ISO-8859-1',
    'variable_size_occurs': 'true'
}
Code:
import pyspark.sql.functions as F

print(f"final_options:{final_options}")
df = spark.read.format("cobol").options(**final_options).load('<file>')
df = df.withColumn("input_file_name", F.input_file_name())
df.display()
Error:
IllegalArgumentException: Option 'with_input_file_name_col' is supported only when one of this holds: 'record_format' = V or 'record_format' = VB or 'record_format' = D or 'is_record_sequence' = true or one of these options is set: 'record_length_field', 'file_start_offset', 'file_end_offset' or a custom record extractor is specified
Thanks, Saikumar
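The exception names the with_input_file_name_col option, and the maintainer's earlier suggestion for record_format="D" is to drop that option and add the column with input_file_name() instead. A minimal sketch of that approach, applied to an options dict like final_options above; the helper names split_file_name_option and read_cobol_with_file_name are hypothetical, not part of Cobrix. Per this thread, input_file_name() returned blanks on 2.6.2 and correct values on 2.6.5, so this only helps on a version where that fix is present.

```python
from typing import Any, Dict, Optional, Tuple

# Option name as it appears in the error message above.
FILE_NAME_OPTION = "with_input_file_name_col"


def split_file_name_option(
    options: Dict[str, Any],
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Return a copy of `options` without the file-name option, plus the
    requested column name (None if the option was not set)."""
    remaining = dict(options)  # copy so the caller's dict is not mutated
    column = remaining.pop(FILE_NAME_OPTION, None)
    return remaining, column


def read_cobol_with_file_name(spark, path: str, options: Dict[str, Any]):
    """Read with spark-cobol, then add the file name via input_file_name()
    instead of passing with_input_file_name_col to the reader."""
    # Imported lazily so split_file_name_option works without PySpark installed.
    from pyspark.sql import functions as F

    reader_options, column = split_file_name_option(options)
    df = spark.read.format("cobol").options(**reader_options).load(path)
    if column is not None:
        df = df.withColumn(column, F.input_file_name())
    return df
```

Usage would look like df = read_cobol_with_file_name(spark, '<file>', final_options) followed by df.display(); with the options above, the file name lands in a column called file_name, so the separate withColumn("input_file_name", ...) call is no longer needed.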