Closed pritdb closed 2 years ago
Hi, thanks for the bug report. At first glance, it looks like a value was encountered that cannot fit into a decimal type. It shouldn't be related to the size of the file.
What is the copybook for this file?
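For context, this kind of overflow happens when a parsed value needs more digits than the decimal type derived from the copybook allows. A minimal sketch of the idea (the `fitsDecimal` helper and the `PIC 9(5)V99` → DECIMAL(7,2) mapping are illustrative assumptions, not Cobrix code):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Illustrative only: a copybook field such as PIC 9(5)V99 maps to a
// DECIMAL(7,2)-style type (7 total digits, 2 after the decimal point),
// so any value with more than 5 integer digits cannot fit and would
// trigger an overflow when written out.
def fitsDecimal(raw: String, precision: Int, scale: Int): Boolean = {
  val d = new JBigDecimal(raw.trim).setScale(scale, RoundingMode.HALF_UP)
  d.precision <= precision
}
```

A value like `12345.67` fits DECIMAL(7,2), while `123456.78` has 8 significant digits and does not.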
Thanks @yruslan. I am attaching the copybook: Sample.txt.
Thanks, will take a look and try to reproduce the issue
Hi, I was trying to find the condition that could cause the overflow, but couldn't find it so far. So I have a few more questions: which Spark and Scala versions are you using, what is the full read code (spark.read...), and which action are you performing (df.write...)? The reason for these questions is that the error usually means there is a data type incompatibility between schemas. Unfortunately, the Spark error does not tell us which column is affected.
In addition, are you using SaveMode.Overwrite or SaveMode.Append when writing to the output folder?
Hi @yruslan ,
Here are the versions: Apache Spark 3.1.2, Scala 2.12.10
And the read & write code:
val df = spark
.read
.format("cobol")
.option("copybook", "<path-to-copybook>")
.option("encoding", "ascii")
.option("is_text", "true")
.option("schema_retention_policy", "collapse_root")
.option("drop_value_fillers", "false")
.load(inputFile)
// Causes the error
df.count()
// Causes the same error
df
.write
.partitionBy("field-1")
.format("delta")
.mode("overwrite")
.option("replaceWhere", s"field-1 = '$field1_value' ")
.save(outputPath)
Thanks for the info, it's very helpful. Will try testing more using Spark 3.1.2 / Scala 2.12.
Hi, I still wasn't able to reproduce the issue. I tried various ways input data can overflow numeric data types. A new version (2.4.4) is released, in which parsing of ASCII numbers is more strict. Please check if the issue persists.
If the issue persists, I'll try testing it on a big ASCII file, 1GB.
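As an aside, "stricter" ASCII number parsing can be pictured as rejecting any token that is not purely a signed decimal number instead of coercing it; a rough sketch (`parseStrict` is a hypothetical helper, not the actual Cobrix implementation):

```scala
// Hypothetical sketch of strict ASCII numeric parsing: accept only a
// signed decimal number and return None for anything else (letters,
// stray characters, embedded spaces) instead of producing a garbage value.
def parseStrict(raw: String): Option[BigDecimal] = {
  val s = raw.trim
  // String.matches requires the whole string to match the pattern.
  if (s.matches("""[+-]?\d+(\.\d+)?""")) Some(BigDecimal(s)) else None
}
```

With this approach, malformed fields surface as missing values rather than as numbers that can later overflow a decimal type.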
Hi, there is progress on fixing the issue. In the meantime, there is a workaround:
.option("enable_indexes", "false")
We are working on the fix.
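For reference, applying the workaround to the read shown earlier would look like this (a sketch only; the copybook path and inputFile are placeholders carried over from the original snippet):

```scala
// Same read as in the reproduction code above, with indexes disabled
// as a workaround; "enable_indexes" is the option suggested in this thread.
val df = spark
  .read
  .format("cobol")
  .option("copybook", "<path-to-copybook>")
  .option("encoding", "ascii")
  .option("is_text", "true")
  .option("schema_retention_policy", "collapse_root")
  .option("drop_value_fillers", "false")
  .option("enable_indexes", "false") // workaround for the failure on big ASCII files
  .load(inputFile)
```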
Great. Thanks for the update @yruslan . Hope the fix will be part of the next release.
We were unable to reproduce exactly this issue but found another issue that happens on big ASCII files. If disabling indexes helps reading your file, there is a good chance that the fix will help as well.
A new version (2.4.5) is released. Please let me know if it fixes the issue.
Thanks a lot @yruslan for all the updates. I don't currently have access to the environment where this occurred, but I have asked the folks who do to test it out. Will keep you posted when I hear from them.
Describe the bug
Running into an issue when trying to read a variable-length, newline-separated ASCII file using Cobrix. Please see the stack trace below:
To Reproduce
Steps to reproduce the behaviour OR commands run:
Expected behaviour
The file should parse correctly.
Screenshots
Please see the stack trace provided above.