Il-Pela opened 9 months ago
Hi, thanks for the interest in the library.
Yes, it is possible to use Cobrix in this case, but it can be quite involved. You can't use the spark-cobol
Spark data source to decode the data; you have to do it manually, like this:
// copyBookContents is the copybook text and field1Bytes is the raw EBCDIC payload;
// handler is a Cobrix record handler instance (see the SerializersSpec link below).
val copybookForField1 = CopybookParser.parseSimple(copyBookContents)
val row = RecordExtractors.extractRecord(copybookForField1.ast, field1Bytes, 0, handler = handler)
val record = handler.create(row.toArray, copybookForField1.ast)
The resulting record will be an Array[Any], and each subfield can be cast to the corresponding Java data type. Since each record may come with its own copybook, you need to parse that copybook and apply extractRecord() and handler.create() to each value. The resulting output can be a JSON string. See how Jackson could be used to convert each record to a JSON: https://github.com/AbsaOSS/cobrix/blob/68f7362ed55db66a51293de207c4ca0d83af0c83/cobol-converters/src/test/scala/za/co/absa/cobrix/cobol/converters/extra/SerializersSpec.scala#L161
Let me know if you decide to do it and run into any issues.
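For reference, the whole flow could be wrapped up roughly as in the sketch below. The RecordExtractors import path, the handler value, and the function name decodeToJson are assumptions for illustration (package names vary between Cobrix versions); the linked SerializersSpec shows a concrete, working variant of the same idea:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.reader.extractors.record.RecordExtractors // package may differ in older Cobrix versions

// 'handler' is the same record handler as in the snippet above (an assumption here;
// the SerializersSpec linked above shows a concrete implementation).
def decodeToJson(copybookContents: String, payload: Array[Byte]): String = {
  // Parse the copybook that applies to this particular record.
  val copybook = CopybookParser.parseSimple(copybookContents)

  // Decode the EBCDIC bytes into a nested structure of values.
  val values = RecordExtractors.extractRecord(copybook.ast, payload, 0, handler = handler)
  val record = handler.create(values.toArray, copybook.ast)

  // Serialize the extracted record to a JSON string with Jackson.
  val mapper = new ObjectMapper()
  mapper.registerModule(DefaultScalaModule)
  mapper.writeValueAsString(record)
}

Since the copybook can differ per record, decodeToJson would be called once per incoming record, with that record's copybook text and binary payload.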
Background
Let's say that I'm reading a "normal" Avro file using Spark. One of the fields in the schema of this Avro is a binary encoded as EBCDIC that should be decoded using a COBOL copybook referenced by another field within the same schema. Potentially each record can have its own copybook (so for each record the binary might have a different schema), and the goal is to produce a JSON version of the binary field to store somewhere else.
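For illustration, the reading side of that pipeline might look like the sketch below. The column names (copybook_name, payload), the copybooksByName lookup, and decodeToJson are placeholders rather than the real schema; decodeToJson stands for a per-record decode routine like the one sketched in the reply above:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("decode-ebcdic-field").getOrCreate()
import spark.implicits._

// Reading the Avro requires the spark-avro package on the classpath.
val df = spark.read.format("avro").load("/path/to/input.avro")

// Hypothetical columns: 'copybook_name' references the copybook to use,
// 'payload' holds the EBCDIC-encoded bytes to decode.
val jsonDs = df.map { row =>
  val copybookName = row.getAs[String]("copybook_name")
  val payload      = row.getAs[Array[Byte]]("payload")
  val copybookText = copybooksByName(copybookName) // e.g. a Map loaded from the copybook folder
  decodeToJson(copybookText, payload)              // per-record decode, as in the reply above
}

Note that anything the lambda captures (the copybook lookup, the Jackson mapper, the record handler) has to be serializable, since Spark ships the closure to the executors.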
And in the folder copycobol/ I have:
Question
Is it possible to leverage the library to decode a field instead of a file? Or do I have to save the binary field to a temporary file and decode it from there?
Thank you for any suggestion! :)