AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Stream processing with Flink #649

Open Giackgamba opened 11 months ago

Giackgamba commented 11 months ago

Background

Hi! I'm not an expert on COBOL/EBCDIC data structures, but I'm implementing a CDC scenario using Flink (in Java), and I have some binary fields to decode, given a copybook.

In the README you say that "The COBOL copybooks parser doesn't have a Spark dependency and can be reused for integrating into other data processing engines".

Question

Is that really the case? What, roughly, is the process for decoding a single message? Are there any examples that don't involve the Spark "wrapper"?

Thank you in advance

yruslan commented 10 months ago

Hi, sorry for the delayed reply. Yes, Spark is not required: you can use the cobol-parser dependency, which has no Spark dependency (it still requires the Scala library).
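
For reference, pulling in just the parser looks something like this in sbt (the group ID matches the repo's package names; the version below is a placeholder, so check Maven Central for the current release):

```scala
// build.sbt -- only the COBOL parser module, no Spark
// "2.7.0" is a placeholder version; use the latest cobol-parser release from Maven Central
libraryDependencies += "za.co.absa.cobrix" %% "cobol-parser" % "2.7.0"
```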

Here is an example, expressed as a unit test, of Cobrix being used without Spark to convert mainframe data to JSON: https://github.com/AbsaOSS/cobrix/blob/master/cobol-converters/src/test/scala/za/co/absa/cobrix/cobol/converters/extra/SerializersSpec.scala
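
Roughly, the process for a single message is: parse the copybook once, then extract field values from each binary record using the parsed schema. A minimal sketch (the copybook and data here are made up, and the extraction helpers are named per my reading of the README, so verify them against the version you use):

```scala
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.Primitive

object SingleRecordDecodeExample {
  // Made-up copybook describing one record layout
  val copybookContents: String =
    """       01  RECORD-MSG.
      |           05  SEG-ID   PIC 9(4)  COMP.
      |           05  NAME     PIC X(10).
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    // Parse the copybook once and reuse the parsed schema for every message
    val copybook = CopybookParser.parseTree(copybookContents)

    // 'bytes' stands in for one EBCDIC message arriving from the CDC stream
    val bytes = new Array[Byte](copybook.getRecordSize)

    // Look up a field definition and decode its value from the raw record
    val nameField = copybook.getFieldByName("NAME").asInstanceOf[Primitive]
    val nameValue = copybook.extractPrimitiveField(nameField, bytes, 0)
    println(nameValue)
  }
}
```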

One important detail: when Cobrix is used with Spark, it converts binary files to Spark DataFrames and uses Spark's type model. When Spark is not used, you can plug in a custom RecordHandler instead; an example of such a handler is in the test suite above. It uses Array[Any] (in Java this would likely be Object[]); see the sketch below.
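
For illustration, such a handler might look like the following (the trait's package path and method signatures here are my recollection of the test code, so check SerializersSpec for the exact definitions before relying on them):

```scala
import za.co.absa.cobrix.cobol.parser.ast.Group
import za.co.absa.cobrix.cobol.reader.extractors.record.RecordHandler

// A handler that materializes each decoded record as Array[Any], in the spirit
// of the one used in SerializersSpec. From Java, the same records would surface
// as Object[]. Package path and signatures are assumptions -- verify against
// the Cobrix version you use.
class SimpleRecordHandler extends RecordHandler[Array[Any]] {
  override def create(values: Array[Any], group: Group): Array[Any] = values
  override def toSeq(record: Array[Any]): Seq[Any] = record.toSeq
  override def foreach(record: Array[Any])(f: Any => Unit): Unit = record.foreach(f)
}
```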

Let me know if you have any more questions on this.