AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Stream processing with Flink #649

Open Giackgamba opened 11 months ago

Giackgamba commented 11 months ago

Background

Hi! I'm not an expert on COBOL/EBCDIC data structures, but I'm implementing a CDC scenario using Flink (in Java), and I have some binary fields to decode, given a copybook.

In the README you say that "The COBOL copybooks parser doesn't have a Spark dependency and can be reused for integrating into other data processing engines".

Question

Is that really the case? What, roughly, is the process for decoding a single message? Are there any examples that don't involve the Spark "wrapper"?

Thank you in advance

yruslan commented 10 months ago

Hi, sorry for the delayed reply. Yes, Spark is not required: you can use the cobol-parser dependency, which has no Spark dependency (it still requires the Scala library).
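
For reference, pulling in just the parser looks something like this in sbt (the group ID matches the repo's package names; the version below is a placeholder, so check Maven Central for the current release):

```scala
// build.sbt -- only the COBOL parser module, no Spark
// "2.7.0" is a placeholder version; use the latest cobol-parser release from Maven Central
libraryDependencies += "za.co.absa.cobrix" %% "cobol-parser" % "2.7.0"
```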

Here is an example, expressed as a unit test, of Cobrix being used without Spark to convert mainframe data to JSON: https://github.com/AbsaOSS/cobrix/blob/master/cobol-converters/src/test/scala/za/co/absa/cobrix/cobol/converters/extra/SerializersSpec.scala
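
Roughly, the process for a single message is: parse the copybook once, then extract field values from each binary record using the parsed schema. A minimal sketch (the copybook and data here are made up, and the extraction helpers are named per my reading of the README, so verify them against the version you use):

```scala
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.Primitive

object SingleRecordDecodeExample {
  // Made-up copybook describing one record layout
  val copybookContents: String =
    """       01  RECORD-MSG.
      |           05  SEG-ID   PIC 9(4)  COMP.
      |           05  NAME     PIC X(10).
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    // Parse the copybook once and reuse the parsed schema for every message
    val copybook = CopybookParser.parseTree(copybookContents)

    // 'bytes' stands in for one EBCDIC message arriving from the CDC stream
    val bytes = new Array[Byte](copybook.getRecordSize)

    // Look up a field definition and decode its value from the raw record
    val nameField = copybook.getFieldByName("NAME").asInstanceOf[Primitive]
    val nameValue = copybook.extractPrimitiveField(nameField, bytes, 0)
    println(nameValue)
  }
}
```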

One important detail: when Cobrix is used with Spark, it converts binary files to Spark DataFrames and uses Spark's type model. When Spark is not used, you can plug in a custom RecordHandler instead; an example of such a handler is in the test suite above. It uses Array[Any] (in Java this would likely be Object[]); see the sketch below.
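
For illustration, such a handler might look like the following (the trait's package path and method signatures here are my recollection of the test code, so check SerializersSpec for the exact definitions before relying on them):

```scala
import za.co.absa.cobrix.cobol.parser.ast.Group
import za.co.absa.cobrix.cobol.reader.extractors.record.RecordHandler

// A handler that materializes each decoded record as Array[Any], in the spirit
// of the one used in SerializersSpec. From Java, the same records would surface
// as Object[]. Package path and signatures are assumptions -- verify against
// the Cobrix version you use.
class SimpleRecordHandler extends RecordHandler[Array[Any]] {
  override def create(values: Array[Any], group: Group): Array[Any] = values
  override def toSeq(record: Array[Any]): Seq[Any] = record.toSeq
  override def foreach(record: Array[Any])(f: Any => Unit): Unit = record.foreach(f)
}
```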

Let me know if you have any more questions on this.