AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

EBCDIC to ASCII file conversion #700

Closed PDebasish closed 1 month ago

PDebasish commented 3 months ago

Background: I am trying to convert an EBCDIC mainframe file to an ASCII file in a notebook. Below is the setup used:

  1. Cobrix library "cobol-parser_2.13-2.7.4.jar" from https://repo1.maven.org/maven2/za/co/absa/cobrix/cobol-parser_2.13/2.7.4/cobol-parser_2.13-2.7.4.jar
  2. Spark settings: Runtime 1.2 (Spark 3.4 and Delta 2.4)

Below is the code used in a Scala notebook:

    import org.apache.spark.sql.SparkSession

    var file_path = "abfss://" // Data file path in OneLake
    var copybook = "abfss://"  // Copybook file path in OneLake

    var spark = SparkSession.builder.getOrCreate()
    val df_cobol = spark.read
      .format("za.co.absa.cobrix.spark.cobol.source")
      .option("copybook", copybook)
      .load(file_path)
    df_cobol.printSchema()
    df_cobol.show()

Question

I am getting errors when executing the code above. I have attached a screenshot of the error.

[Attached screenshot: Cobrix_error]

yruslan commented 2 months ago

Hi, when you download spark-cobol from Maven, you get a thin JAR that does not include its dependencies. To use spark-cobol in Databricks, use one of the bundles ('fat' JARs) that matches your environment: https://github.com/AbsaOSS/cobrix/releases/tag/v2.7.4

I think this is the one you can use: https://github.com/AbsaOSS/cobrix/releases/download/v2.7.4/spark-cobol_2.12-2.7.4-bundle.jar
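
A quick way to confirm that the bundle is actually attached to the notebook session is to check whether the Cobrix classes can be loaded. The snippet below is only a minimal sketch: `isOnClasspath` is a hypothetical helper, not part of Cobrix, and the two class names are assumptions based on the package used in the `format(...)` call in the question, so they may differ between versions.

```scala
// Hypothetical helper (not part of Cobrix): returns true if the named class
// can be located by the current class loader, without initializing it.
def isOnClasspath(className: String): Boolean = {
  val loader = Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(ClassLoader.getSystemClassLoader)
  try {
    Class.forName(className, false, loader)
    true
  } catch {
    // Class not present, or present but one of its dependencies is missing.
    case _: ClassNotFoundException | _: LinkageError => false
  }
}

// Assumed class names; adjust them if your Cobrix version differs.
val cobrixClasses = Seq(
  "za.co.absa.cobrix.spark.cobol.source.DefaultSource", // Spark data source entry point
  "za.co.absa.cobrix.cobol.parser.CopybookParser"       // copybook parser
)

// Report each class as found or missing.
cobrixClasses.foreach { name =>
  val status = if (isOnClasspath(name)) "found" else "MISSING"
  println(s"$status: $name")
}
```

If either class is reported as missing, the session is probably running with a thin JAR (or no Cobrix JAR at all) rather than the bundle. With only the thin cobol-parser JAR attached, for example, I would expect the data source class to be the one that is missing.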

PDebasish commented 1 month ago

Thanks @yruslan, I was able to convert the EBCDIC file to ASCII using the JAR files.