AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

ebcdic_code_page for German character ä,ß,ü #653

Closed MJames1030 closed 8 months ago

MJames1030 commented 8 months ago

Hi,

I'm using the Cobrix libraries on Databricks to convert EBCDIC files. I now have a file that uses the German alphabet, and I could not find an ebcdic_code_page value that reads German characters correctly.

Example: "{u~eren R} cksitzpl {those in Mikrovlies" is returned instead of "deräußeren Rücksitzplätze in Mikrovlies " ArtVerlours Eco ""

Thank you, Jamal

yruslan commented 8 months ago

Hi @MJames1030, thanks for the feature request! It is possible to add a custom EBCDIC code page if you know the EBCDIC -> ASCII/Unicode conversion for your characters. See the example here:

But if your code page is one of the standard ones, and you know which code page is used at the source, we can add support for it directly in Cobrix.
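To illustrate what an "EBCDIC -> Unicode conversion" amounts to, here is a minimal Python sketch of a code page as a 256-entry lookup table. It is not the Cobrix API. The names `BASE_TABLE`, `GERMAN_OVERRIDES`, `CUSTOM_TABLE` and `decode_custom` are hypothetical, and the choice of cp037 as the starting table is an assumption made only for illustration. The three overridden byte values are the positions where the German EBCDIC code page 273 stores ä, ß and ü.

```python
# Illustration only: a single-byte code page is a lookup table with one
# character per byte value (0..255). All names here are hypothetical.

# Start from Python's built-in cp037 table (assumed baseline, not Cobrix's).
BASE_TABLE = bytes(range(256)).decode("cp037")

# Byte values where code page 273 places the German characters.
GERMAN_OVERRIDES = {0xC0: "ä", 0xA1: "ß", 0xD0: "ü"}

# Build the custom table: take the override if present, else the base char.
CUSTOM_TABLE = [GERMAN_OVERRIDES.get(b, BASE_TABLE[b]) for b in range(256)]


def decode_custom(raw: bytes) -> str:
    """Decode EBCDIC bytes one byte at a time through the custom table."""
    return "".join(CUSTOM_TABLE[b] for b in raw)


# EBCDIC bytes for 'R', 0xD0, 'c', 'k'
sample = bytes([0xD9, 0xD0, 0x83, 0x92])
print(sample.decode("cp037"))   # R}ck  (baseline table)
print(decode_custom(sample))    # Rück  (with the German overrides)
```

A full code page would need every differing position remapped, not just these three, which is why a standard code page is preferable when the source encoding is known.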

MJames1030 commented 8 months ago

Hi @yruslan ,

Thank you for your feedback. We are able to convert the EBCDIC file using code page 273. It would be great if you could add it directly to Cobrix.

Thank you in advance, Jamal
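The garbled sample from the first comment is consistent with code page 273 bytes being decoded through a different Latin-1 EBCDIC table. A minimal sketch using Python's built-in `cp273` and `cp037` codecs reproduces the pattern. It assumes the reader's default table matches cp037 at these three byte positions, which this thread does not confirm. The variable names are illustrative.

```python
# Sketch: the same bytes decoded with two EBCDIC code pages.
# Assumption: the wrong table behaves like cp037 for these characters.

text = "äußeren Rücksitzplätze"

# What the mainframe would store if it writes code page 273.
raw = text.encode("cp273")

# Decoding with the wrong table turns ä, ß, ü into {, ~, }.
garbled = raw.decode("cp037")
print(garbled)    # {u~eren R}cksitzpl{tze

# Decoding with the matching table restores the original text.
restored = raw.decode("cp273")
print(restored)   # äußeren Rücksitzplätze
```

Only the characters whose byte positions differ between the two tables are affected. Plain letters, digits and spaces decode identically, which is why most of the text stays readable.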

yruslan commented 8 months ago

The support is added in this branch: https://github.com/AbsaOSS/cobrix/tree/feature/653-add-ebcdic-codepage-273

If you could test it before we release the new version of Cobrix, that could help to ensure it works for you as expected.

You can build a Cobrix bundle JAR using sbt assembly and use the snapshot JAR in your Spark environment:

sbt -DSPARK_VERSION="3.4.0" ++2.12.17 assembly

The code page can be selected by passing the option to the Spark reader:

spark.read.format("cobol")
  .option("ebcdic_code_page", "cp273")
  ...

yruslan commented 8 months ago

Hi @MJames1030, use 'spark-cobol-...-SNAPSHOT-bundle.jar', not 'cobol-parser-*'. The COBOL parser is for use cases that do not use Spark.

MJames1030 commented 8 months ago

Hi @yruslan ,

I confirm that's working.

Thank you for the work,

yruslan commented 8 months ago

Awesome, this will be released soon.

MJames1030 commented 8 months ago

Do you have a date in mind ? :)

yruslan commented 8 months ago

Tomorrow, or in the worst case Thursday 😆

yruslan commented 8 months ago

Fixed in 2.6.10