AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

How to read a pipe separated file with Cobrix #677

Open pinakigit opened 1 month ago

pinakigit commented 1 month ago

I have a file on the Mainframe that is pipe-delimited and has a header row with column names, which are also pipe-separated. The records are fixed length, and everything after the last field is padded with spaces.

I can FTP this file from the Mainframe as either ASCII or binary. Is there a way to read this file with Cobrix? It doesn't have a copybook, and the fields have no fixed lengths.

yruslan commented 1 month ago

Hi, could you send an example of such a file with a copybook? It seems odd that the file is both pipe-separated and has a fixed record length.

pinakigit commented 1 month ago

Sample below. Every record has a fixed length of 228.

Index|Customer Id|First Name|Last Name|Company|City|Country|Phone 1|Phone 2|Email|Subscription Date|Website
1|DD37Cf93aecA6Dc|Sheryl|Baxter|Rasmussen Group|East Leonard|Chile|229.077.5154|397.884.0519x718|zunigavanessa@smith.info|2020-08-24|http://www.stephenson.com/
2|1Ef7b82A4CAAD10|Preston|Lozano|Vega-Gentry|East Jimmychester|Djibouti|5153435776|686-620-1820x944|vmata@colon.com|2021-04-23|http://www.hobbs.com/
3|6F94879bDAfE5a6|Roy|Berry|Murillo-Perry|Isabelborough|Antigua and Barbuda|+1-539-402-0259|(496)978-3969x58947|beckycarr@hogan.com|2020-03-25|http://www.lawrence.com/

yruslan commented 1 month ago

The file format looks like a pipe-delimited CSV. You can use Spark's built-in CSV reader to convert it into a DataFrame: https://spark.apache.org/docs/latest/sql-data-sources-csv.html

val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("inferSchema", "true")
  .load("/path/to/file/or/folder")

df.show()
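One detail worth noting about the fixed-length padding: since each record is padded with spaces up to 228 bytes, the trailing spaces end up attached to the last column after splitting on the delimiter. A minimal plain-Scala sketch of this (no Spark needed; the payload fragment is illustrative, taken from the sample above, and the record length of 228 comes from this thread):

```scala
// A fixed-length record is the delimited payload plus trailing spaces
// up to the record length (228 in this thread's sample).
val recordLength = 228
val payload = "1|DD37Cf93aecA6Dc|Sheryl|Baxter" // illustrative fragment only
val fixedRecord: String = payload.padTo(recordLength, ' ')

// After splitting on '|', only the last field carries the padding,
// so trimming each field recovers the clean values.
val fields: Array[String] = fixedRecord.split('|').map(_.trim)
```

In the Spark reader above, the same effect can usually be achieved by trimming the last column (e.g. with `trim`) after loading, or by enabling an option such as `ignoreTrailingWhiteSpace` if your Spark version supports it for your use case.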