For next week we want to take a look at this item. https://www.ibm.com/support/pages/jzos-apis
This short Scala example shows reading the Virginia Agency file on z/OS (EBCDIC data) using JZOS under Spark.
import org.apache.spark.sql.SparkSession
import com.ibm.jzos._

// Dataset name prefixes for the input and output files
val filePaths = Map("inputPath" -> "'GEBT.SPK.", "outputPath" -> "'GEBT.SPK.")

// Open the agency reference file (fixed-length records) with DISP=SHR
var agencyFileX: RecordReader = null
agencyFileX = RecordReader.newReader(
  ZFile.getSlashSlashQuotedDSN(filePaths("inputPath") + "VAREF.AGENCY.FIXLDATA'"),
  ZFileConstants.FLAG_DISP_SHR)

// Read one record and decode it from the platform (EBCDIC) encoding to a String
var lenAgencyRead: Int = 0
val agencyLine = new Array[Byte](agencyFileX.getLrecl)
lenAgencyRead = agencyFileX.read(agencyLine)
val agencyLineStrValue = new String(agencyLine, 0, lenAgencyRead, ZUtil.getDefaultPlatformEncoding)
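The example above reads only a single record. A minimal sketch of reading every record in the file into a Scala collection (assuming the same filePaths map and JZOS imports as above; the names reader and records are illustrative, not from the original example):

// Sketch only: loop over every fixed-length record in the same dataset and
// decode each one from the platform (EBCDIC) encoding into a String.
val reader = RecordReader.newReader(
  ZFile.getSlashSlashQuotedDSN(filePaths("inputPath") + "VAREF.AGENCY.FIXLDATA'"),
  ZFileConstants.FLAG_DISP_SHR)
val records = scala.collection.mutable.ArrayBuffer[String]()
try {
  val buf = new Array[Byte](reader.getLrecl)
  var len = reader.read(buf)          // read() returns -1 at end of file
  while (len >= 0) {
    records += new String(buf, 0, len, ZUtil.getDefaultPlatformEncoding)
    len = reader.read(buf)
  }
} finally {
  reader.close()                      // release the dataset
}

The records collection is what could later be handed to Spark in the parallelize step discussed further down.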
The command issued in Spark:
:LOAD agency.scala
Alternatively the following command can be issued on entry to Spark:
spark-shell -i agency.scala
On IBM Kanplex SP12 the following prerequisite profile commands were issued:
NBEESLE:/Spark/bin> . /ngsafr/safrbld.profile
/Spark/bin> . /ngsafr/.profile
/Spark/bin> . load-spark-env.sh
We converted the Scala example we used from the EBCDIC codepage to ASCII using the iconv command in USS. The command to change the codepage tag of a file is chtag. For example:
iconv -f IBM-1047 -t ISO8859-1 agency_rdd.scala > agency_rddb.scala
chtag -t -c ISO8859-1 agency_rddb.scala
The command to display the codepage and tagging of files is ls -T.
Today's work (3/12/21) included the following:
Suggestions for next week include:
Results from Mar 19:
Apr. 9th Progress.
Creating an RDD from a parallelized collection can be done with this command:
val dataRDD = spark.sparkContext.parallelize(Seq(("sun", 1), ("mon", 2), ("tue", 3), ("wed", 4), ("thu", 5)))
The "Seq" can be replaced by the collection that came back from jzos commend.
Then Spark functions like the following could be used against the resulting RDD:

dataRDD.count()   // Number of items in this Dataset
res0: Long = 126  // May be different from yours as README.md will change over time, similar to other outputs

dataRDD.first()   // First item in this Dataset
res1: String = # Apache Spark
Problem encountered using the Spark context:

val data = spark.sparkContext.parallelize(agencyLine)
----------^
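One possible issue (an assumption on our part, not confirmed by the error shown) is that agencyLine is a raw Array[Byte], so parallelizing it directly would produce an RDD of individual bytes rather than an RDD of records. A sketch of parallelizing the decoded record string from the earlier example instead:

// agencyLine is an Array[Byte]; parallelizing it directly yields an RDD[Byte],
// one element per byte. Parallelizing the decoded String keeps one element per record.
// Sketch only; assumes agencyLineStrValue from the earlier JZOS example.
val data = spark.sparkContext.parallelize(Seq(agencyLineStrValue))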
Noticed related error message in spark start-up:
/Spark/bin> spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
21/04/09 13:31:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/04/09 13:31:50 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: /etc/spark/events/local-1618000310607.inprogress (EDC5111I Permission denied.)
at java.io.FileOutputStream.open(FileOutputStream.java:286)
at java.io.FileOutputStream.
On Feb. 25 we reviewed the following web pages, specifically section 3 of the Redbook: http://www.redbooks.ibm.com/redbooks/pdfs/sg247177.pdf
http://www.longpelaexpertise.com/ezine/HLASMfromJava.php
Our conclusion from the work was that Java access to packed data types is fairly complex; using MR95 to do the conversion allows legacy applications to keep using packed data types without bringing that complexity into the Java/Spark work, and it obviates the need to convert the event and reference data.