For next week we want to take a look at this item. https://www.ibm.com/support/pages/jzos-apis
This short Scala example shows reading the Virginia Agency file on z/OS (EBCDIC data) using JZOS under Spark.
import org.apache.spark.sql.SparkSession
import com.ibm.jzos._

// Dataset name prefixes for the input and output files
val filePaths = Map("inputPath" -> "'GEBT.SPK.", "outputPath" -> "'GEBT.SPK.")

// Open the agency reference file (fixed-length records) with DISP=SHR
var agencyFileX: RecordReader = null
agencyFileX = RecordReader.newReader(
  ZFile.getSlashSlashQuotedDSN(filePaths("inputPath") + "VAREF.AGENCY.FIXLDATA'"),
  ZFileConstants.FLAG_DISP_SHR)

// Read one record and decode it from the platform (EBCDIC) encoding to a String
var lenAgencyRead: Int = 0
val agencyLine = new Array[Byte](agencyFileX.getLrecl)
lenAgencyRead = agencyFileX.read(agencyLine)
val agencyLineStrValue = new String(agencyLine, 0, lenAgencyRead, ZUtil.getDefaultPlatformEncoding)
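The example above reads only a single record. A minimal sketch of reading every record in the file into a Scala collection (assuming the same filePaths map and JZOS imports as above; the names reader and records are illustrative, not from the original example):

// Sketch only: loop over every fixed-length record in the same dataset and
// decode each one from the platform (EBCDIC) encoding into a String.
val reader = RecordReader.newReader(
  ZFile.getSlashSlashQuotedDSN(filePaths("inputPath") + "VAREF.AGENCY.FIXLDATA'"),
  ZFileConstants.FLAG_DISP_SHR)
val records = scala.collection.mutable.ArrayBuffer[String]()
try {
  val buf = new Array[Byte](reader.getLrecl)
  var len = reader.read(buf)          // read() returns -1 at end of file
  while (len >= 0) {
    records += new String(buf, 0, len, ZUtil.getDefaultPlatformEncoding)
    len = reader.read(buf)
  }
} finally {
  reader.close()                      // release the dataset
}

The records collection is what could later be handed to Spark in the parallelize step discussed further down.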
The command issued in Spark:
:LOAD agency.scala
Alternatively the following command can be issued on entry to Spark:
spark-shell -i agency.scala
On IBM Kanplex SP12 the following prerequisite profile commands were issued:
NBEESLE:/Spark/bin> . /ngsafr/safrbld.profile
/Spark/bin> . /ngsafr/.profile
/Spark/bin> . load-spark-env.sh
We converted the Scala example we used from the EBCDIC codepage to ASCII using the iconv command in USS. The command to change the codepage tag of a file is chtag. For example:
iconv -f IBM-1047 -t ISO8859-1 agency_rdd.scala > agency_rddb.scala
chtag -t -c ISO8859-1 agency_rddb.scala
The command to display the codepage and tagging of files is ls -T.
Today's work (3/12/21) included the following:
Suggestions for next week include:
Results from Mar 19:
Apr. 9th Progress.
Creating an RDD from a parallelized collection can be done with this command:
val dataRDD = spark.sparkContext.parallelize(Seq(("sun", 1), ("mon", 2), ("tue", 3), ("wed", 4), ("thu", 5)))
The "Seq" can be replaced by the collection that came back from jzos commend.
Then Spark functions like the following could be used against the resulting RDD:

dataRDD.count()   // Number of items in this Dataset
res0: Long = 126  // May be different from yours as README.md will change over time, similar to other outputs

dataRDD.first()   // First item in this Dataset
res1: String = # Apache Spark
Problem encountered using the Spark context:

val data = spark.sparkContext.parallelize(agencyLine)
----------^
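One possible issue (an assumption on our part, not confirmed by the error shown) is that agencyLine is a raw Array[Byte], so parallelizing it directly would produce an RDD of individual bytes rather than an RDD of records. A sketch of parallelizing the decoded record string from the earlier example instead:

// agencyLine is an Array[Byte]; parallelizing it directly yields an RDD[Byte],
// one element per byte. Parallelizing the decoded String keeps one element per record.
// Sketch only; assumes agencyLineStrValue from the earlier JZOS example.
val data = spark.sparkContext.parallelize(Seq(agencyLineStrValue))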
Noticed related error message in spark start-up:
/Spark/bin> spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
21/04/09 13:31:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/04/09 13:31:50 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: /etc/spark/events/local-1618000310607.inprogress (EDC5111I Permission denied.)
at java.io.FileOutputStream.open(FileOutputStream.java:286)
at java.io.FileOutputStream.
On Feb. 25 we reviewed the following web pages, specifically section 3 of the Redbook: http://www.redbooks.ibm.com/redbooks/pdfs/sg247177.pdf
http://www.longpelaexpertise.com/ezine/HLASMfromJava.php
Our conclusion from the work was that Java access to packed data types is fairly complex; using MR95 to do the conversion allows legacy applications to keep using packed data types without bringing that complexity into the Java/Spark work, and it obviates the need to convert the event and reference data.