genevaers / Research-And-Development


Spark POC Progress #19

Open KipTwitchell opened 3 years ago

KipTwitchell commented 3 years ago

On Feb. 25 we reviewed the following web pages, specifically section 3 of the Redbook: http://www.redbooks.ibm.com/redbooks/pdfs/sg247177.pdf

http://www.longpelaexpertise.com/ezine/HLASMfromJava.php

[Screenshot: Screen Shot 2021-02-26 at 1 21 50 PM]

Our conclusion from this work was that Java access to packed data types is fairly complex. Using MR95 to convert allows legacy applications to keep using packed data types without pushing that complexity into the Java/Spark work, and it obviates the need to convert the event and reference data.
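For context on that complexity, here is a minimal hand-rolled sketch of decoding one packed-decimal (COMP-3) field from an EBCDIC record in Scala. The field offset and length are hypothetical example values, not taken from the Virginia Agency layout:

// Sketch: decode a packed-decimal (COMP-3) field by hand.
// Each nibble holds a decimal digit; the final nibble is the sign (0xD = negative).
def unpackDecimal(record: Array[Byte], offset: Int, length: Int): Long = {
  var value = 0L
  for (i <- 0 until length) {
    val b = record(offset + i) & 0xFF
    val high = (b >> 4) & 0x0F
    val low = b & 0x0F
    value = value * 10 + high
    if (i < length - 1)
      value = value * 10 + low   // both nibbles are digits
    else if (low == 0x0D)
      value = -value             // sign nibble: 0xD means negative
  }
  value
}

// Hypothetical usage: a 4-byte packed field starting at byte 10 of a record.
// val amount = unpackDecimal(agencyLine, 10, 4)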

SaPeresi commented 3 years ago

For next week we want to take a look at this item: https://www.ibm.com/support/pages/jzos-apis

neilbeesley commented 3 years ago

This short Scala example shows reading the Virginia Agency file on z/OS (EBCDIC data) using JZOS under Spark.

import org.apache.spark.sql.SparkSession
import com.ibm.jzos._

// Dataset name prefix; the full quoted DSN is built below.
val filePaths = Map("inputPath" -> "'GEBT.SPK.", "outputPath" -> "'GEBT.SPK.")

// Open the agency reference file with DISP=SHR.
var agencyFileX: RecordReader = null
agencyFileX = RecordReader.newReader(
  ZFile.getSlashSlashQuotedDSN(filePaths("inputPath") + "VAREF.AGENCY.FIXLDATA'"),
  ZFileConstants.FLAG_DISP_SHR)

// Read one record into a buffer sized to the file's logical record length,
// then decode it using the platform (EBCDIC) encoding.
var lenAgencyRead: Int = 0
val agencyLine = new Array[Byte](agencyFileX.getLrecl)
lenAgencyRead = agencyFileX.read(agencyLine)
val agencyLineStrValue = new String(agencyLine, 0, lenAgencyRead, ZUtil.getDefaultPlatformEncoding)
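As a possible next step (not part of the original example), here is a sketch of looping over the whole file with the same JZOS calls to collect every record into an in-memory buffer; error handling is omitted:

// Sketch: read every record from the agency file into a Seq[String].
// RecordReader.read returns -1 at end of file.
import scala.collection.mutable.ArrayBuffer

val records = ArrayBuffer[String]()
val buffer = new Array[Byte](agencyFileX.getLrecl)
var bytesRead = agencyFileX.read(buffer)
while (bytesRead >= 0) {
  records += new String(buffer, 0, bytesRead, ZUtil.getDefaultPlatformEncoding)
  bytesRead = agencyFileX.read(buffer)
}
agencyFileX.close()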


The command issued in Spark:

:load agency.scala

Alternatively the following command can be issued on entry to Spark:

spark-shell -i agency.scala

On IBM Kanplex SP12 the following prerequisite profile commands were issued:

NBEESLE:/Spark/bin> . /ngsafr/safrbld.profile
/Spark/bin> . /ngsafr/.profile
/Spark/bin> . load-spark-env.sh

neilbeesley commented 3 years ago

We converted the Scala example we used from the EBCDIC codepage to ASCII using the iconv command in USS. The command to change the codepage tag of a file is chtag. For example:

iconv -f IBM-1047 -t ISO8859-1 agency_rdd.scala > agency_rddb.scala

chtag -t -c ISO8859-1 agency_rddb.scala

The command to display the codepage tagging of files is ls -T.
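For example, to check the tag on the converted file:

ls -T agency_rddb.scala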

KipTwitchell commented 3 years ago

Today's work (3/12/21) included the following:

Suggestions for next week include:

KipTwitchell commented 3 years ago

Results from Mar 19:

KipTwitchell commented 3 years ago

Apr. 9th Progress.

Creating an RDD from a parallelized collection can be done with this command:

val dataRDD = spark.sparkContext.parallelize(Seq(("sun", 1), ("mon", 2), ("tue", 3), ("wed", 4), ("thus", 5)))

The "Seq" can be replaced by the collection that came back from the JZOS read (see the sketch after the output examples below).

Then Spark functions like the following could be used against the resulting RDD:

dataRDD.count()   // Number of items in this Dataset
res0: Long = 126  // May be different from yours as README.md will change over time, similar to other outputs

dataRDD.first()   // First item in this Dataset
res1: String = # Apache Spark
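Putting the two pieces together, a hedged sketch of what parallelizing the JZOS output might look like, assuming the `records` collection from the read-loop sketch earlier in this thread:

// Sketch: build an RDD from the records collected by the JZOS reader.
val agencyRDD = spark.sparkContext.parallelize(records)
agencyRDD.count()   // number of agency records read from the file
agencyRDD.first()   // first agency record, already decoded from EBCDIC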

neilbeesley commented 3 years ago

Problem encountered using the Spark context:

val data=spark.sparkContext.parallelize(agencyLine)
----------^

Noticed a related error message in Spark start-up:

/Spark/bin> spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
21/04/09 13:31:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/04/09 13:31:50 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: /etc/spark/events/local-1618000310607.inprogress (EDC5111I Permission denied.)
    at java.io.FileOutputStream.open(FileOutputStream.java:286)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:226)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:112)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:114)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
    at $line3.$read$$iw$$iw.<init>(<console>:15)
    at $line3.$read$$iw.<init>(<console>:31)
    at $line3.$read.<init>(<console>:33)
    at $line3.$read$.<init>(<console>:37)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.$print$lzycompute(<console>:7)
    at $line3.$eval$.$print(<console>:6)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:508)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
    at org.apache.spark.repl.Main$.doMain(Main.scala:68)
    at org.apache.spark.repl.Main$.main(Main.scala:51)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:508)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.io.FileNotFoundException: /etc/spark/events/local-1618000310607.inprogress (EDC5111I Permission denied.)
    at java.io.FileOutputStream.open(FileOutputStream.java:286)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:226)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:112)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:114)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
    ... 47 elided

<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Scala version 2.11.8 (IBM J9 VM, Java 1.8.0_261)
Type in expressions to have them evaluated.
Type :help for more information.
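The "not found: value spark" errors follow from the SparkContext failing to initialize; the EDC5111I failure suggests the Spark event log directory (/etc/spark/events) is not writable by this user ID. Assuming an otherwise default configuration, one possible workaround to try is disabling event logging for the session, or pointing spark.eventLog.dir at a writable directory:

spark-shell --conf spark.eventLog.enabled=false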