Closed ssdag111 closed 4 years ago
@ssdag111 you need to use a pattern of the form cos://<yourbucket>.<service name>/<rest of path>. So based on the configuration you used, the pattern would be
abj = "cos://name_of_bucket.mycos/1.txt"
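To make the pattern concrete, here is a minimal sketch (plain Python, no Spark required) that assembles a cos:// URI from a bucket name, the service name from your configuration, and an object key. The bucket and service names below are placeholders:

```python
def cos_uri(bucket: str, service: str, key: str) -> str:
    """Build a Stocator COS URI of the form cos://<bucket>.<service>/<key>."""
    if not bucket or not service:
        raise ValueError("bucket and service name are both required")
    return "cos://{}.{}/{}".format(bucket, service, key.lstrip("/"))

# The service name ("mycos" here) must match the one used in your
# fs.cos.<service>.* configuration keys.
print(cos_uri("name_of_bucket", "mycos", "1.txt"))  # cos://name_of_bucket.mycos/1.txt
```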
Also, the problem you have is not Stocator related, but general usage: how to add an external jar to Spark. It seems the Stocator jar is not loaded for some reason. Please double-check this.
Thank you for your response. Okay, I'll try out what you suggested here, but what is a surefire way of loading the jar? I cd into the directory where it is located and then run pyspark from there. I could also give the full path to the JAR file if needed.
I just tried it out.. nothing..
abjj = "cos://test-bucket.mycos/1.txt"
leads to the same error as before.
How do I check whether the Stocator jar is getting loaded? Is there some way to check for this from within Python itself (display versions, etc.)?
@ssdag111 I suggest you see in the Spark documentation how to load jars. It's not a Stocator-related issue, just pure Spark and how to load an external jar. Maybe your path to the jar is wrong, maybe you need to use some configuration in Spark to point to the jar, maybe some other command to load the jar, etc.
I just added an external jar by stopping the Spark context and then restarting with a new one with the JAR loaded. Still nothing:
from pyspark import SparkConf, SparkContext

sc.stop()  # stop the existing context first
# spark.jars must be set before the new context is created
conf = SparkConf().set("spark.jars", "/path to/stocator-1.0.36-SNAPSHOT-ibm-sdk/target/stocator-1.0.36-SNAPSHOT-IBM-SDK.jar")
sc = SparkContext(conf=conf)
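Another common workaround when spark.jars set at runtime doesn't take effect: pass the jar through the PYSPARK_SUBMIT_ARGS environment variable before pyspark is imported for the first time, so the JVM starts with the jar already on its classpath. A minimal sketch, with a placeholder jar path:

```python
import os

# Placeholder; use the real absolute path to your built jar.
jar = "/opt/stocator/target/stocator-1.0.36-SNAPSHOT-IBM-SDK.jar"

# Must be set before the first `import pyspark` in this process;
# the trailing "pyspark-shell" token is required.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars {} pyspark-shell".format(jar)

# import pyspark  # only after the variable is set
```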
@ssdag111 your exception is above Stocator... you need to verify all the steps to make sure you properly include the jar in Spark. Are you sure the jar is at the location you specified?
Yes.. I even cd'd into the directory where it is located and then started Spark, but to no avail.
Hi, how was it solved? I'm facing the same issue.
Ok, I managed to address it. Here's a snippet as an example that works (in case someone runs into the same issue): https://github.com/bambrozio/snippets/blob/master/cloud/icos/pyspark-icos-stocator.py
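For reference, the usual shape of a working setup (a sketch under my assumptions, not the linked snippet verbatim) is to register Stocator's filesystem implementation and the fs.cos.<service>.* credentials on the Hadoop configuration. The key names below follow the Stocator README; the service name "mycos", the endpoint, and the credentials are placeholders:

```python
def stocator_cos_conf(service: str, endpoint: str,
                      access_key: str, secret_key: str) -> dict:
    """Hadoop configuration entries for Stocator's cos:// scheme,
    keyed as described in the Stocator README."""
    return {
        "fs.stocator.scheme.list": "cos",
        "fs.cos.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
        "fs.stocator.cos.impl": "com.ibm.stocator.fs.cos.COSAPIClient",
        "fs.stocator.cos.scheme": "cos",
        "fs.cos.{}.endpoint".format(service): endpoint,
        "fs.cos.{}.access.key".format(service): access_key,
        "fs.cos.{}.secret.key".format(service): secret_key,
    }

# Applying it in pyspark would look roughly like (placeholders throughout):
# hconf = sc._jsc.hadoopConfiguration()
# for k, v in stocator_cos_conf("mycos", "<your COS endpoint>",
#                               "<ACCESS_KEY>", "<SECRET_KEY>").items():
#     hconf.set(k, v)
# sc.textFile("cos://test-bucket.mycos/1.txt")
```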
So I'm trying to use Stocator with pyspark (for reading and writing files to a bucket on IBM Cloud Object Storage) but am running into some issues on a Linux system.
Versions for everything:
- Stocator: stocator-1.0.36 (IBM-SDK)
- pyspark: spark-2.4.4
- Python: Python 3.7.4
I run pyspark with the following commands:
pyspark --jars stocator-1.0.36-SNAPSHOT-IBM-SDK.jar
or pyspark --jars /_path to full location of_/stocator-1.0.36-SNAPSHOT-IBM-SDK.jar
I've added the following entries in spark-defaults.conf within the spark-2.4.4 folder in /opt/. I've also created a core-site.xml file in the same location:
As per the instructions here: https://developer.ibm.com/code/2018/08/16/installing-running-stocator-apache-spark-ibm-cloud-object-storage/
I want to use the "cos://" paradigm as highlighted in the post above and for the given test code below:
I get an error
I also tried "swift://" instead of "cos://" and I get:
For "swift2d://" I get:
For "s3://" and "s3d://" I get:
Can someone please help me use stocator with pyspark?