gettyimages / docker-spark

Docker build for Apache Spark
MIT License

reading from s3 error #57

Closed by emptyr1 5 years ago

emptyr1 commented 5 years ago

I'm running into issues trying to read from AWS S3.

%pyspark
df = spark.read.csv("s3://test/muppal/sample.csv")

I get this error:

Py4JJavaError: An error occurred while calling o54.csv.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3266)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3286)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
.......

After manually loading the libraries, I get:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error: java.lang.IllegalStateException: Invalid class name: org.jets3t.service.utils.RestUtils$ConnManagerFactory
  at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
  at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:567)

bryceageno commented 5 years ago

This is due to Spark not working with newer versions of Java. I am working on a fix.

bryceageno commented 5 years ago

Using OpenJDK 8 has resolved this.
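
For anyone hitting the same trace: the `java.base/` prefix on the reflection frames only appears on Java 9 or later, so it may be worth confirming which JVM the driver is actually running on before and after switching images. Below is a minimal sketch of such a check, not something taken from this repository. The helper names `parse_java_major` and `jvm_version_from_spark` are made up for illustration, and `spark.sparkContext._jvm` is a private PySpark attribute (the py4j gateway view), so treat its use as an assumption that could change between Spark versions.

```python
import re


def parse_java_major(version: str) -> int:
    """Return the major Java version from a `java.version` style string.

    Legacy strings look like "1.8.0_252" (major 8). From Java 9 onward
    the major version is the first component, e.g. "11.0.7" or "13".
    """
    parts = re.split(r"[._\-+]", version.strip())
    first = int(parts[0])
    # Legacy "1.x" scheme: the real major version is the second component.
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def jvm_version_from_spark(spark) -> str:
    """Hypothetical helper: read java.version from the driver JVM via py4j.

    Assumes `spark` is an active SparkSession, such as the one a
    %pyspark notebook paragraph provides.
    """
    return spark.sparkContext._jvm.java.lang.System.getProperty("java.version")
```

In a notebook paragraph, `parse_java_major(jvm_version_from_spark(spark))` should give the major version of the driver JVM. Per the comments above, a value of 8 is what resolved the error here.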