@jahstreet reopening because this is still an issue for me.
@jahstreet Do you have any ideas on this error? I verified that hadoop-aws-3.2.0 is indeed included in the Livy image you built, but I am having a really tough time diagnosing this particular error. Are you able to confirm S3 access works with the most recent version?
Here is the Scala version of the error:
java.nio.file.AccessDeniedException: s3a://nyc-tlc/misc/uber_nyc_data.csv: getFileStatus on s3a://nyc-tlc/misc/uber_nyc_data.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 32ED94CC70510060; S3 Extended Request ID: FuZ1ybBdZzDIY5Vm8mQYBtdYzg63nva1MOwYK+wQQngI+DL57hqCC0ctYwUqCnb6NdDJ7J/1og8=), S3 Extended Request ID: FuZ1ybBdZzDIY5Vm8mQYBtdYzg63nva1MOwYK+wQQngI+DL57hqCC0ctYwUqCnb6NdDJ7J/1og8=:403 Forbidden
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:230)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:151)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2198)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1700)
at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:2995)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:723)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:553)
... 51 elided
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 32ED94CC70510060; S3 Extended Request ID: FuZ1ybBdZzDIY5Vm8mQYBtdYzg63nva1MOwYK+wQQngI+DL57hqCC0ctYwUqCnb6NdDJ7J/1og8=)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1640)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1271)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1249)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1246)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2183)
... 63 more
If I go in and manually add the AWS keys to the environment, it seems to fix it, so the issue appears to be some disconnect in how hadoop-aws and Spark 3.0 are supposed to forward keys.
The Spark cloud-integration documentation says that spark-submit is normally responsible for forwarding the keys, which makes me think the corresponding change needs to happen on the Livy side: https://spark.apache.org/docs/latest/cloud-integration.html#authenticating
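For context, what the docs describe spark-submit doing is roughly this mapping, shown here only as a sketch (it assumes the AWS_* variables are set wherever the driver is launched; it is not a Livy-side fix):

// Roughly what the docs describe spark-submit doing: copy the AWS_* environment
// variables into the matching spark.hadoop.fs.s3a.* options before launch.
import org.apache.spark.SparkConf

val conf = new SparkConf()
sys.env.get("AWS_ACCESS_KEY_ID").foreach(conf.set("spark.hadoop.fs.s3a.access.key", _))
sys.env.get("AWS_SECRET_ACCESS_KEY").foreach(conf.set("spark.hadoop.fs.s3a.secret.key", _))
sys.env.get("AWS_SESSION_TOKEN").foreach(conf.set("spark.hadoop.fs.s3a.session.token", _))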
I may have found a workaround by just setting:
"spark.executorEnv.AWS_SECRET_ACCESS_KEY"
"spark.executorEnv.AWS_ACCESS_KEY_ID"
in the Spark conf.
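A sketch of that workaround as I understand it (the values are assumed to come from the driver's environment; nothing here is Livy-specific):

// Forward the driver's AWS credentials to the executors' environment via spark.executorEnv.*,
// so the default S3A credential provider chain can pick them up there.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .config("spark.executorEnv.AWS_ACCESS_KEY_ID", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
  .config("spark.executorEnv.AWS_SECRET_ACCESS_KEY", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
  .getOrCreate()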
It turns out, after much digging, that forwarding the keys via the Spark conf in the usual way actually works as expected:
import org.apache.spark.sql.SparkSession

// Temporary (session) credentials forwarded through the Spark conf.
val spark = SparkSession.builder
  .config("fs.s3a.secret.key", "TEMP_SECRET")
  .config("fs.s3a.access.key", "TEMP_KEY")
  .config("fs.s3a.session.token", "TEMP_SESSION")
  .config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
  .getOrCreate()
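With a session configured like that, reading the same public object from the stack trace above is then just the usual DataFrameReader call (the header option is only an assumption about the file):

// Read the object that was previously failing with 403 Forbidden.
val df = spark.read
  .option("header", "true")
  .csv("s3a://nyc-tlc/misc/uber_nyc_data.csv")
df.show(5)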
There were a couple of other confounding issues causing it to error in my case. Closing issue.
On upgrading to 2.0.1 I can no longer leverage hadoop-aws and get this somewhat cryptic error:
when querying this public bucket:
How do I configure hadoop-aws like in version 1.0.0? It seemed to be working then.
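For what it's worth, the same S3A options can also be set after the fact on an existing session's Hadoop configuration, which can be handy inside a Livy interactive session where the session already exists; a sketch with placeholder values:

// Set the S3A credentials on the running context's Hadoop configuration
// (assumes an existing SparkSession named spark, e.g. the one Livy provides).
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "TEMP_KEY")
hadoopConf.set("fs.s3a.secret.key", "TEMP_SECRET")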