aws-samples / aws-glue-samples

AWS Glue code samples

Unable to start Spark-UI docker container from EC2 in China Region #125

Closed. hlmiao closed this issue 2 years ago

hlmiao commented 2 years ago

Hi, I am sure the AK/SK pair can access the S3 bucket at the prefix level, but the error trace always shows "The AWS Access Key Id you provided does not exist in our records". Perhaps aws-cn is missing somewhere in the configuration?

docker run -it -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" -p 18080:18080 glue/sparkui "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/20 03:53:18 INFO HistoryServer: Started daemon with process name: 1@a7761f804132
22/05/20 03:53:18 INFO SignalUtils: Registered signal handler for TERM
22/05/20 03:53:18 INFO SignalUtils: Registered signal handler for HUP
22/05/20 03:53:18 INFO SignalUtils: Registered signal handler for INT
22/05/20 03:53:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/05/20 03:53:18 INFO SecurityManager: Changing view acls to: root
22/05/20 03:53:18 INFO SecurityManager: Changing modify acls to: root
22/05/20 03:53:18 INFO SecurityManager: Changing view acls groups to:
22/05/20 03:53:18 INFO SecurityManager: Changing modify acls groups to:
22/05/20 03:53:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/05/20 03:53:18 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:280)
        at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.nio.file.AccessDeniedException: : getFileStatus on : com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: VF402B2FDAM7MD6H; S3 Extended Request ID: cZxivSLBFHKefii6s9MXBvpI7GNM7t2cUghL3f+pdP9oIvkMoLRwaLOGiAObXe/9bQeyKiIr430=), S3 Extended Request ID: cZxivSLBFHKefii6s9MXBvpI7GNM7t2cUghL3f+pdP9oIvkMoLRwaLOGiAObXe/9bQeyKiIr430=
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1635)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
        at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:257)
        at org.apache.spark.deploy.history.FsHistoryProvider.initialize(FsHistoryProvider.scala:211)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:207)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
        ... 6 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: VF402B2FDAM7MD6H; S3 Extended Request ID: cZxivSLBFHKefii6s9MXBvpI7GNM7t2cUghL3f+pdP9oIvkMoLRwaLOGiAObXe/9bQeyKiIr430=), S3 Extended Request ID: cZxivSLBFHKefii6s9MXBvpI7GNM7t2cUghL3f+pdP9oIvkMoLRwaLOGiAObXe/9bQeyKiIr430=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4992)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:895)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:918)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1611)
xmubeta commented 2 years ago

Please try to add the following option:

-Dspark.hadoop.fs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn

If you use the Ningxia region, use cn-northwest-1 in the endpoint instead.
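For reference, here is a sketch of the full command from the original report with the endpoint option added, assuming the Beijing region and the same LOG_DIR, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables (swap in the cn-northwest-1 endpoint for Ningxia):

# Sketch only: Beijing (cn-north-1) endpoint shown; use s3.cn-northwest-1.amazonaws.com.cn for Ningxia.
docker run -it \
  -p 18080:18080 \
  -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -Dspark.hadoop.fs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn" \
  glue/sparkui \
  "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"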

moomindani commented 2 years ago

Yes, as you can see in the README, you need to set spark.hadoop.fs.s3a.endpoint.

Based on xmubeta's comment, I added the examples to the README. Feel free to reopen this issue if it still occurs.

purnima1612 commented 1 year ago

Do we need to include -Dspark.hadoop.fs.s3a.endpoint for all regions?

xmubeta commented 1 year ago

You only need to use this option for the China regions (Beijing & Ningxia).
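As a quick sanity check, the same keys can be verified against the China partition before starting the container, for example with the AWS CLI (hypothetical bucket and prefix shown; substitute your actual Spark event log location):

# Hypothetical path: confirms the AK/SK can list the log prefix via the Beijing S3 endpoint.
aws s3 ls s3://my-spark-event-logs/prefix/ --region cn-north-1 --endpoint-url https://s3.cn-north-1.amazonaws.com.cn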

purnima1612 commented 1 year ago

I am also getting this issue. https://github.com/aws-samples/aws-glue-samples/issues/140

hlmiao commented 1 year ago

So far it is only needed for the China regions.

