Closed ottobricks closed 3 years ago
Hi @ottok92, the S3AFileSystem is in the hadoop-aws library; it is not part of the AWS SDK for Java, so we cannot help much with the issue.
According to the Jira ticket you referenced, the issue was fixed in hadoop-aws:3.2.2, which is the version you're using, so I would check whether the dependency is being resolved to a different version in your environment. For further questions, please reach out to Hadoop support.
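One way to act on the suggestion above is to compare the versions of the Hadoop jars that actually end up on the classpath. The sketch below is only an illustration: the helper names `hadoop_versions` and `has_version_mismatch` are hypothetical, and it assumes jar files follow the usual `artifact-version.jar` naming with a purely numeric version (suffixed builds such as vendor-specific versions are not handled).

```python
import re
from collections import defaultdict

# Matches names such as "hadoop-aws-3.2.2.jar" -> ("hadoop-aws", "3.2.2").
# Assumption: plain numeric versions only; suffixed versions are skipped.
_HADOOP_JAR = re.compile(r"^(hadoop-[a-z-]+?)-(\d+(?:\.\d+)*)\.jar$")


def hadoop_versions(jar_names):
    """Group Hadoop artifact names by the version embedded in the file name."""
    versions = defaultdict(set)
    for name in jar_names:
        match = _HADOOP_JAR.match(name)
        if match:
            artifact, version = match.groups()
            versions[version].add(artifact)
    return dict(versions)


def has_version_mismatch(jar_names):
    """Return True when Hadoop jars with more than one version are present."""
    return len(hadoop_versions(jar_names)) > 1


if __name__ == "__main__":
    # Example jar listing; in practice this could come from os.listdir()
    # on the directory that holds Spark's jars.
    jars = [
        "hadoop-aws-3.2.2.jar",
        "hadoop-common-3.2.0.jar",
        "aws-java-sdk-bundle-1.11.375.jar",
    ]
    print(hadoop_versions(jars))
    print("mismatch:", has_version_mismatch(jars))
```

In a running PySpark session, `sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion()` should report the Hadoop version the JVM actually loaded, which can then be compared with the hadoop-aws jar that was supplied.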
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Hi, I got the same issue with Spark 3.2.0 when I switched to the Magic committer. The possible solution doesn't work because the fix is already in the spark-hadoop-cloud package. Any ideas?
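I'm currently having the same issue (https://stackoverflow.com/questions/72794805/is-it-possible-to-use-a-custom-hadoop-version-with-emr/72800621#72800621) with SemaphoredDelegatingExecutor. Spark 3.2.2 supposedly solves it, but EMR does not support this version. @ottok92 were you able to find any solutions for this?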
I did fix the issue, but it was so long ago that I don't remember exactly what it was.
The problem is not with Spark, it's with hadoop-aws. I suggest uploading the jars for the supported version to S3 and pointing to them with --jars. The necessary jars are:
Note that for EMR 6.5.0, supported versions are:
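As a rough illustration of the --jars approach described above, the sketch below builds a spark-submit command line from a list of jar locations. The helper name `spark_submit_args`, the bucket name, and the jar paths are placeholders, not values from this thread; substitute the jars and versions that match your EMR release.

```python
def spark_submit_args(app_path, jar_uris):
    """Build a spark-submit argument list that passes extra jars via --jars."""
    if not jar_uris:
        raise ValueError("at least one jar URI is required")
    # --jars expects a single comma-separated value, not repeated flags.
    return ["spark-submit", "--jars", ",".join(jar_uris), app_path]


if __name__ == "__main__":
    # Placeholder locations: replace <version> with the supported versions.
    jars = [
        "s3://my-bucket/jars/hadoop-aws-<version>.jar",
        "s3://my-bucket/jars/aws-java-sdk-bundle-<version>.jar",
    ]
    print(" ".join(spark_submit_args("my_job.py", jars)))
```

Keeping the jars in S3 means every node in the cluster can fetch the same files, so the driver and executors should load matching versions.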
For me the issue disappeared when I changed path to the bucket from 's3a://...' to 's3://...'
Hi
Yes, but when you change s3a to s3 it means you are using s3 without the driver. Anyway, we got EMR version 6.7.0 yesterday. You just need to update EMR to this version and all will work! 🙂
YOU ARE WRONG: https://stackoverflow.com/a/71571625 https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html "Previously, Amazon EMR used the s3n and s3a file systems. While both still work, we recommend that you use the s3 URI scheme for the best performance, security, and reliability."
Actually Nope! You are WRONG! We are working with Spark. The s3a is essential for the magic committer!!!!! It won't work with S3 !!!!!!!!!!!
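For readers unfamiliar with the point being argued, the sketch below shows the kind of Spark settings the magic committer typically relies on, together with a guard that rejects non-s3a paths. The helper name `magic_committer_conf` is hypothetical, and the configuration keys are the ones I understand Spark's cloud-integration documentation to list; please verify them against your Spark and Hadoop versions before relying on them.

```python
def magic_committer_conf(output_path):
    """Return Spark settings for the S3A magic committer (illustrative only)."""
    # The S3A committers are part of the s3a connector, so other URI
    # schemes (for example s3:// on EMR) would not go through them.
    if not output_path.startswith("s3a://"):
        raise ValueError("the magic committer requires an s3a:// path")
    return {
        "spark.hadoop.fs.s3a.committer.name": "magic",
        "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }


if __name__ == "__main__":
    for key, value in magic_committer_conf("s3a://some-bucket/test.parquet").items():
        print(f"--conf {key}={value}")
```

The two `org.apache.spark.internal.io.cloud` classes come from the spark-hadoop-cloud module, so that module needs to be on the classpath for these settings to take effect.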
Hi Victor
How are you doing? You just need to add this script to the Bootstrap section when you spin up your cluster: spark-patch-s3a-fix_emr-6.6.0.sh =) Amazon provided this fix only for EMR 6.6.0. It's related to the s3a driver. If you use s3 without a driver then it should work!
Issue writing to AWS S3 via the aws-java-sdk in spark context
Describe the bug
For a given DataFrame df in a PySpark environment, the operation
df.write.parquet("s3a://some-bucket/test.parquet")
starts nicely but fails once concurrency occurs and the SDK calls org.apache.hadoop.util.SemaphoredDelegatingExecutor, which throws a java.lang.NoSuchMethodError.
Expected Behavior
Ideally, the files should be written without any issues.
Current Behavior
The write operation fails with the following stack trace in the worker:
The relevant part for this issue is:
This issue has been reported in Apache's JIRA HADOOP-16080
Steps to Reproduce
Environment: pyspark 3.0.1 with hadoop 3.2 and aws-java-sdk 1.11.95x.
Provided you have already set up your Spark context, assigned in this example to sc, the code to reproduce the error is:
Possible Solution
From HADOOP-16080:
Context
This issue has been impacting my entire workflow when it comes to saving DataFrames to S3.
Your Environment
Function I use to set up my spark env locally: