Open dhana-sekhar opened 2 months ago
Looking at the API docs [here], there isn't any way to set "distributed mode".
But it does have this:
S3DataDistributionType (string) –
Whether to distribute the data from Amazon S3 to all processing instances with FullyReplicated, or whether the data from Amazon S3 is shared by Amazon S3 key, downloading one shard of data to each processing instance.
So setting either ProcessingInputs[i]["S3Input"]["S3DataDistributionType"]
or ProcessingInputs[i]["DatasetDefinition"]["DataDistributionType"]
to "ShardedByS3Key" in the config may get you the result you are looking for. That said, you aren't using any ProcessingInputs in the config at all, so I'm not sure how this works.
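For illustration, here is a minimal sketch of what such a config could look like. The helper name `build_processing_config`, the job name, image URI, role ARN, bucket path, instance type, and volume size are all placeholders I made up, not values from this issue. The key names follow the SageMaker `CreateProcessingJob` request shape as I understand it. Note that this only controls how the input data is split across instances.

```python
def build_processing_config(job_name, image_uri, role_arn, input_s3_uri, instance_count=2):
    """Build a CreateProcessingJob-style config dict (hypothetical helper).

    All argument values are supplied by the caller; nothing here is taken
    from the original issue.
    """
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        "AppSpecification": {"ImageUri": image_uri},
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": instance_count,
                "InstanceType": "ml.m5.xlarge",  # assumed instance type
                "VolumeSizeInGB": 30,            # assumed volume size
            }
        },
        # ProcessingInputs is a list; each entry carries its own S3Input.
        "ProcessingInputs": [
            {
                "InputName": "input-1",
                "S3Input": {
                    "S3Uri": input_s3_uri,
                    "LocalPath": "/opt/ml/processing/input",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                    # Each instance downloads one shard instead of a full copy.
                    "S3DataDistributionType": "ShardedByS3Key",
                },
            }
        ],
    }


# Placeholder values, for illustration only.
config = build_processing_config(
    job_name="example-processing-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::123456789012:role/example-role",
    input_s3_uri="s3://example-bucket/input/",
)
```

The resulting dict would then be passed as `config=config` to `SageMakerProcessingOperator`.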
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.4.3
What happened?
We are trying to create a cluster with SageMakerProcessingOperator with "InstanceCount": 2, passing a custom Docker image that contains my Spark code. When I run the Spark code in the container on this cluster, it does not run in a distributed way, although I can see from CloudWatch that SageMaker is able to spin up 2 instances.
What you think should happen instead?
When I run my code with the SageMakerProcessingOperator with "InstanceCount": 2, it should run in distributed mode.
How to reproduce
You can use this code and take any Docker image with Spark installed. Run some PySpark code within the container with an InstanceCount of 2 or more.
Operating System
Amazon Linux
Versions of Apache Airflow Providers
MWAA version 2.4.3
Deployment
Amazon (AWS) MWAA
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct