illinineverdie opened this issue 4 years ago
When network isolation mode is enabled (enable_network_isolation=True), the processing service blocks all network egress from the processing containers (including your Spark application). This means your Spark app will not be able to do direct I/O to S3; it can only do I/O to the job's local EBS volumes.
For context, the network isolation feature is often used as a one-click safeguard against data exfiltration risks, but it is not required in order to restrict traffic to S3 within your VPC; for that, only the network configuration with your VPC subnets/security groups is needed.
If you do require network isolation, your input/output data will have to be staged on the job's local EBS volumes by specifying a ProcessingInput (https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingInput) and a ProcessingOutput (https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingOutput) in your job configuration.
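As a minimal sketch of that staging pattern, assuming a PySparkProcessor named spark_processor like the one configured below (the bucket, prefixes, and script path are placeholders, not from this issue):

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput

spark_processor.run(
    submit_app="./code/my_pyspark_script.py",  # hypothetical script path
    inputs=[
        ProcessingInput(
            source="s3://my-bucket/raw/",                # hypothetical S3 prefix
            destination="/opt/ml/processing/input/raw",  # staged to local EBS before the job starts
        )
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",       # written locally by the job
            destination="s3://my-bucket/processed/",  # uploaded to S3 after the job completes
        )
    ],
)
```

The script then reads from /opt/ml/processing/input/raw and writes to /opt/ml/processing/output instead of calling S3 directly.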
I am curious whether there are issues calling the boto3 client from the AppMaster when the network is isolated, or maybe my network config is off; I'm not sure. I am running a PySpark script using the PySparkProcessor (injected into the interface). That script needs to pull objects from S3 as it runs on the AppMaster/client before working with the Spark session across the worker nodes. I have network isolation turned on and security groups set, and I have allowed traffic to S3.
When I turn network isolation on, I get the following: "raise NoCredentialsError botocore.exceptions.NoCredentialsError: Unable to locate credentials". This traces back to this line in the PySpark script I inject: s3 = boto3.client('s3').
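The pattern in my script is roughly the following (the bucket and key here are placeholders):

```python
import boto3

# Under network isolation this raises botocore.exceptions.NoCredentialsError,
# presumably because the container has no network egress to fetch credentials.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="some/prefix/object.json")  # hypothetical
```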
I am using the same role my SageMaker notebook is running in, which allows me to make these boto3 calls; I simply pass that role in:

networkConfig = NetworkConfig(
    enable_network_isolation=True,
    security_group_ids=[sg_s3_access, sg_master, sg_slaves],
    subnets=[private_subnet_3],
)
role = sagemaker.get_execution_role()

spark_processor = PySparkProcessor(
    base_job_name="some-job",
    role=role,
    instance_count=2,
    instance_type="ml.m5.4xlarge",
    max_runtime_in_seconds=2400,
    network_config=networkConfig,
    image_uri="............dkr.ecr.us-east-1.amazonaws.com/sagemaker-spark-processing:2.4-cpu-py37-v1.0",
)
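For completeness, I then submit the injected script roughly like this (the path is a placeholder):

```python
spark_processor.run(submit_app="./code/my_pyspark_script.py")
```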
Everything works fine when enable_network_isolation=False and I still pass in my network config. Is there a defect in calling boto3 from a PySpark script on the AppMaster when network isolation is turned on, or should I look at my network config again?