atzev closed this issue 1 year ago.
Hi @atzev, I can confirm this behavior. I will pass this information internally to see if we can pick up this task and improve the user experience when using ML Studio from data.all.
Hello Team, do we have any fix or workaround for this issue?
A colleague has investigated and found an explanation for the issue. In the CDK code, the SageMaker Domain is defined with “VPCOnly” network access together with the default VPC, which has only public subnets: https://github.com/awslabs/aws-dataall/blob/main/backend/dataall/cdkproxy/stacks/environment.py#L581

SageMaker Domain doesn't support this combination (VPCOnly + public subnets): the SageMaker documentation states as requirement no. 1 that, when using VPCOnly mode, the subnets must be private.

In addition, all the subnets available in the chosen VPC are used to create the Domain, and these might be a mix of public, private, and isolated subnets. We probably want only the private and isolated ones, and to fail the deployment if that list is empty: https://github.com/awslabs/aws-dataall/blob/main/backend/dataall/cdkproxy/stacks/environment.py#L555

In other words, the code assumes that the environment account has a default VPC with private subnets. In my tests (I created 2 environment accounts from scratch in Isengard), the default VPC has no private subnets out of the box, so you have to modify the VPC manually to create them.
A snippet that would work is to create a dedicated VPC with private subnets for SageMaker Studio in VPCOnly mode:
```python
# LINE 553
# Assumes the aws-cdk-lib v2 names used elsewhere in environment.py, e.g.:
# from aws_cdk import RemovalPolicy, aws_ec2 as ec2, aws_iam as iam
# from aws_cdk import aws_logs as logs, aws_sagemaker as sagemaker

# Commented out: we do NOT want to use the default VPC here.
# try:
#     default_vpc = ec2.Vpc.from_lookup(self, 'VPCStudio', is_default=True)
#     vpc_id = default_vpc.vpc_id
#     subnet_ids = [private_subnet.subnet_id for private_subnet in default_vpc.private_subnets]
#     subnet_ids += [public_subnet.subnet_id for public_subnet in default_vpc.public_subnets]
#     subnet_ids += [isolated_subnet.subnet_id for isolated_subnet in default_vpc.isolated_subnets]
# except Exception as e:
#     logger.error(f"Default VPC not found, Exception: {e}. If you don't own a default VPC, modify the networking configuration, or disable ML Studio upon environment creation.")

# Create a VPC with 3 public subnets and 3 private subnets with NAT gateways
# (one of each per AZ, up to max_azs=3).

# CloudWatch log group and IAM role used by the VPC flow logs
log_group = logs.LogGroup(
    self,
    f'SageMakerStudio{self._environment.name}',
    log_group_name=f'/{self._environment.resourcePrefix}/{self._environment.name}/vpc/sagemakerstudio',
    retention=logs.RetentionDays.ONE_MONTH,
    removal_policy=RemovalPolicy.DESTROY,
)
vpc_flow_role = iam.Role(
    self, 'FlowLog',
    assumed_by=iam.ServicePrincipal('vpc-flow-logs.amazonaws.com')
)
vpc = ec2.Vpc(
    self,
    "VPC",
    max_azs=3,
    cidr="10.10.0.0/16",
    subnet_configuration=[
        ec2.SubnetConfiguration(
            subnet_type=ec2.SubnetType.PUBLIC,
            name="Public",
            cidr_mask=24
        ),
        ec2.SubnetConfiguration(
            subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT,
            name="Private",
            cidr_mask=24
        ),
    ],
)
ec2.FlowLog(
    self, "StudioVPCFlowLog",
    resource_type=ec2.FlowLogResourceType.from_vpc(vpc),
    destination=ec2.FlowLogDestination.to_cloud_watch_logs(log_group, vpc_flow_role)
)
vpc_id = vpc.vpc_id
# Only the private subnets are passed to the Domain, as VPCOnly mode requires
subnet_ids = [private_subnet.subnet_id for private_subnet in vpc.private_subnets]
sg = vpc.vpc_default_security_group  # ID of the new VPC's default security group

# From here to the end, the code is the same as before
sagemaker_domain = sagemaker.CfnDomain(
    ...  # unchanged arguments
)
```
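If you would rather keep an existing VPC than create a new one, it helps to first check which of its subnets are actually private. A minimal sketch, assuming the route tables were already fetched with boto3, e.g. `boto3.client("ec2").describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["RouteTables"]`: a subnet whose route table has a route to an internet gateway (`igw-` prefix) is treated as public, one that routes through a NAT gateway as private, and anything else as isolated. This is a simplified heuristic (it ignores NAT instances, transit gateways, and pagination), and `classify_subnets` is a hypothetical helper, not part of data.all.

```python
from typing import Dict, List, Optional


def classify_subnets(subnet_ids: List[str], route_tables: List[Dict]) -> Dict[str, str]:
    """Label each subnet 'public', 'private' or 'isolated' (simplified heuristic).

    `route_tables` follows the shape of the `RouteTables` list returned by
    EC2 DescribeRouteTables. Subnets without an explicit association use
    the VPC's main route table.
    """
    main_table: Optional[Dict] = None
    table_for_subnet: Dict[str, Dict] = {}
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                table_for_subnet[assoc["SubnetId"]] = table

    def kind(table: Optional[Dict]) -> str:
        routes = table.get("Routes", []) if table else []
        if any(route.get("GatewayId", "").startswith("igw-") for route in routes):
            return "public"  # internet gateway route
        if any("NatGatewayId" in route for route in routes):
            return "private"  # outbound traffic goes through a NAT gateway
        return "isolated"  # no route to the internet at all

    return {sid: kind(table_for_subnet.get(sid, main_table)) for sid in subnet_ids}
```

On a default VPC, every subnet falls back to the main route table, whose `0.0.0.0/0` route points at the internet gateway, so every subnet comes back as `public`, consistent with the tests described in the investigation.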
Fixed in #409
Describe the bug
As a user, when I open SageMaker Studio from the data.all portal, I have to wait several minutes before I can start using Studio.
How to Reproduce
New user:
Returning user:
Expected behavior
For existing users, Studio should load within seconds after the user clicks the icon to launch JupyterLab. For new users, Studio should launch as soon as the app is created.
Your project
No response
Screenshots
No response
OS
Mac
Python version
3.8
AWS data.all version
1.4.1
Additional context
No response