Azure / bicep-types-az

Bicep type definitions for ARM resources
MIT License

Deployment scripts in vnet to use private endpoints #1954

Open antsok opened 11 months ago

antsok commented 11 months ago

Is your feature request related to a problem? Please describe.

With a recent release of deployment scripts that use ACI VNet integration (https://learn.microsoft.com/azure/azure-resource-manager/bicep/deployment-script-bicep#access-private-virtual-network), it became possible to ensure that traffic from those scripts can reach resources on private networks.

However, the implementation uses service endpoints to communicate with the backing storage account. In many regulated environments, service endpoints would not be allowed.

Another option, in theory, is to use private endpoints, but the deployment currently fails with the following error if no service endpoint is configured:

Storage account 'xxxxx' has firewall settings enabled which are not supported for deployment scripts. If providing subnets with managed identity, make sure the "Allow Azure services on the trusted services list to access this storage account" is enabled and proper RBAC is set on the given storage. Please refer to https://aka.ms/DeploymentScriptsTroubleshoot for more deployment script information. (Code: DeploymentScriptStorageAccountWithServiceEndpointEnabled)

Describe the solution you'd like

ACI should be able to use a private endpoint connection to the backing storage account.

alex-frankel commented 10 months ago

@antsok - can you help clarify the distinction between "Service Endpoints" and "Private Endpoints". Why would service endpoints not be enabled, but it would be ok to enable private endpoints?

Apologies for the lack of context, but I want to make sure we have a thorough understanding of the problem.

antsok commented 10 months ago

@alex-frankel

One reason is that, with service endpoints, traffic between a consuming service in a VNet and a target PaaS service (Azure Storage in this case) leaves the VNet perimeter and travels to the target service over the Azure backbone. For compliance reasons, some companies must keep all traffic inside their network security perimeter.

Private endpoints provision a NIC with a private IP in the VNet, so the traffic never travels over the Internet or the Azure backbone.
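
To make the distinction concrete: with a private endpoint, the storage account's FQDN resolves to an IP taken from the VNet's own address space, whereas with a service endpoint it keeps resolving to the service's public IP. The Python sketch below classifies a resolved address on that basis. It only inspects an IP you pass in and does not query Azure; the function name `path_for`, the address space `10.10.0.0/16` and the sample addresses are hypothetical.

```python
import ipaddress
from typing import Iterable


def path_for(resolved_ip: str, vnet_prefixes: Iterable[str]) -> str:
    """Classify the network path implied by the IP a PaaS FQDN resolves to.

    A private endpoint places a NIC in the VNet, so the FQDN resolves to an IP
    from one of the VNet's prefixes. With a service endpoint (or no endpoint),
    the FQDN keeps resolving to the service's public IP.
    """
    ip = ipaddress.ip_address(resolved_ip)
    for prefix in vnet_prefixes:
        if ip in ipaddress.ip_network(prefix):
            return "private endpoint inside this VNet"
    if ip.is_private:
        # For example, a private endpoint living in a peered hub VNet.
        return "private address outside this VNet"
    return "public address (service endpoint or internet path)"


if __name__ == "__main__":
    vnet = ["10.10.0.0/16"]  # hypothetical VNet address space
    print(path_for("10.10.2.4", vnet))  # hypothetical private endpoint NIC IP
    print(path_for("20.38.0.1", vnet))  # arbitrary public address
```

In a hub-spoke design, a "private address outside this VNet" result may simply mean the private endpoint sits in the hub, which only works if peering, routing and DNS all line up.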

Another reason is that companies set up enterprise landing zones with hub-spoke network designs when they want to control traffic entering resources in the application landing zone, and this is done with a centralized firewall/NVA in the hub. With service endpoints, it is not possible to force traffic to the service endpoints through the central firewall/NVA, and using service endpoint policies is too much management overhead for a central IT team. So service endpoints end up being disabled (via policies) to enforce the use of Private Link instead.

Microsoft recommends private endpoints over service endpoints (link).

I hope it helps.

I understand we need to wait for the ACI team to add the ability to use Storage via private endpoints, as this is currently not available according to the documentation.

DeanJohnsonUK commented 8 months ago

The new ACI VNet integration also works with private endpoints (although this isn't documented yet):

https://github.com/Azure/bicep/issues/12068

I've also tested it myself today and it works.

johnlokerse commented 7 months ago

I just came across this issue and I agree with @antsok. I also see some traffic to my blog post coming from the GitHub issue I created (https://github.com/Azure/bicep/issues/12068). Many environments are regulated, do not allow service endpoints, and require private endpoints. It is currently possible to use deployment scripts privately through private endpoints, but this is not documented yet and takes some extra steps to get working.

@alex-frankel The current documentation focuses on service endpoints (https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/deployment-script-vnet). I would love to contribute documentation showing how to use deployment scripts privately with private endpoints, but I need some guidance on how to write it. Is there anyone on the team I can contact for guidance?

htwashere commented 6 months ago

Hello, in my work environment I am running into a similar issue where the container instance for the deployment script times out. Based on the output, it is trying to pull an image from mcr.microsoft.com. I'm pretty sure this traffic is blocked somewhere by our firewall, but my network admin indicated that he did not see any traffic leaving the container instance subnet. Could anyone tell me which IPs or domains I should whitelist in our network to make this work? I already tried adding mcr.microsoft.com, but it did not seem to work. On the other hand, in a simpler sandbox environment, I am able to run deployment scripts in a VNet with private endpoints by following @johnlokerse's instructions; thank you, John!

DeanJohnsonUK commented 6 months ago

Your network admin is mistaken: the container needs to be able to reach the Microsoft Container Registry, and it also needs to connect to its file share on the associated storage account. (Perhaps an NSG is blocking the traffic before it reaches the firewall.)
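
A quick way to test both requirements is a TCP reachability check run from a host whose DNS and routing match the ACI subnet as closely as possible (the ACI subnet is delegated, so you typically cannot put a VM in it; a VM in a sibling subnet of the same VNet is a reasonable stand-in). This Python sketch assumes the file share is mounted over SMB on port 445 and that image pulls use HTTPS on port 443 to mcr.microsoft.com; the storage account name `mydsstorage` and the helper names are hypothetical.

```python
import socket
from typing import List, Tuple

# Hypothetical name: replace with the storage account backing the deployment script.
STORAGE_ACCOUNT = "mydsstorage"


def targets(storage_account: str) -> List[Tuple[str, int]]:
    """Endpoints the deployment-script container needs, per this thread."""
    return [
        ("mcr.microsoft.com", 443),                         # container image pull (HTTPS)
        (f"{storage_account}.file.core.windows.net", 445),  # Azure Files share (SMB)
    ]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False


if __name__ == "__main__":
    for host, port in targets(STORAGE_ACCOUNT):
        status = "reachable" if can_connect(host, port) else "BLOCKED or unresolvable"
        print(f"{host}:{port} -> {status}")
```

The check deliberately lumps DNS failures and blocked ports together; pair it with a separate DNS lookup if you need to tell them apart.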

htwashere commented 6 months ago

Do you mean "Container Images" instead of "Container Registry"? If so, according to Microsoft, the container base images are discovered via Docker Hub. So I looked for traffic possibly going to Docker Hub (hub.docker.com) but did not see anything going that way. As for the file share, since the storage account is in the same VNet and our default NSG rules/routes allow subnets within the same VNet to talk freely, I don't think that was the problem (but who knows...)

DeanJohnsonUK commented 6 months ago

I would advise turning on NSG flow logs or perhaps VNET flow logs (in beta) and you will be able to use KQL to determine exactly what is coming in and out of the ACI subnet. (tip: set the analytics processing to 10 minutes rather than the default 60 minutes)
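
The suggestion above is KQL over Traffic Analytics. If you only have the raw NSG flow log blobs, a Python sketch like the one below can pull out denied outbound flows from the ACI subnet instead. It assumes the NSG flow log tuple layout whose first eight comma-separated fields are timestamp, source IP, destination IP, source port, destination port, protocol (T/U), direction (I/O) and decision (A/D); VNet flow logs use a different tuple format and would need a different parser. The subnet prefix and sample tuples are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Flow:
    timestamp: int   # Unix epoch seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str    # "T" = TCP, "U" = UDP
    direction: str   # "I" = inbound, "O" = outbound
    decision: str    # "A" = allowed, "D" = denied


def parse_tuple(raw: str) -> Flow:
    """Parse the first eight fields of an NSG flow tuple (later fields are ignored)."""
    f = raw.split(",")
    return Flow(int(f[0]), f[1], f[2], int(f[3]), int(f[4]), f[5], f[6], f[7])


def denied_outbound(tuples: Iterable[str], aci_subnet: str) -> List[Flow]:
    """Return denied outbound flows whose source IP lies in the ACI subnet."""
    net = ipaddress.ip_network(aci_subnet)
    denied = []
    for raw in tuples:
        flow = parse_tuple(raw)
        if (
            flow.direction == "O"
            and flow.decision == "D"
            and ipaddress.ip_address(flow.src_ip) in net
        ):
            denied.append(flow)
    return denied


if __name__ == "__main__":
    # Illustrative tuples only; real ones come from the flow log JSON blobs.
    sample = [
        "1700000000,10.0.1.5,20.38.0.1,50000,445,T,O,D",
        "1700000001,10.0.1.5,20.38.0.2,50001,443,T,O,A",
    ]
    for flow in denied_outbound(sample, "10.0.1.0/24"):  # hypothetical ACI subnet
        print(f"denied {flow.src_ip} -> {flow.dst_ip}:{flow.dst_port}")
```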

DeanJohnsonUK commented 6 months ago

It needs to be able to reach mcr.microsoft.com

htwashere commented 6 months ago

Having some trouble creating the flow logs in our environment. Once figured out, I will report back on my findings. Thank you.

htwashere commented 6 months ago

Greetings @DeanJohnsonUK, I'm having issues creating the flow logs, so I have been trying other options instead, to no avail. My FW admin has already whitelisted hub.docker.com, management.microsoft.com and mcr.microsoft.com, and he still has not seen any traffic going out. One thing I noticed: while the container instance is being created and stalls at the image pull (showing the "Waiting" status until it times out), I see a "FailedMountAzureFileVolume" error message in the definition. I even hardcoded the SAS key in the Bicep file and it still fails. I have already assigned several RBAC roles, including Storage File Data Privileged Contributor, to the user-assigned managed identity, as per the documentation. Anyway, I will try a few more times, but I may have to give up on Bicep deployment scripts until private endpoint support becomes more stable.

DeanJohnsonUK commented 6 months ago

It might be worth checking that your private DNS zone is set up correctly, so the container can resolve the internal IP address of the storage account hosting the file share it uses.
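
One way to check this is to resolve the storage account's file endpoint from inside the VNet and confirm that it returns a private address; if it returns a public one, the `privatelink.file.core.windows.net` private DNS zone is probably not linked to (or not reachable from) the VNet the container uses. The Python sketch below does that lookup. The account name `mydsstorage` is hypothetical, and `is_private` is a coarse test that also treats some non-RFC 1918 ranges as private.

```python
import ipaddress
import socket
from typing import List


def resolve(host: str) -> List[str]:
    """Return the distinct IP addresses `host` resolves to from this machine."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def uses_private_endpoint(addresses: List[str]) -> bool:
    """True only if there is at least one address and all of them are private."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


if __name__ == "__main__":
    host = "mydsstorage.file.core.windows.net"  # hypothetical storage account name
    try:
        addrs = resolve(host)
    except socket.gaierror as exc:
        print(f"{host}: DNS lookup failed ({exc})")
    else:
        if uses_private_endpoint(addrs):
            print(f"{host} -> {addrs}: private IP, private DNS looks OK")
        else:
            print(
                f"{host} -> {addrs}: public IP; check that the "
                "privatelink.file.core.windows.net zone is linked to this VNet"
            )
```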

htwashere commented 6 months ago

Dear @DeanJohnsonUK: you are dead right about the missing DNS zone lookup. For some reason, the container instance could not resolve the storage account's private IP. We operate in a hub/spoke topology where the private DNS zone is set up in another subscription, and we have a VNet link that works for all of our private endpoint resources. But for deployment scripts, I needed to add an additional VNet link. This took me over 3 days to figure out due to the lack of useful debug messages. Hopefully this helps someone else in a similar situation... Thanks again for helping me out!

DeanJohnsonUK commented 6 months ago

No problem, glad you got it resolved. :)