Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Disable network policy on the existing AKS Cluster (to allow migration to overlay) #3845

Open denniszielke opened 1 year ago

denniszielke commented 1 year ago

GA: September 2024

Is your feature request related to a problem? Please describe.

I want to upgrade an existing AKS cluster that uses Calico network policy to overlay, but that is not supported while a network policy is activated. That means I cannot follow the upgrade path. https://github.com/Azure/AKS/issues/3720

Describe the solution you'd like

I want an ARM API that allows deactivating network policy. Similar to https://github.com/Azure/AKS/issues/3084 but here to ensure the migration.

Describe alternatives you've considered

Afaik no alternatives.

This feature has been included in the v20240207 release and can be followed in release tracker

olsenme commented 1 year ago

In progress

VincentS commented 1 year ago

Any update on this ?

chasewilson commented 11 months ago

> Any update on this ?

Still in progress here. Aiming at having this out in the coming months.

cderocco5 commented 10 months ago

Is there an estimated date when this will be complete? I have a 500 node AKS cluster and I want to disable Azure NPM to prevent putting pressure on the apiserver

Santhyrama commented 9 months ago

I have many workloads running on the cluster and it seems to be running out of IPs. Azure overlay is a good option for that, but when can we expect the feature to disable network policy in an existing CNI cluster?

kelvin-ko commented 8 months ago

Is there any update on this feature request? We are looking for this feature to move multiple clusters (a few tens, actually) to CNI overlay before they hit IP exhaustion.

Shert commented 8 months ago

I'll also be very interested in this new feature, I think ip exhaustion is a common problem

chasewilson commented 8 months ago

@kelvin-ko @Shert We're aiming for the end of this month but could slip to next depending on a few factors out of our control.

arsnyder16 commented 8 months ago

@chasewilson How does this feature relate to enabling a network policy on an existing cluster? We currently have no network policy on a cluster but want to enable calico.

robogatikov commented 8 months ago

> @chasewilson How does this feature relate to enabling a network policy on an existing cluster? We currently have no network policy on a cluster but want to enable calico.

@arsnyder16 , yes, after this feature is complete, enabling network policy Calico on an existing cluster will be allowed.

brianereynolds commented 7 months ago

Hi @chasewilson , will this feature allow me to change the network policy from calico to Azure Network Policy Manager ? I want to switch my existing clusters to use long-term support, but I can't (currently) do this as they are configured to use Calico.

jblaaa-codes commented 7 months ago

Is there an update to the release schedule?

robogatikov commented 7 months ago

> Hi @chasewilson , will this feature allow me to change the network policy from calico to Azure Network Policy Manager ? I want to switch my existing clusters to use long-term support, but I can't (currently) do this as they are configured to use Calico.

Once this feature is rolled out, you will be able to do it in 2 steps:

  1. Uninstall Calico Network Policy Manager (az aks update -n -g --network-policy none)
  2. Install Azure Network Policy Manager (az aks update -n -g --network-policy azure)
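
The two steps above can be sketched as a small script. This is only an illustration, not an official procedure: the resource group and cluster names are hypothetical placeholders, and the `DRY_RUN` switch and `run` helper are additions made here so that the commands can be reviewed before anything touches a cluster.

```shell
#!/usr/bin/env bash
# Sketch: switch an existing cluster's network policy engine in two steps.
# RG and CLUSTER are hypothetical placeholders; replace them with your own values.
set -eu

RG="${RG:-my-resource-group}"
CLUSTER="${CLUSTER:-my-aks-cluster}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

switch_policy() {
  local target="$1"
  # Step 1: uninstall the current network policy engine.
  run az aks update -g "$RG" -n "$CLUSTER" --network-policy none
  # Step 2: install the target network policy engine.
  run az aks update -g "$RG" -n "$CLUSTER" --network-policy "$target"
}

switch_policy azure
```

With the default `DRY_RUN=1` the script only prints the two `az aks update` commands; setting `DRY_RUN=0` would run them one after the other.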

robogatikov commented 7 months ago

> Is there an update to the release schedule?

What @chasewilson said in his comment still holds true (end of January - beginning of March)

CrunchyBlue commented 7 months ago

Do we have a release schedule when this might make it to Azure GovCloud?

zensonic commented 7 months ago

So I tried this in west europe on a cluster and got the following. Am I misreading that this should be part of v20240207?

image

Fair enough

image

But still

image

PixelRobots commented 7 months ago

I don't believe this feature has rolled out currently.

Keep an eye here for the announcement.

zensonic commented 7 months ago

Ohh, my mistake. It just said

image

At the top of this, so I assumed. I will wait and have subscribed.

fgarcia-cnb commented 7 months ago

i just tested updating network policy "in-place" on a cluster in west central US and it worked great! the release tracker was just updated today, so maybe it was just released. same command @zensonic ran.

it didnt work in westus2, which makes sense since the "currently in operation" column in the release tracker shows it running an old version (as in @zensonic's west europe case)

PixelRobots commented 7 months ago

Yeah it looks like it is still rolling out. So once all regions are updated it should be fully out.

amitmavgupta commented 7 months ago

It looks good, tested an upgrade from

rgarcia89 commented 7 months ago

I tested the following for a kubenet with calico for policies

az aks update -g -n --network-policy none
az aks update -g -n --network-plugin azure --network-plugin-mode overlay

so far everything worked - however when I tried to re-enable calico for network policies using

az aks update -g -n --network-policy calico 

the cluster ended up in a failed state complaining about:

plugin type="calico" failed (add): no podCidr for node

chasewilson commented 7 months ago

> I tested the following for a kubenet with calico for policies
>
> az aks update -g -n --network-policy none
> az aks update -g -n --network-plugin azure --network-plugin-mode overlay
>
> so far everything worked - however when I tried to re-enable calico for network policies using
>
> az aks update -g -n --network-policy calico
>
> the cluster ended up in a failed state complaining about:
>
> plugin type="calico" failed (add): no podCidr for node

@wedaly

wedaly commented 7 months ago

hi all, we're still in the process of enabling this feature and will update this thread when it's ready.

cderocco5 commented 7 months ago

Is only the adding of a network policy still in the process of testing? Are we able to remove a network policy now with az aks update -g -n --network-policy none

zensonic commented 7 months ago

It still does not work for me :(

[image: image.png]

[image: image.png]

How do I progress from here?


fgarcia-cnb commented 7 months ago

still doesnt work for me in westus2 either, even though the release tracker says its complete. westcentralus does work

fgarcia-cnb commented 7 months ago

just started working in westus2

zensonic commented 6 months ago

I just upgraded the cluster to 1.27.9 - still no luck in westeurope... Still on track with early march?

chasewilson commented 6 months ago

@zensonic, thanks for checking in. Yes we still are on track, there was a second toggle rollout required and it's in the process of rolling out. It doesn't have the visibility of release tracker unfortunately but it should be out soon.

terraboops commented 6 months ago

Thanks for your transparency and work on this!

The release tracker shows this as updated everywhere now; is that correct? If not, what's the best way to confirm the release of this? :) https://releases.aks.azure.com/webpage/index.html#tabus

wedaly commented 6 months ago

hi all, thanks for your patience waiting for this feature! The code and configuration changes have now reached every region. We're doing a final review of the documentation and will publish that soon as well.

PixelRobots commented 6 months ago

Whilst we wait for the official docs, check out my blog post. https://pixelrobots.co.uk/2024/03/first-look-changing-or-disabling-your-network-policy-provider-on-aks/

rgarcia89 commented 6 months ago

I just migrated a cluster in germany west central from kubenet with calico as policy engine to azure-cni overlay with calico as policy engine :tada:

tsiv-at-nnit-com commented 6 months ago

Just to chip in: it is working nicely for us as well. We are using it to switch from Calico network policy to Azure on running, Terraform-managed clusters without destroying them:

  1. terraform state rm $kubernetesobject
  2. az aks update -g $rg -n $aksname --network-policy none
  3. az aks update -g $rg -n $aksname --network-policy azure
  4. terraform import $kubernetesobject $azureaksresourceid
  5. terraform plan # LTS code with new network policy set to azure
  6. terraform apply # LTS code with new network policy set to azure

It takes us around 2.5 hours on the clusters we run. It flips the node agents and networks in the process. It behaves like a couple of patching rounds/aks upgrades

az aks update can be resumed if it times out btw. We experienced a timeout, but it could be mended by a rerun. Thanks to the very nice PG in MS for this feature!
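
Since a timed-out `az aks update` could be fixed by simply running it again, the rerun can be wrapped in a small helper. The following is a minimal sketch, not something from the AKS documentation: the `retry` function, the attempt count, and the `RETRY_DELAY` variable are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: rerun a command a few times if it fails (for example on a timeout).
# retry and RETRY_DELAY are hypothetical names introduced for this example.
set -u

retry() {
  local max_attempts="$1"
  shift
  local attempt=1
  # Keep running the command until it succeeds or the attempts run out.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "command failed, rerunning ($attempt/$max_attempts)" >&2
    sleep "${RETRY_DELAY:-60}"
  done
}

# Usage with the variables from the steps above:
# retry 3 az aks update -g "$rg" -n "$aksname" --network-policy none
```

The helper treats every non-zero exit as retryable, which is a simplification; a real pipeline would probably want to check the error before rerunning.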

chasewilson commented 6 months ago

Thank you all for your patience and feedback! A huge shout out to @wedaly and @robogatikov for the work they put in to this feature and to get it out!

rgarcia89 commented 6 months ago

@tsiv-at-nnit-com I am using terraform for the deployment of clusters too. For me it worked to just migrate the clusters to azure and afterwards update the definition of it in the main.auto.tfvars file from kubenet to azure. Afterwards using tf apply, the change was detected but since it matches with the defined state, it just reported no changes.

robogatikov commented 6 months ago

The official documentation on uninstalling Network Policy engine (Azure NPM or Calico) is here: https://learn.microsoft.com/en-us/azure/aks/use-network-policies

robogatikov commented 6 months ago

Also don't hesitate to open a support ticket if you run into any issues (like upgrade request timeout for @tsiv-at-nnit-com) so we can troubleshoot.

davem-git commented 6 months ago

This is the network policy, right, not the network plugin? Is there a way to remove the plugin as well?

zensonic commented 6 months ago

> This is the network policy, right, not the network plugin? Is there a way to remove the plugin as well?

It is for the policy. For us the goal was to get to Azure network policy because we want long-term support. We were on Calico policy, but of course MS cannot support anything but their own stuff long term (more or less; the world is not black and white).

Network plugin change is a reprovision as of now. Until a moment ago so was network policy change 😊

gp-sharma commented 5 months ago

Hello Everyone,

I have been trying to remove the network policy from aks cluster. For me the command itself throws an error-

$ az aks update --resource-group my_rg_name --name my_cluster_name --network-policy none
ERROR: unrecognized arguments: --network-policy none

Examples from AI knowledge base:

az aks update --resource-group MyResourceGroup --name MyManagedCluster --load-balancer-managed-outbound-ip-count 2
Update a kubernetes cluster with standard SKU load balancer to use two AKS created IPs for the load balancer outbound connection usage.

az aks update --resource-group MyResourceGroup --name MyManagedCluster --api-server-authorized-ip-ranges 0.0.0.0/32
Restrict apiserver traffic in a kubernetes cluster to agentpool nodes.

az version
Show the versions of Azure CLI modules and extensions in JSON format by default or format configured by --output (autogenerated)

https://docs.microsoft.com/en-US/cli/azure/aks#az_aks_update
Read more about the command in reference docs

could someone help me understand the issue ?

amitmavgupta commented 5 months ago

@gp-sharma you are disabling this on an AKS cluster that was created with Kubenet/ BYOCNI or some other plugin? Can you add that here?

I assume you are on aks-preview extension version 0.5.166 or higher?

gp-sharma commented 5 months ago

@amitmavgupta we have used "kubenet" network plugin while creating aks cluster.

aks-preview version is 2.0.0b8.
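
The version question above can be checked mechanically. The following is a rough sketch under a few assumptions: `version_at_least` is a hypothetical helper written for this example, it relies on `sort -V` (version sort, as provided by GNU coreutils), and the commented usage assumes the Azure CLI is installed.

```shell
#!/usr/bin/env bash
# Sketch: check that an installed version is at least a required minimum.
# version_at_least is a hypothetical helper; it depends on `sort -V`.
set -eu

version_at_least() {
  local have="$1"
  local need="$2"
  # The lower of the two versions sorts first; if that is $need, $have is new enough.
  [ "$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)" = "$need" ]
}

# Usage (0.5.166 is the minimum mentioned above):
# have="$(az extension show --name aks-preview --query version -o tsv)"
# version_at_least "$have" 0.5.166 && echo "aks-preview is new enough"
```

By this check the 2.0.0b8 extension reported above is newer than 0.5.166.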

amitmavgupta commented 5 months ago

@gp-sharma I have not had any issues with Kubenet while disabling the policy and have documented both the scenarios (see below) just in case it helps you.

tnn-simon commented 5 months ago

When will this change be released in a stable API? Looking forward to manage this through my IaC pipelines.

chasewilson commented 5 months ago

> When will this change be released in a stable API? Looking forward to manage this through my IaC pipelines.

@tnn-simon we're aiming for a July GA, thanks for the interest!

@amitmavgupta thanks for the help in the comments!

itzhakja commented 1 week ago

Has anyone run into an issue after setting back the network policy to "calico," and the pods within the calico-system namespace have not been created? and also the tigera-operator namespace and its pods have not been created as well

dknippet commented 1 week ago

I use Calico Cloud (paid version) so when I create a cluster I set the policy to none with azure cni overlay. Then I run a script to install Calico cloud. Easy.

If you're using open source Calico... I recommend doing similar. Create the cluster with no network policy, then just install Calico manually. https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart

I usually use Bicep to spin up clusters but I was in the UI the other day and noticed you can select azure cni overlay and Calico at the same time. The dashboard changed, because you used to have to create the cluster with Azure cli to enable azure cni overlay. I bet there's a bug in the UI.
