Azure / ipam

IP Address Management on Azure
https://azure.github.io/ipam
MIT License

vNets in Block becoming unassociated and can't be associated again #329

Open tobvil opened 1 week ago

tobvil commented 1 week ago

Describe the bug

We have an issue where some vNets are suddenly becoming unassociated from the block. When we try to associate the vNet again, we get the following error: {"error":"Block already contains network(s) and/or reservation(s) within the CIDR range of target network."}.

When we look in CosmosDB, we can see that the vNet is actually still in the block, but it is marked ("active": false).
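As an illustration of why a leftover entry could block re-association, here is a minimal sketch of a CIDR overlap check that does not look at the "active" flag. This is not taken from the IPAM code base; the function name, the "cidr" field, and the sample entries are hypothetical, and whether IPAM's own check behaves this way is an assumption based only on the symptoms described above.

```python
import ipaddress


def find_conflicts(target_cidr, entries):
    """Return stored entries whose CIDR overlaps target_cidr.

    Hypothetical check: it deliberately ignores the "active" flag,
    so an inactive leftover entry still counts as a conflict.
    """
    target = ipaddress.ip_network(target_cidr)
    return [
        entry
        for entry in entries
        if ipaddress.ip_network(entry["cidr"]).overlaps(target)
    ]


# Hypothetical block contents: one stale (inactive) entry, one healthy entry.
entries = [
    {"id": "vnet-a", "cidr": "10.0.0.0/24", "active": False},
    {"id": "vnet-b", "cidr": "10.0.1.0/24", "active": True},
]

# Re-associating vnet-a's range collides with its own inactive entry.
conflicts = find_conflicts("10.0.0.0/24", entries)
print([entry["id"] for entry in conflicts])
```

Under that assumption, the inactive entry for the same range is enough to trigger the "Block already contains network(s)" error, which matches the behavior we see.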

We can fix the issue by manually removing the vNet entry from CosmosDB and then associating the vNet again.
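To spot affected entries before cleaning them up by hand, something like the following sketch could be run against an exported copy of the block document. The document shape (a "vnets" list whose items carry "id" and "active") is a guess for illustration only, not the confirmed IPAM schema, and the helper names are hypothetical. It works on plain dictionaries and does not touch CosmosDB itself.

```python
def inactive_vnet_ids(block):
    """List the ids of vNet entries flagged as inactive in a block document."""
    return [v["id"] for v in block.get("vnets", []) if v.get("active") is False]


def without_inactive(block):
    """Return a copy of the block document with inactive vNet entries dropped."""
    cleaned = dict(block)
    cleaned["vnets"] = [
        v for v in block.get("vnets", []) if v.get("active") is not False
    ]
    return cleaned


# Hypothetical exported block document.
block = {
    "name": "example-block",
    "vnets": [
        {"id": "/subscriptions/xxx/vnet-a", "active": False},
        {"id": "/subscriptions/xxx/vnet-b", "active": True},
    ],
}

stale = inactive_vnet_ids(block)
cleaned_block = without_inactive(block)
print(stale)
```

The original document is left unchanged, so the output can be reviewed before anything is written back.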

But then, a day later, another vNet suddenly changes to ("active": false) and becomes unassociated.

To Reproduce

Steps to reproduce the behavior:

  1. vNet is suddenly unassociated.
  2. vNet can't be added again, because of error {"error":"Block already contains network(s) and/or reservation(s) within the CIDR range of target network."}.
  3. Checking CosmosDB, we can see the vNet is actually still there, but with ("active": false).

Expected behavior

We expect the vNet not to change from ("active": true) to ("active": false).

Version

{
  "status": "OK",
  "version": "3.4.0",
  "stack": "LegacyCompose",
  "environment": "AZURE_PUBLIC",
  "container": {
    "image_id": "debian",
    "image_version": "12",
    "image_codename": "bookworm",
    "image_pretty_name": "Debian GNU/Linux 12 (bookworm)"
  }
}

DCMattyG commented 1 week ago

Hi @tobvil, appreciate you reaching out about this issue you are seeing.

It's a bit strange, because this is how the process that changes a VNET's "active" value from true to false works:

So, this would seem to indicate that Azure Resource Graph is not returning a complete list of Virtual Networks in your environment.

Would it be possible to work together with you to evaluate this problem further?

Please reach out to me via email at Matthew.Garrett@microsoft.com. Effectively, when you are seeing this issue, we'll want to run an ARG query to see if Resource Graph isn't returning the correct details in your environment.
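A rough sketch of that comparison, for illustration: the Resource Graph query below lists virtual networks, and the helper reports which stored vNet ids are absent from the returned list. The variable and function names are hypothetical, the sample ids are made up, and running the query itself (for example in Azure Resource Graph Explorer) is left out, so the results are passed in as a plain list.

```python
# Kusto query for Azure Resource Graph that lists all virtual networks.
VNET_QUERY = """
resources
| where type =~ 'microsoft.network/virtualnetworks'
| project id, name, addressPrefixes = properties.addressSpace.addressPrefixes
"""


def missing_from_arg(stored_ids, arg_ids):
    """Return stored vNet ids that Resource Graph did not return.

    Azure resource ids are case-insensitive, so compare in lowercase.
    """
    returned = {vnet_id.lower() for vnet_id in arg_ids}
    return [vnet_id for vnet_id in stored_ids if vnet_id.lower() not in returned]


# Hypothetical data: ids stored in the block vs. ids returned by the query.
stored_ids = [
    "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/vnet-a",
    "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/vnet-b",
]
arg_ids = [
    "/subscriptions/xxx/resourcegroups/rg/providers/microsoft.network/virtualnetworks/vnet-b",
]

missing = missing_from_arg(stored_ids, arg_ids)
print(missing)
```

Any id reported as missing while the vNet still exists in Azure would support the theory that Resource Graph is returning an incomplete list.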

Does that sound acceptable for next steps?