Azure / aztfexport

A tool to bring existing Azure resources under Terraform's management
https://azure.github.io/aztfexport/
Mozilla Public License 2.0

Full Azure tenant Export - issue with big environment #554

Closed Daniele-S86 closed 1 month ago

Daniele-S86 commented 1 month ago

I am managing a big enterprise Azure environment. My idea is to perform a full export of the whole Azure structure using this tool, in order to have an export fileset for each RG (the final goal is to obtain a daily "backup/lifesaver" to be read in case of need). The total number of RGs to be exported is close to 1000 (split among 30 subscriptions).

The command I am using to export each RG is similar to this one: aztfexport query --use-azure-cli-cred --non-interactive -continue --parallelism 5 --output-dir "c:\sub1\rg1" "resourceGroup =~ 'rgname'"

Here is the problem: if I go RG by RG, the full extraction will take days or weeks, so I built a PowerShell script with a multi-thread approach, also playing with the "parallelism" option of the aztfexport command. I did a lot of tests, but even on a 64 vCPU server, at a certain point aztfexport starts failing with a lot of different errors. I tried many optimizations, such as lowering the number of parallel threads to 16, checking whether the az session is still valid (because sometimes it expires), processing one subscription at a time, etc. But I still get a lot of errors (at each cycle about 200-300 RG exports fail, randomly), and a cycle took about 19 hours to complete. Another problem is that, at each export, the provider executable is downloaded to my user profile (C:\Users\username123\AppData\Local\Temp), and I have to clean the temp folder to avoid filling up the OS disk.

Can you suggest a smarter approach to reach my full-export goal?

Thanks in advance for your help, D

magodo commented 1 month ago

Hi @Daniele-S86, thank you for reaching out!

Some advice for you:

  1. Set up provider caching by following this to avoid repeatedly downloading the provider
  2. For your "multi-thread" approach, I'd suggest exporting 30 RGs from those 30 subs in one run. This avoids reaching ARM throttling too quickly. Even so, while your export is ongoing you can still hit throttling and fail the export. If you can share some examples of export failures, that would be very helpful
  3. Each parallelized routine in one export launches a provider process, which can consume a lot of memory when scaled up. If you only want the HCL and not the state, you can try a hidden flag named -tfclient-plugin-path=<provider path>, together with the -hcl-only flag. This starts only one provider per export run, regardless of the amount of parallelism you've specified
  4. If you still want the state file and also want the benefit of only one provider per run, then you can try the official Terraform import block together with the config generation feature. To get those import blocks, you can specify -g and --generate-import-block. Note that Terraform's officially generated config has limitations.
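
One possible reading of point 2 is to interleave the work so that each batch takes at most one RG from each subscription, which spreads requests across subscriptions instead of concentrating them on one. The following is a minimal Python sketch of that scheduling under this assumption; the function name and the sample subscription and RG names are hypothetical and are not part of aztfexport.

```python
def batches_across_subscriptions(rgs_by_sub):
    """Group resource groups into batches with at most one RG per subscription.

    rgs_by_sub maps a subscription name to the list of its resource groups.
    Returns a list of batches; each batch is a list of (subscription, rg) pairs.
    """
    # Copy the lists so the caller's data is not modified.
    queues = {sub: list(rgs) for sub, rgs in rgs_by_sub.items()}
    batches = []
    while any(queues.values()):
        # Take the next pending RG from every subscription that still has one.
        batch = [(sub, queue.pop(0)) for sub, queue in queues.items() if queue]
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {"sub1": ["rg-a", "rg-b"], "sub2": ["rg-c"]}
    for batch in batches_across_subscriptions(sample):
        print(batch)
```

With 30 subscriptions, each batch would hold up to 30 exports that can run in parallel, one per subscription.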

Daniele-S86 commented 1 month ago

First of all, thanks for your complete answer. Let me answer point by point.

1) I tried this, but now I receive a lot of errors like this one:

aztfexport : Error: error running terraform init for the output directory: exit status 1
At line:66 char:13
+             aztfexport query --use-azure-cli-cred --non-interactive - ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Error: error ru...: exit status 1:String) [], RemoteE 
   xception
    + FullyQualifiedErrorId : NativeCommandError

Error: Failed to install provider

Error while installing hashicorp/azurerm v3.99.0: open
C:\Terraform\plugin_cache\registry.terraform.io\hashicorp\azurerm\3.99.0\windows_amd64\terraform-pr
ovider-azurerm_v3.99.0_x5.exe:
The process cannot access the file because it is being used by another
process.

It seems that having multiple threads use this provider is a problem, isn't it?

2) Not sure I got your point. Let me better explain my approach. I am following a multithread strategy as described here (https://www.get-blog.com/?p=189): it is based on runspace pools, so I launch 16 threads, each of them running an aztfexport for a single Resource Group (with --parallelism set to 5); once the current RG is exported, the next one in the queue goes to the thread and is executed. This is done for all the RGs of a single subscription, then I switch to the next subscription (az account set "subsname") and do the same. What I observed is that if I set up 32 threads or more, the number of failures grows exponentially, probably because I hit some limit. The most common errors I get are similar to these, but they vary and seem related to some limit being hit, as I said.

aztfexport : Error: listing resource set: executing ARG query "Resources | where resourceGroup =~ 
'rg-*****'  | order by 
id desc": AzureCLICredential: exit status 1
At line:55 char:13
+             aztfexport query --use-azure-cli-cred --non-interactive - ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Error: listing ...: exit status 1:String) [], RemoteE 
   xception
    + FullyQualifiedErrorId : NativeCommandError

or

At line:55 char:13
+             aztfexport query --use-azure-cli-cred --non-interactive - ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Error: initiali...rrors occurred::String) [], RemoteE 
   xception
    + FullyQualifiedErrorId : NativeCommandError

    * task error: error running terraform init: exit status 1

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
hashicorp/azurerm: could not connect to registry.terraform.io: failed to
request discovery document: Get
"https://registry.terraform.io/.well-known/terraform.json": net/http: request
canceled while waiting for connection (Client.Timeout exceeded while awaiting
headers)

At the beginning of my tests, I also tried to export one entire subscription in a single shot: this is incredibly fast (not much longer than exporting a single RG), but the output is quite "unreadable". Having a file for each RG is much cleaner...

3) Unfortunately I also need the state, since some object properties/configurations are only "listed" there.

4) I am not very familiar with Terraform, but is what you are proposing some kind of alternative to aztfexport?

Thanks again for your help.

magodo commented 1 month ago

@Daniele-S86

  1. This seems to be a Windows issue, and I work almost exclusively on Linux, so I can't answer it...
  2. The first error message is truncated; the second indicates that a call from Terraform core to the Terraform registry failed, which normally indicates a network issue on your side or a server issue on the registry side
  3. If the state is needed only for those unlisted properties, then you probably missed the --full-properties flag
  4. No, this is just a feature of Terraform core. If you open the link, you'll see a step-by-step tutorial on how to use it

Daniele-S86 commented 1 month ago

Hi @magodo, maybe to solve the problem with Windows I can try to use the provider_installation block with filesystem_mirror (https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror): what do you think?

Actually there should not be any network issue on my side; I think it's all part of this "overload" situation. I am trying to reduce every possible source of traffic as much as I can (like disabling Azure CLI telemetry), but I need to work with multiple threads, since the extraction of a single (and quite standard) RG takes around 6-8 minutes, and I cannot process all of them sequentially.

I will also try --full-properties on my next run and let you know.

By the way: what do you think about the approach I am following to export all my Azure configurations? Considering the size of my environment, would you suggest a different strategy? Thx, D

magodo commented 1 month ago

@Daniele-S86 I'd do something similar to what you did, with the following slight difference:

Daniele-S86 commented 1 month ago

That's an interesting suggestion. Currently I am processing subscription by subscription, since I leverage the az login session. Now I will try, as you wrote, to log in directly using the aztfexport options (--tenant-id, --client-id, --client-secret/--client-certificate). I will also try -tfclient-plugin-path=<provider path>. I will keep you posted. Again, many thanks for your support, D

Daniele-S86 commented 1 month ago

Hi @magodo, I followed your suggestions:

1) I used the authentication directly on aztfexport, without using the az CLI session, and I obtained a 5-10% execution time improvement.

2) I used the -tfclient-plugin-path=<provider path> option: this improved performance tremendously (compared to using a local mirror for the provider); it is now at least 10 times faster!

3) I parallelized each group of RGs across all my subscriptions, as you suggested.

Now the total export time for the entire environment went from 19 hours to about 1.5 hours, and my feeling is that it is even possible to push the multithread parallelism further. In any case, I have now reached my goal.

Your help was really crucial, so many thanks again!
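
The setup described in this comment could be sketched roughly as follows, in Python rather than the PowerShell runspace pool actually used. This is an illustration under assumptions, not the script from the thread: the function names and all IDs are hypothetical, `--subscription-id` does not appear in the thread and is assumed here as the way to target a subscription without `az account set`, and `-hcl-only` is added alongside `-tfclient-plugin-path` only because magodo's advice pairs the two flags. The worker count of 16 and `--parallelism 5` are the values quoted earlier in the thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(sub_id, rg, out_dir, tenant_id, client_id, client_secret,
                  provider_path=None):
    """Build the aztfexport argument list for one resource group."""
    cmd = [
        "aztfexport", "query",
        "--non-interactive", "--continue",
        "--parallelism", "5",
        # Service principal login instead of the az CLI session.
        "--tenant-id", tenant_id,
        "--client-id", client_id,
        "--client-secret", client_secret,
        # Assumption: flag not mentioned in the thread.
        "--subscription-id", sub_id,
        "--output-dir", out_dir,
    ]
    if provider_path:
        # Hidden flag from the thread; magodo pairs it with -hcl-only.
        cmd += [f"-tfclient-plugin-path={provider_path}", "-hcl-only"]
    cmd.append(f"resourceGroup =~ '{rg}'")
    return cmd


def run_all(commands, max_workers=16, runner=None):
    """Run the commands on a bounded thread pool; return exit codes in order."""
    if runner is None:
        # Default runner: launch the process and report its exit code.
        runner = lambda cmd: subprocess.run(cmd).returncode
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))
```

The `runner` parameter is injectable so the scheduling can be tried with a stub before launching real exports; non-zero exit codes in the returned list mark the RGs that would need a retry.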

magodo commented 1 month ago

@Daniele-S86 Thank you for the confirmation! I'll then close this issue now. Please feel free to reopen/create a new issue if you have anything else.