[BUG] Extractor is trying to read whole APIM instance when specific config provided

nepsmaddy commented 3 months ago

Release version

v6.0.1-rc1

Describe the bug

When specific details to extract are provided for an API, the extractor should fetch only that data and then finish.

This worked fine in the old v4.7.x release, but the new release tries to read through all the metadata, so the extractor takes a long time to run instead of seconds.

As an example, consider the extractor config below.

    apiNames:
      - apim-health
    backendNames: [ignore]
    namedValueNames:
      - environment
      - region
    productNames: [ignore]
    tagNames:
      - finance-nocharge
    diagnosticNames: [ignore]
    loggerNames: [ignore]
    policyFragmentNames: [ignore]
    subscriptionNames: [ignore]

Expected behavior

Per this config, the extractor should scan only the specified API, named values, and tags. It should not touch the rest, and should simply produce the artifacts.

Note: This is exactly the behavior in 4.7.x, but not in the current release, which scans through everything and takes a long time to do so.
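To make the expected filtering concrete, here is a minimal Python sketch of an allow-list check. It is not the apiops implementation. The function name `should_extract` is hypothetical, and two behaviors are assumptions: a section missing from the config means "extract everything of that type", and `[ignore]` is simply a one-element list whose name matches no real resource.

```python
from typing import Mapping, Sequence


def should_extract(config: Mapping[str, Sequence[str]], section: str, name: str) -> bool:
    """Return True if resource `name` should be extracted under `section`.

    Assumption: a section that is absent from the config is unrestricted,
    while a section that is present acts as an allow-list of names.
    """
    allowed = config.get(section)
    if allowed is None:
        return True  # no filter configured for this resource type
    return name in set(allowed)


# Mirrors part of the config from this issue.
config = {
    "apiNames": ["apim-health"],
    "backendNames": ["ignore"],  # assumed: a placeholder name that matches nothing
    "namedValueNames": ["environment", "region"],
    "tagNames": ["finance-nocharge"],
}

for named_value in ["environment", "ami-optout-secret"]:
    print(named_value, should_extract(config, "namedValueNames", named_value))
```

Under these assumptions, only `environment` and `region` pass the named value filter, and no backend passes the `backendNames` filter.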

Actual behavior

The extractor scans the resources given in config.yaml as well as all other metadata of the APIM instance that is not required, logging a warning for each skipped resource. See the snippet below.

warn: extractor.ShouldExtractFactory[0] NamedValueName allegroinvoice-mugf-sappipo-prx-password is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName allegroinvoice-mugf-sappipo-prx-username is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-ccevents-password is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-ccevents-subscription-key is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-ccevents-username is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-optout-client-id is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-optout-password is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-optout-secret is not in configuration and will be skipped.
warn: extractor.ShouldExtractFactory[0] NamedValueName ami-optout-username is not in configuration and will be skipped.

Reproduction Steps

Update configuration.extractor.yaml with a specific API and its related metadata.
In the configuration, ignore some resource types, as in the example above.
Run the extractor and observe the logs.

github-actions[bot] commented 3 months ago

  Thank you for opening this issue! Please be patient while we will look into it and get back to you as this is an open source project. In the meantime make sure you take a look at the [closed issues](https://github.com/Azure/apiops/issues?q=is%3Aissue+is%3Aclosed) in case your question has already been answered. Don't forget to provide any additional information if needed (e.g. scrubbed logs, detailed feature requests,etc.).
  Whenever it's feasible, please don't hesitate to send a Pull Request (PR) our way. We'd greatly appreciate it, and we'll gladly assess and incorporate your changes.

guythetechie commented 3 months ago

@nepsmaddy - just to be clear: is the extractor creating artifacts for resources that should be skipped? Or are you just noticing references to other resource names in the logs?

nepsmaddy commented 3 months ago

@guythetechie, yes, log creation as well. In the first place, the extractor should not scan anything except the configuration provided. Also, I noticed this was working as expected in v4.7.0, but in the latest release I see it generating all these logs as well.

DSpirit commented 3 months ago

+1. In our environment this is quite time consuming (300+ APIs). I don't know why the extractor needs to loop through all resources when a specific resource set is defined in the extractor config. Even worse, the extractor has no proper handling of subscription read limits, so long-running operations often result in a SubscriptionRequestsThrottled error, e.g.:

System.Net.Http.HttpRequestException: HTTP request to URI https://management.azure.com/subscriptions/***/resourceGroups/***/providers/Microsoft.ApiManagement/service/***/apiVersionSets/***?api-version=2023-09-01-preview failed with status code 429. Content is '{"error":{"code":"SubscriptionRequestsThrottled","message":"Number of 'read' requests for subscription actor '***:***' exceeded. Please try again after '1' seconds after additional tokens are available. Refer to https://aka.ms/arm-throttling for additional information."}}'.

This causes the entire extraction to fail occasionally.
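For illustration, 429 handling of the kind described as missing could look roughly like the Python sketch below. It is not taken from apiops. `get_with_retry` and the `send` callable are hypothetical names, and the policy (honor a numeric `Retry-After` header, otherwise back off exponentially) is an assumption about what reasonable handling would be.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def get_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        status, headers, _body = response
        if status != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the last 429 back to the caller
        fallback = base_delay * 2 ** attempt  # exponential backoff
        try:
            # Use the server's hint when it is a plain number of seconds.
            delay = float(headers["Retry-After"])
        except (KeyError, ValueError):
            delay = fallback
        sleep(delay)
    return response
```

Injecting `sleep` keeps the sketch testable without real waiting. A caller would wrap each management API request in `send` and treat a final 429 as a failure.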

guythetechie commented 3 months ago

@nepsmaddy - we've always retrieved all APIs, then filtered by API name in configuration. v4.7.0 behaves the same way. There are two major differences:

We now log which resources were skipped to make it clear that some have not been extracted.
We also retrieve the specification contents before filtering the API. This is probably what's causing performance issues now; prior to v6, performance did not seem to be a problem.
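The cost of the second point can be pictured with a small Python sketch. It is not the extractor's code. `extract_specs` and `fetch_spec` are hypothetical stand-ins for "list API names" and "download one API's specification", and the sketch only shows why applying the name filter before the per-API fetch reduces the number of requests.

```python
from typing import Callable, Iterable, Iterator, Set, Tuple


def extract_specs(
    api_names: Iterable[str],
    wanted: Set[str],
    fetch_spec: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (name, specification) only for APIs named in the configuration."""
    for name in api_names:
        if name not in wanted:
            continue  # skipped before any per-API request is made
        yield name, fetch_spec(name)


fetched = []


def fake_fetch_spec(name: str) -> str:
    """Stand-in for an expensive HTTP call; records which APIs were fetched."""
    fetched.append(name)
    return f"spec for {name}"


all_apis = [f"api-{i}" for i in range(300)] + ["apim-health"]
results = list(extract_specs(all_apis, {"apim-health"}, fake_fetch_spec))
print(len(fetched))  # number of specification downloads actually issued
```

With the filter applied first, the number of specification downloads tracks the size of the configured set rather than the size of the APIM instance.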

@DSpirit - could you define "quite time consuming"? How long does it take to run on 300+ APIs? Will add to our backlog for fixing, but prioritization will depend on how bad it is.

DSpirit commented 3 months ago

After merging my change from #612, my pipeline retries became unnecessary, so extraction dropped from 25 minutes to 3-6 minutes for a single API. Extracting all assets takes about 8-11 minutes. This is acceptable; it only became really annoying because of the missing 429 handling. Sure, this could be improved for single API extraction, but for now it's completely fine, since the APIM Resource Kit hasn't been any better :)

Thanks for the quick feedback today, really appreciate it!

einzweirad commented 3 months ago

@guythetechie Nevertheless, this behavior in v6 is a huge problem for larger APIM instances (around 850 APIs).

In our case, the extractor runs for about 8 minutes to loop through the 1700 NamedValues alone. All the other parts (Tags, Products, Subscriptions and so on) add up to a total time of more than 50 minutes to extract a single API.

The same extraction for a single API, with everything else set to [ignore], finished in under 1 minute in v5.1.4.

guythetechie commented 3 months ago

Thanks for the feedback, all. Will prioritize addressing this.

nepsmaddy commented 3 months ago

Same for me: in v6, my extractor takes almost 50 minutes to complete the run.

guythetechie commented 3 months ago

Fix pushed to main branch, should be deployed in our next release.
