dmitryserbin / azdev-release-orchestrator

Azure DevOps extension to manage and orchestrate release pipelines
MIT License

Unable to get release status #60

Closed foliv57 closed 3 years ago

foliv57 commented 3 years ago

Hi,

I have an issue with the orchestrator tasks. Randomly, the execution fails with the error "Unable to get release status". It happens in approximately 50% of our executions, especially for long-running tasks (but not only those).

I checked the source code and, based on the debug logs, it seems that the method "releaseApi.getRelease" in the function "getReleaseStatus" of "releasehelper.ts" doesn't fail but returns an empty "Release" object, as if the API response were a 200 OK that can't be parsed into a "Release" object. It seems to be a 200 OK because there are no debug logs coming from the "Retryable".

Orchestrator version: 2.0.846
Task version: 2.0
Mode: Deploy latest release with stage and branch filter

Thank you.

tasklog_10.log

dmitryserbin commented 3 years ago

Hi, thanks for reporting this.

I will have a look at the issue within the next few days. Meanwhile, you can try increasing the 'updateInterval' parameter to reduce the frequency of the release status calls.

updateInterval: 5 # Required. Default: 5 (seconds)

dmitryserbin commented 3 years ago

Just a quick update. This looks like some sort of connectivity issue to me: some of the status calls get rejected or, to be more specific, return an empty response. Since you're using a local Azure DevOps instance, this might have something to do with your network performance and/or server networking/configuration.

To start with, I'm looking into updating the underlying Azure DevOps API library version to pick up this fix to the REST client. I'm hoping this will improve the retry mechanism for REST API calls.

Will keep you updated here.

foliv57 commented 3 years ago

Thank you for the update.

I agree with your conclusion. Unfortunately, it's complicated for me to get network logs because I don't have the rights for that.

I checked the related change that you linked. Maybe I'm wrong, but may I recommend adding a debug log of the exception in this HttpClient? That would help to identify whether the call ends with an error or succeeds with incorrect content. I agree that the second case looks weird, but imagine that, for some reason, the request goes through a proxy which returns a 200 OK with unexpected content to hide the underlying error.
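The distinction described above (a call that throws versus a call that succeeds with no usable content) can be made visible with a small logging wrapper. The following is only a sketch: `inspectCall` and `CallOutcome` are hypothetical names, not part of release-orchestrator or the Azure DevOps API library, and "empty" is assumed here to mean `null` or `undefined`.

```typescript
// Hypothetical helper for illustration; not part of release-orchestrator
// or the Azure DevOps API library.
type CallOutcome<T> =
    | { kind: "value"; value: T }
    | { kind: "empty" }
    | { kind: "error"; error: unknown };

// Runs an API call and reports whether it threw, resolved with nothing,
// or resolved with a value. "Empty" is assumed to mean null or undefined.
async function inspectCall<T>(
    name: string,
    call: () => Promise<T | null | undefined>,
    log: (message: string) => void = console.log,
): Promise<CallOutcome<T>> {
    try {
        const value = await call();
        if (value === null || value === undefined) {
            log(`<${name}> succeeded but returned an empty result`);
            return { kind: "empty" };
        }
        log(`<${name}> returned a value`);
        return { kind: "value", value };
    } catch (error) {
        log(`<${name}> failed: ${String(error)}`);
        return { kind: "error", error };
    }
}
```

With a wrapper like this around the status call, the debug log would show which of the two cases occurred on a failing run.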

dmitryserbin commented 3 years ago

Hi @foliv57 please re-test and advise if you still experience the issue.

foliv57 commented 3 years ago

OK. We will test it next week. I'll keep you posted.

foliv57 commented 3 years ago

Hi, unfortunately the issue still happens. Now we have this debug log when it fails:

2021-02-01T14:47:26.4408415Z 2021-02-01T14:47:26.438Z release-orchestrator:CommonHelper:wait Waiting <15000> milliseconds
2021-02-01T14:47:41.4408597Z 2021-02-01T14:47:41.439Z release-orchestrator:Deployer:deployManual Updating <TEST> (72643) stage <InProgress> status
2021-02-01T14:47:41.4410523Z 2021-02-01T14:47:41.439Z release-orchestrator:Retry:retryable Executing <getRelease> with <10> retries
2021-02-01T14:48:11.5189302Z 2021-02-01T14:48:11.518Z release-orchestrator:TaskHelper:fail Task <Failed> result (ignore failure <false>)
2021-02-01T14:48:11.5280912Z ##[error]Unable to get <17603> release status
2021-02-01T14:48:11.5326654Z ##[section]Finishing: Deploy latest Gateways

The error is still not trapped by the "retryable".

foliv57 commented 3 years ago

I don't know whether this gives you a clue or sends you down the wrong path, but our target releases have post-deployment gates.

Unfortunately, I can't confirm that the problem happens when the targeted release is in post-deployment gate validation.

We are testing whether the issue still happens when we disable the gates.

foliv57 commented 3 years ago

Please ignore my latest comment. It also fails without post-deployment gates.

foliv57 commented 3 years ago

In "common\retry.ts", line 65: since the function is async, shouldn't retryAsync be applied with an await?

// @ts-ignore
return await retryAsync.apply(this, [target, args, attempts, timeout]);
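As general background to this question, `return` and `return await` in an async function differ only when the call sits inside a `try`/`catch`. The sketch below illustrates that with hypothetical functions; it is not the project's retry.ts, and whether the difference matters there depends on code not shown in this thread.

```typescript
// Hypothetical stand-in for a failing asynchronous call.
async function alwaysRejects(): Promise<string> {
    throw new Error("request failed");
}

// Without await, the promise is returned before it settles, so its
// rejection bypasses this catch block and reaches the caller instead.
async function withoutAwait(): Promise<string> {
    try {
        return alwaysRejects();
    } catch {
        return "handled locally";
    }
}

// With await, the rejection is raised inside the try block and caught here.
async function withAwait(): Promise<string> {
    try {
        return await alwaysRejects();
    } catch {
        return "handled locally";
    }
}
```

Either way, `await` only affects how rejections are handled. It does not change the outcome of a call that resolves successfully with an empty value.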

dmitryserbin commented 3 years ago

Thanks for testing it again @foliv57, I was expecting the same result.

It seems to me that for some reason your Azure DevOps server returns an empty release status object. This explains why retryAsync doesn't catch anything - there is nothing to catch, it's an empty 200 response.

I will see if I can add a retry mechanism for this scenario.
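A retry that also covers the "empty 200 response" case could look roughly like the sketch below. This is not the project's implementation: `retryOnEmpty`, its parameters, and the `retryEmpty` flag are hypothetical, and "empty" is assumed to mean `null` or `undefined`, which may differ from what the server actually returns.

```typescript
// Hypothetical sketch; names and signature are assumptions, not the
// project's retry.ts.
const sleep = (ms: number): Promise<void> =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a call when it throws and, if retryEmpty is set, also when it
// resolves with null or undefined. Throws the last failure if every
// attempt is exhausted.
async function retryOnEmpty<T>(
    call: () => Promise<T | null | undefined>,
    attempts: number,
    delayMs: number,
    retryEmpty: boolean = true,
): Promise<T | null | undefined> {
    let failure = new Error("No attempts were made");
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            const result = await call();
            const isEmpty = result === null || result === undefined;
            if (!isEmpty || !retryEmpty) {
                return result;
            }
            failure = new Error(`Empty result on attempt ${attempt}`);
        } catch (error) {
            failure = error instanceof Error ? error : new Error(String(error));
        }
        // Wait before the next attempt, but not after the final one.
        if (attempt < attempts) {
            await sleep(delayMs);
        }
    }
    throw failure;
}
```

Keeping the empty check behind a flag lets callers opt in per API call, so calls for which an empty result is legitimate are not retried needlessly.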

Meanwhile, please try increasing updateInterval in the task settings to something like 30 or more (seconds) - this should help as well.

dmitryserbin commented 3 years ago

Hi @foliv57,

Please see the linked PR. Unfortunately, I cannot reproduce the issue in my environment, so we'll have to rely on testing from your end. I will need you to test the updated task to see if my changes help with the issue.

I will produce a preview VSIX extension artifact. You'll need to download it manually and install it on your Azure DevOps server to test.

foliv57 commented 3 years ago

Hi @dmitryserbin, Thanks for the update and for the effort. Sounds good, let me know when the preview VSIX is ready.

dmitryserbin commented 3 years ago

Hi @foliv57, please try version 2.0.855 (dmitryserbin.release-orchestrator-2.0.855.zip) attached.

foliv57 commented 3 years ago

Hi @dmitryserbin, I have good news:

2021-02-05T10:27:56.759Z release-orchestrator:Retry:retryable Executing <getRelease> with <10> retries
2021-02-05T10:27:56.818Z release-orchestrator:Retry:retryAsync Retrying <getRelease> (empty) in <6> seconds
2021-02-05T10:28:02.891Z release-orchestrator:ReleaseHelper:getReleaseStatus Release <Release-7> status <Active> retrieved
2021-02-05T10:28:02.893Z release-orchestrator:ReleaseHelper:getStageStatus Stage <TEST> status <InProgress> retrieved

It works as expected: a successful retry after the first empty result. Why the API returns nothing will remain a mystery, but this update clearly adds robustness to the orchestrator.

Thank you very much for this upgrade.

dmitryserbin commented 3 years ago

Nice! Hey, could you please run it for a few days and report back? I'm wondering if any other API calls besides getRelease return an empty object.

foliv57 commented 3 years ago

Yes, sure. For now it's a 100% pass on 4 attempts. I changed the update interval back to 5 seconds to stress it. I'll keep you posted.

dmitryserbin commented 3 years ago

Hi @foliv57 just wondering if you have any update on this? Cheers

foliv57 commented 3 years ago

Hi @dmitryserbin, It’s a 100% pass with the fix. Having the empty=true only for getRelease looks to be enough.

dmitryserbin commented 3 years ago

Sweet! It will be released shortly. You'll need to remove the manually installed extension from your org and re-install it from the marketplace.

dmitryserbin commented 3 years ago

FYI @foliv57 released under v2.0.860

foliv57 commented 3 years ago

Perfect. 2.0.860 installed. Thanks again for this fix