Open Probstson opened 4 years ago
With "useAsyncMode: true" it works, but it doesn't in sync mode.
This has been an issue for the longest time, and it has taken me until yesterday to figure out most of what is going on.
Async: If the solution import is quick enough, the task terminates correctly, but if it is long running it just keeps going all the way to the async timeout, even if the solution import has been successful in the meantime.
So we've learned to ignore the task progress in ADO, periodically refresh the solution window in D365/MDA, and when we see it has succeeded there, we go back to ADO, cancel the release and re-run it. It skips over the long-running solution (as it is now in place) and on we go. I don't know how long 'too long' is, but it might be related to my comment below.
Sync: similar but different. If the import is quick enough, you are OK. If it takes too long, then the task will keep running FOR EVER, no matter what you set the sync timeout to (IIRC Wael calls this the 'CRM Connection timeout' in the UI).
Obviously this is troublesome, so I've looked into alternatives. And I noticed an interesting comment in the new MS Power Platform Import Solution task:
So I think it might be that the sync task, if it exceeds 4 minutes, will never get a message back because the Azure Firewall (load balancer?) has timed out. The task never knows to stop running, so it goes on for ever. I'm less certain about the async one, but it's certainly possible that it is waiting for a callback that never comes?
The 3min50 timeout on the load balancer is a pain; I've hit it when running SSRS reports in custom apps. We could fortify the task by updating the PollingOrganizationService used by the AsyncOperationsManager to tear down and re-create its CrmServiceClient connection every couple of minutes. We could also implement async in the ExportSolution cmdlets with a flag, wrapping the export request in an ExecuteAsyncRequest.
Another approach might be to use https://azure.microsoft.com/en-gb/blog/new-configurable-idle-timeout-for-azure-load-balancer/ ?
Unfortunately you can only modify the ALB timeout on inbound IPs of VMs; it isn't configurable for either the CDS PaaS instances or Azure App Services.
Thanks, @MichaelHolmesWP - I was wondering :-) I tasked our developers with looking into it, with the caveat that I couldn't find a way to do it myself.
I'll get them to focus on using a keep-alive instead. I need to check Wael's code though, as I'm sure he is polling for status every 15s, and I can't figure out why that wouldn't keep the connection alive.
I imagine it has something to do with IOrganizationService's internal handlers, though I'm not too sure how it runs under the hood. Wael instantiates an IOrganizationService as his polling service in ImportXrmSolutionCommand. We could swap that out for an IOrganizationServiceFactory and then periodically re-create the IOrganizationService in the ImportJobHandler/AsyncUpdateHandler in SolutionManager.cs.
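To make the "periodically re-up the connection" idea concrete, here is a minimal, language-neutral sketch in Python. It is not the task's actual code: `connect`, `get_job_status`, the `"InProgress"` status string and the default intervals are all hypothetical stand-ins for the factory and polling service discussed above.

```python
import time
from typing import Callable, Protocol


class Connection(Protocol):
    """Hypothetical stand-in for a service connection created by a factory."""

    def get_job_status(self, job_id: str) -> str: ...
    def close(self) -> None: ...


def poll_with_reconnect(
    connect: Callable[[], Connection],
    job_id: str,
    poll_interval: float = 15.0,     # assumed polling cadence
    reconnect_after: float = 120.0,  # replace the connection every couple of minutes
    timeout: float = 3600.0,         # overall wait budget
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves 'InProgress', replacing the connection periodically."""
    deadline = clock() + timeout
    conn = connect()
    opened_at = clock()
    try:
        while True:
            status = conn.get_job_status(job_id)
            if status != "InProgress":
                return status
            if clock() >= deadline:
                raise TimeoutError(f"job {job_id} still running after {timeout}s")
            sleep(poll_interval)
            # Tear down and re-create the connection before it can go stale.
            if clock() - opened_at >= reconnect_after:
                conn.close()
                conn = connect()
                opened_at = clock()
    finally:
        conn.close()
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; a real implementation would keep whatever timing mechanism the task already uses.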
However I've yet to see an actual async import task failure occur so I don't know if this is even necessary? OP says he only had the issue in Sync mode which is understandable.
Would just be whether or not we want to implement the same Async Wrapper logic on the export solution cmdlet to give that an async mode too.
The safest thing would be to just remove the synchronous option from the ADO Import Solution task; it makes no logical sense, as the tasks execute sequentially anyway.
@cseymr @MichaelHolmesWP
Very interesting discussion.
Both the sync and async mode have had many checks added to be as resilient as possible to longer imports.
I have used both successfully to import solutions that in some cases took over 4 hours. The reason for the 4 hours is, I believe, that there is an automatic retry at the MS end, so if the first attempt fails it may retry depending on the error.
The async mode should always be used. You can increase the async wait timeout, which controls how long the task waits for the import job to complete. The connection timeout is not very relevant here; it may only matter for the initial upload of the solution if the connection is slow and the solution is large.
The sync mode was introduced because in a number of cases the async mode would fail while the sync mode would succeed. This was due to slight variations at the MS backend in the way imports and the cache are handled in each mode. This difference may or may not still be there. But you always have the option to try sync mode if you get something like a SQL timeout.
The sync mode obviously opens one connection and keeps that connection open until the import finishes, so the connection timeout is very relevant here. However, there are many proxies and firewalls in the middle which may terminate longer connections. For this reason the task attempts to catch this exception and then queries the import job and waits for it to complete.
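The catch-then-poll behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the task's implementation: `run_sync_import`, `get_import_status`, the status strings and the exception types are assumptions made for the example.

```python
import time
from typing import Callable


def import_with_fallback(
    run_sync_import: Callable[[], None],    # hypothetical blocking import call
    get_import_status: Callable[[], str],   # hypothetical import job query
    poll_interval: float = 15.0,
    timeout: float = 4 * 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the blocking import; if the connection drops, poll the import job instead."""
    try:
        run_sync_import()
        return "Succeeded"
    except (ConnectionError, TimeoutError):
        # The connection was cut by something in the middle; the import
        # itself may still be running on the server, so keep watching it.
        pass

    deadline = clock() + timeout
    while True:
        status = get_import_status()
        if status != "InProgress":
            return status
        if clock() >= deadline:
            raise TimeoutError("import job did not finish within the wait budget")
        sleep(poll_interval)
```

Note that the fallback only runs if the dropped connection actually surfaces as an exception. If an intermediary drops an idle connection silently and the client has no socket-level timeout of its own, the blocking call may never return, which would be consistent with the "runs for ever" symptom reported earlier in this thread.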
If there is any issue, then taking a look at the logs in debug mode would help to see where the problem is or where the job was hanging.
Hope this helps. Let me know if there are any changes you think can be done to improve this.
@cseymr: Was the import issue resolved for you? Can you please let me know what you did to fix it? I was reading your post where you were re-running the pipelines after you saw the upgraded solution in the target. To avoid this I did the following:
Step 1: Add an import task, run it in async mode, and in the control options enable "continue on error". This will fail the task after the set async wait time.
Step 2: Create another import task (a copy of import 1) and in the control options set it to run "only when a previous task has failed".
This way you don't need to re-run the release manually if the import task fails. The second task re-runs the import, and this run should go faster because the solution already exists in the target.
Please let me know what fixed your issue. Thank you!
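The two-task workaround above amounts to "run the import, and if it times out, run it once more". A minimal sketch of that control flow in Python, where `run_import` is a hypothetical callable standing in for the import task and a `TimeoutError` stands in for the task failing at the async wait timeout:

```python
from typing import Callable, Optional


def import_with_rerun(run_import: Callable[[], None], max_runs: int = 2) -> int:
    """Run the import; on timeout, run it again. Returns the run number that succeeded."""
    last_error: Optional[Exception] = None
    for run in range(1, max_runs + 1):
        try:
            run_import()
            return run
        except TimeoutError as exc:
            # Mirrors "continue on error": remember the failure and fall
            # through to the next run instead of stopping.
            last_error = exc
    raise RuntimeError(f"import still failing after {max_runs} runs") from last_error
```

In the pipeline itself this logic is expressed through the two tasks' control options rather than code; the sketch only shows the intended order of events.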
@saiankith-explorer I think we're OK with this now. The async import is behaving properly.
The only issue we have now is that the ADO UI sometimes (often!) doesn't realise that the task has actually completed. The release carries on happily, but the UI still shows the original task as ongoing. If you hit the 'Refresh' button, the UI updates and shows everything correctly. I don't know if this is a problem with how the task is reporting its status to the UI, or just the UI itself not responding correctly. Either way - we can live with it :-)
Hello Wael Hamze,
I have a problem with the Import-Solution task. I configured a very high timeout (higher than the time the import actually takes), but the pipeline task always runs into the timeout, despite a successful deployment in Dynamics. Do you have any suggestions?
The task version is the newest (12).
Agent specification: windows-2019
yaml of the task: steps:
Best regards, Christian Probst