Azure / azure-functions-powershell-worker

PowerShell language worker for Azure Functions.
MIT License

No way to avoid runspace reuse #242

Open AnatoliB opened 5 years ago

AnatoliB commented 5 years ago

PowerShell runspaces may be reused between function executions. This improves the warm start time, but may also lead to unexpected and undesirable consequences when a function execution encounters leftovers from the previous execution (such as imported modules). We can recommend that function authors avoid any assumptions about runspace reuse as a general good practice. However, this guideline will not always be clearly understood, and it is sometimes difficult to follow consistently even when understood. Consider providing a user-controllable setting that will disable runspace reuse for a given function or a function app. Consider also disabling runspace reuse by default.
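A minimal sketch of the kind of leftover described above. It assumes plain runspaces created through the public System.Management.Automation API, not the worker's own runspace management (which is not shown in this thread). The variable name `Leftover` is made up for illustration.

```powershell
# Illustration only: one runspace reused for two "invocations".
$runspace = [runspacefactory]::CreateRunspace()
$runspace.Open()

$reused = [powershell]::Create()
$reused.Runspace = $runspace

# First invocation leaves a global variable behind.
$null = $reused.AddScript('$global:Leftover = "set by first invocation"').Invoke()
$reused.Commands.Clear()

# Second invocation in the same runspace can still see it.
[bool] $seenInReusedRunspace = $reused.AddScript('Test-Path -Path Variable:\Leftover').Invoke()[0]

# A separate PowerShell instance gets its own new runspace by default,
# so the variable does not exist there.
$fresh = [powershell]::Create()
[bool] $seenInFreshRunspace = $fresh.AddScript('Test-Path -Path Variable:\Leftover').Invoke()[0]

$reused.Dispose()
$fresh.Dispose()
$runspace.Dispose()

"Reused runspace sees leftover: $seenInReusedRunspace"
"Fresh runspace sees leftover:  $seenInFreshRunspace"
```

A function that initializes the state it depends on at the top of each execution behaves the same in both cases, which is the practice recommended above.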

oising commented 4 years ago

There are some internals in powershell that are used by powershell workflow that may be of use to keep a pool of runspaces warm, but it has an issue currently: https://github.com/PowerShell/PowerShell/issues/11318

Additionally, while it will reset variables, eventing, debugger and the tx manager, I'm not certain it will unload modules nor clear out function definitions. Worth a look though.

nerddtvg commented 4 years ago

Hello,

I have a situation where I think the runspace reuse is causing us problems. We have functions that perform administrative work against Exchange Online, Skype for Business Online, and other remote PowerShell sessions. We have found that once a worker has connected to Exchange Online then attempts to connect to Skype for Business online, the SfB session fails with a 500 Internal Server Error from the remote endpoint (not from Azure Functions).

These connections are established using plain New-PSSession commands and without loading either the ExchangeOnline or SkypeOnlineConnector modules (as they are not supported).

The PSSessions are not imported in any way. We utilize Invoke-Command -Session to run any commands and return values. All sessions are gracefully closed prior to function completion and inside any error handling if they occur. The functions do not use the same variable names for the sessions either, so there isn't a conflict even if the garbage collection wasn't applicable.

There just seems to be something in handling the Exchange Online session that kills any attempt to connect to Skype for Business Online afterwards.

If we restart the Function App, it works. If we manage to get a worker and runspace that had not previously connected to Exchange, it works. But the moment we hit a runspace that has connected to Exchange, it fails.

We have restricted our workers to only 1 runspace and a maximum of 5 workers. It's easy for us to reproduce the error because of this.

I think it would be nice to have the ability to forcefully refresh the runspace or require it to be restarted when a function execution has completed.

AnatoliB commented 4 years ago

@nerddtvg Do you think you could reproduce this from a regular PS6 or PS7 console on your machine? It would be interesting to see if creating a new runspace actually resolves the issue. Chances are that there is a problem that persists on the process level, so avoiding runspace reuse would not be enough.
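To illustrate why a new runspace may not be enough, here is a small sketch of state that lives at the process level. It uses a process-scoped environment variable purely as a stand-in; the helper name `Invoke-InNewRunspace` and the variable `RUNSPACE_DEMO_FLAG` are hypothetical, and nothing here claims to be the cause of the Exchange Online / SfB failure.

```powershell
# Illustration only: process-wide state is shared by every runspace in the process.
function Invoke-InNewRunspace {
    param([Parameter(Mandatory)] [string] $ScriptText)

    # With no arguments, Create() gives the instance its own new runspace.
    $ps = [powershell]::Create()
    try { $ps.AddScript($ScriptText).Invoke() }
    finally { $ps.Dispose() }
}

# "First execution": sets a process-scoped environment variable.
$null = Invoke-InNewRunspace '[Environment]::SetEnvironmentVariable("RUNSPACE_DEMO_FLAG", "set earlier")'

# "Second execution": a brand-new runspace, yet the value is still there,
# because environment variables belong to the process, not the runspace.
$valueInSecondRunspace = Invoke-InNewRunspace '$env:RUNSPACE_DEMO_FLAG'

"Value seen by the second runspace: $valueInSecondRunspace"
```

Static .NET state and loaded assemblies are shared across runspaces in the same way, so only a new process clears them.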

nerddtvg commented 4 years ago

@AnatoliB

So I tried some basic emulation, opening those sessions both simultaneously and in sequence, and neither case was impacted.

However I haven't been able to try with the Azure Functions Host. I aim to test that next but we rely heavily on a managed identity, so I have to work around that before it will work.

I'll update you soon, I hope. Give me a couple days.

nerddtvg commented 4 years ago

@AnatoliB

I was unable to recreate it using a local function host. I was using the same version of the function runtime for both. The only change I made to the code was to use hard-coded access tokens for some secrets.

nerddtvg commented 4 years ago

@AnatoliB

I am still having this issue with New-PSSession failing out and it is specifically only when a Function runspace has connected to Exchange Online. I have tried many different ways to emulate this on my local machine, but I just can't recreate the problem. Yet I have 3 different function apps that show this issue.

Do you have any thoughts on a way I can debug this to identify what part of the runspace reuse is causing a problem? Or something that would show that it's a different issue?

AnatoliB commented 4 years ago

@nerddtvg I would suggest investigating this from the other end:

"the SfB session fails with a 500 Internal Server Error from the remote endpoint"

The web server should not be directly affected by reusing PS sessions or processes on the client. Apparently, a certain API call with a certain payload causes the 500 error on the server side. If we can find what exactly is special about this call, it will give us a better idea of what should not be reused.

As a workaround, do you think you can split your Function app into two apps: one of them uses EO but not SfB, and another one - SfB but not EO, and perhaps have one of them invoke another one via HTTP or queues? I understand this is cumbersome, but this will avoid both runspace reuse and process reuse.

nerddtvg commented 4 years ago

@AnatoliB

I realize the runspace reuse shouldn't have any bearing on the remote server, but clearly there is something bad in the request itself to generate that message.

Unfortunately I can't really break this app up like that. It would require a bit of work around the infrastructure supporting the function app and changes to the pipelines. For now, I've disabled the SfB requests as they are less critical.

If you have any thoughts or suggestions on how I can debug the session establishment requests, I am fully willing to give it a try. We have a full dev environment to try things in.

AnatoliB commented 4 years ago

@nerddtvg What I'm suggesting is to investigate it from the SfB end. Where is it hosted? Do you have access to the logs? If it is hosted by Microsoft, can you submit a support request? The 500 error normally indicates a server-side defect (if the request was wrong, it should have responded with 4xx), so the service engineers may want to take a look and either fix this or at least tell us what is special about the failed request. I'm a bit skeptical about investigating this from the client side: you may end up finding a request that looks good enough.

If you could reproduce this locally, I would suggest capturing the HTTP traffic with Fiddler and comparing successful and failed requests. You can't do it on Azure, unfortunately.

What exactly reports the 500 error? Is it coming from the code running within the PS session (can you provide a sample?), or from the New-PSSession call itself?

nerddtvg commented 4 years ago

@AnatoliB

Sorry, I should have been more specific. This is SfB Online, so it's fully hosted by Microsoft.

I wish I could reproduce it locally but when I run the functions in a local context, I do not get the same issues. I'd be happy to try again if you have any suggestions on how I can better emulate an Azure environment locally instead of simply running 'func host start'.

The code generating that is just a New-PSSession line with appropriate credentials. We never establish the session; it fails here.

This is the sanitized code I can share:

$PowershellURI = "https://admin3a.online.lync.com/OcsPowershellOAuth"
$TenantId = "[tenant id removed]"

$UserName = "adminuser@domain.onmicrosoft.com"
$Password = "[password removed]"

$Body = "grant_type=password&username=" + [System.Web.HttpUtility]::UrlEncode($UserName) + "&password=" + [System.Web.HttpUtility]::UrlEncode($Password) + "&client_id=d924a533-3729-4708-b3e8-1d2445af35e3&resource=" + [System.Web.HttpUtility]::UrlEncode($PowerShellURI)
$AccessToken = (Invoke-RestMethod -Uri "https://login.microsoftonline.com/$($TenantId)/oauth2/token" -Method POST -ContentType 'application/x-www-form-urlencoded' -Body $Body).access_token

$ConnectionURI = "$($PowerShellURI)?AdminDomain=[domain_removed].onmicrosoft.com"

$OAuthCred = New-Object System.Management.Automation.PSCredential('oauth', ($AccessToken | ConvertTo-SecureString -Force -AsPlainText))

# Add an option to remove the machine profile
$SessionOption = New-PSSessionOption -NoMachineProfile

# Forcefully stop anything and throw an error if this fails to force the calling script to deal with it
$SkypeSession = New-PSSession -Name "SkypeSession" -ConnectionUri $ConnectionURI -Credential $OAuthCred -Authentication Basic -SessionOption $SessionOption -ErrorAction Stop

We actually use the same autodiscover method that SfB's Powershell module uses to establish the proper URI and have a Key Vault for the admin user and password, but I pulled all of that out for this purpose.

This is the same method we use to connect to Exchange online with a different endpoint (obviously) and basic authentication (for now).

# ---- Connect to Exchange Online ----
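# PSCredential requires a SecureString password, so convert the plain-text value first
$Credential = New-Object System.Management.Automation.PSCredential($UserName, ($Password | ConvertTo-SecureString -Force -AsPlainText))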

# Add an option to remove the machine profile
$SessionOption = New-PSSessionOption -NoMachineProfile

# We do this via reference variable to ensure we return scoped correctly
$O365Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://outlook.office365.com/powershell-liveid -Credential $Credential -Authentication Basic -AllowRedirection -SessionOption $SessionOption

AnatoliB commented 4 years ago

@nerddtvg Thank you. If you get this error from New-PSSession, it may be related to PS remoting, not SfB. Could you please also post the entire error message?

nerddtvg commented 4 years ago

@AnatoliB

Unfortunately, the full error is not much better. The error is from remoting establishing the HTTPS session, but it returns the 500 error from the remote server and nothing PSRemoting specific. The HTTP error is just a generic IIS error message from the remote SfB WinRM server.

I have censored our internal module name and function names since they were identifiable.

The URL used in this connection was: https://admin3a.online.lync.com/OcsPowershellOAuth?AdminDomain=[domain]

EXCEPTION: Failed connecting to Skype Online: System.Management.Automation.Remoting.PSRemotingTransportException: Connecting to remote server admin3a.online.lync.com failed with the following error message : <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>500 - Internal server error.</title>
<style type="text/css">
<!--
body{margin:0;font-size:.7em;font-family:Verdana, Arial, Helvetica, sans-serif;background:#EEEEEE;}
fieldset{padding:0 15px 10px 15px;} 
h1{font-size:2.4em;margin:0;color:#FFF;}
h2{font-size:1.7em;margin:0;color:#CC0000;} 
h3{font-size:1.2em;margin:10px 0 0 0;color:#000000;} 
#header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
background-color:#555555;}
#content{margin:0 0 0 2%;position:relative;}
.content-container{background:#FFF;width:96%;margin-top:8px;padding:10px;position:relative;}
-->
</style>
</head>
<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
 <div class="content-container"><fieldset>
  <h2>500 - Internal server error.</h2>
  <h3>There is a problem with the resource you are looking for, and it cannot be displayed.</h3>
 </fieldset></div>
</div>
</body>
</html>
 For more information, see the about_Remote_Troubleshooting Help topic.
At D:\home\site\wwwroot\Modules\[ModuleName]\Export\[FunctionName].ps1:76 char:9
+         throw "Failed connecting to Skype Online: $($_.Exception)"
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : OperationStopped: (Failed connecting t…hooting Help topic.:String) [], RuntimeException
+ FullyQualifiedErrorId : Failed connecting to Skype Online: System.Management.Automation.Remoting.PSRemotingTransportException: Connecting to remote server admin3a.online.lync.com failed with the following error message : <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>500 - Internal server error.</title>
<style type="text/css">
<!--
body{margin:0;font-size:.7em;font-family:Verdana, Arial, Helvetica, sans-serif;background:#EEEEEE;}
fieldset{padding:0 15px 10px 15px;} 
h1{font-size:2.4em;margin:0;color:#FFF;}
h2{font-size:1.7em;margin:0;color:#CC0000;} 
h3{font-size:1.2em;margin:10px 0 0 0;color:#000000;} 
#header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
background-color:#555555;}
#content{margin:0 0 0 2%;position:relative;}
.content-container{background:#FFF;width:96%;margin-top:8px;padding:10px;position:relative;}
-->
</style>
</head>
<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
 <div class="content-container"><fieldset>
  <h2>500 - Internal server error.</h2>
  <h3>There is a problem with the resource you are looking for, and it cannot be displayed.</h3>
 </fieldset></div>
</div>
</body>
</html>
 For more information, see the about_Remote_Troubleshooting Help topic.

Script stack trace:
   at [FunctionName], D:\home\site\wwwroot\Modules\[ModuleName]\Export\[FunctionName].ps1: line 76
   at <ScriptBlock>, D:\home\site\wwwroot\[TimerFunctionTrigger]\run.ps1: line 21

System.Management.Automation.RuntimeException: Failed connecting to Skype Online: System.Management.Automation.Remoting.PSRemotingTransportException: Connecting to remote server admin3a.online.lync.com failed with the following error message : <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>500 - Internal server error.</title>
<style type="text/css">
<!--
body{margin:0;font-size:.7em;font-family:Verdana, Arial, Helvetica, sans-serif;background:#EEEEEE;}
fieldset{padding:0 15px 10px 15px;} 
h1{font-size:2.4em;margin:0;color:#FFF;}
h2{font-size:1.7em;margin:0;color:#CC0000;} 
h3{font-size:1.2em;margin:10px 0 0 0;color:#000000;} 
#header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
background-color:#555555;}
#content{margin:0 0 0 2%;position:relative;}
.content-container{background:#FFF;width:96%;margin-top:8px;padding:10px;position:relative;}
-->
</style>
</head>
<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
 <div class="content-container"><fieldset>
  <h2>500 - Internal server error.</h2>
  <h3>There is a problem with the resource you are looking for, and it cannot be displayed.</h3>
 </fieldset></div>
</div>
</body>
</html>
 For more information, see the about_Remote_Troubleshooting Help topic.

Since the stack trace was hidden by my capture, I have included this which is the original stack trace from PSRemoting:

2020-06-05T18:41:53.585 [Error] Executed 'Functions.[FunctionName]' (Failed, Id=bb0413a5-49d4-417e-add1-64ee1bf94c82)
Result: Failure
Exception: [admin3a.online.lync.com] Connecting to remote server admin3a.online.lync.com failed with the following error message : ??| For more information, see the about_Remote_Troubleshooting Help topic.
Stack:    at System.Management.Automation.Runspaces.PipelineBase.Invoke(IEnumerable input)
   at System.Management.Automation.PowerShell.Worker.ConstructPipelineAndDoWork(Runspace rs, Boolean performSyncInvoke)
   at System.Management.Automation.PowerShell.Worker.CreateRunspaceIfNeededAndDoWork(Runspace rsToUse, Boolean isSync)
   at System.Management.Automation.PowerShell.CoreInvokeHelper[TInput,TOutput](PSDataCollection`1 input, PSDataCollection`1 output, PSInvocationSettings settings)
   at System.Management.Automation.PowerShell.CoreInvoke[TInput,TOutput](PSDataCollection`1 input, PSDataCollection`1 output, PSInvocationSettings settings)
   at System.Management.Automation.PowerShell.CoreInvoke[TOutput](IEnumerable input, PSDataCollection`1 output, PSInvocationSettings settings)
   at System.Management.Automation.PowerShell.Invoke[T](IEnumerable input, IList`1 output, PSInvocationSettings settings)
   at System.Management.Automation.PowerShell.Invoke[T]()
   at Microsoft.Azure.Functions.PowerShellWorker.PowerShell.PowerShellExtensions.InvokeAndClearCommands[T](PowerShell pwsh) in C:\projects\azure-functions-powershell-worker\src\PowerShell\PowerShellExtensions.cs:line 45
   at Microsoft.Azure.Functions.PowerShellWorker.PowerShell.PowerShellManager.InvokeFunction(AzFunctionInfo functionInfo, Hashtable triggerMetadata, TraceContext traceContext, IList`1 inputData, FunctionInvocationPerformanceStopwatch stopwatch) in C:\projects\azure-functions-powershell-worker\src\PowerShell\PowerShellManager.cs:line 280
   at Microsoft.Azure.Functions.PowerShellWorker.RequestProcessor.InvokeSingleActivityFunction(PowerShellManager psManager, AzFunctionInfo functionInfo, InvocationRequest invocationRequest, FunctionInvocationPerformanceStopwatch stopwatch) in C:\projects\azure-functions-powershell-worker\src\RequestProcessor.cs:line 447
   at Microsoft.Azure.Functions.PowerShellWorker.RequestProcessor.ProcessInvocationRequestImpl(StreamingMessage request, AzFunctionInfo functionInfo, PowerShellManager psManager, FunctionInvocationPerformanceStopwatch stopwatch) in C:\projects\azure-functions-powershell-worker\src\RequestProcessor.cs:line 301
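The first trace above comes from a `throw "Failed connecting to Skype Online: $($_.Exception)"` statement, which flattens the original exception into a string. As a hedged sketch of an alternative, the helper below walks the exception chain of an error record so each layer can be logged separately. The name `Get-ExceptionChain` is hypothetical, and the nested exception thrown here is only a stand-in for a failing `New-PSSession` call.

```powershell
# Illustration only: list every exception in an error record's InnerException chain.
function Get-ExceptionChain {
    param(
        [Parameter(Mandatory)]
        [System.Management.Automation.ErrorRecord] $ErrorRecord
    )

    $depth = 0
    $current = $ErrorRecord.Exception
    while ($null -ne $current) {
        [pscustomobject]@{
            Depth   = $depth
            Type    = $current.GetType().FullName
            Message = $current.Message
        }
        $current = $current.InnerException
        $depth++
    }
}

try {
    # Stand-in for the real remoting call; builds a two-level exception chain.
    $inner = [System.InvalidOperationException]::new('transport failed')
    throw [System.Exception]::new('Failed connecting', $inner)
}
catch {
    $chain = @(Get-ExceptionChain -ErrorRecord $_)
    # A bare `throw` at this point would rethrow the original error record unchanged.
}

$chain | Format-Table -AutoSize
```

Logging the chain and then rethrowing with a bare `throw` keeps the original exception type and stack available to the caller.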

AnatoliB commented 4 years ago

@nerddtvg This may or may not help you, but you can try and switch your app to PowerShell 7 by using these instructions. Chances are, the problem is resolved in PowerShell 7.

@SteveL-MSFT Apparently, the user is experiencing intermittent PS remoting issues when accessing both Exchange Online and Skype for Business Online, potentially related to reusing either PS sessions or processes. Do you have any suggestions on troubleshooting this?

nerddtvg commented 4 years ago

@AnatoliB

I actually have been running it in V7 for a bit after following that thread. Today I refactored the function to run inside a dedicated [PowerShell] object (hopefully a different runspace) and it still had the same problems after connecting to Exchange (which was not refactored).

That makes me wonder whether this is not a runspace reuse issue at all, and is instead related to connection pooling (even though it's a new request to a new domain).
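Whether a dedicated [PowerShell] object really runs in a different runspace depends on how it was created. The sketch below is one way to check, assuming the public System.Management.Automation API; it compares runspace instance IDs and is not taken from the refactored function discussed here.

```powershell
# Illustration only: check which runspace a [PowerShell] instance actually uses.
$callerRunspaceId = [runspace]::DefaultRunspace.InstanceId
$probe = '[runspace]::DefaultRunspace.InstanceId'

# Default creation: the instance creates and owns a new runspace.
$separate = [powershell]::Create()
try { [guid] $separateId = $separate.AddScript($probe).Invoke()[0] }
finally { $separate.Dispose() }

# CurrentRunspace mode: the instance runs in the caller's runspace instead.
$nested = [powershell]::Create([System.Management.Automation.RunspaceMode]::CurrentRunspace)
try { [guid] $nestedId = $nested.AddScript($probe).Invoke()[0] }
finally { $nested.Dispose() }

$usedSeparateRunspace = $separateId -ne $callerRunspaceId
$nestedSharedRunspace = $nestedId -eq $callerRunspaceId

"Default Create() used a separate runspace:  $usedSeparateRunspace"
"CurrentRunspace mode shared the caller's:   $nestedSharedRunspace"
```

Even when the runspace is confirmed to be different, both runspaces still live in the same worker process, so process-level state such as pooled connections is shared between them.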

2020-06-12T22:11:33.614 [Debug] Added WorkerConfig for language: powershell
2020-06-12T22:11:33.615 [Debug] Worker path for language worker powershell: D:\Program Files (x86)\SiteExtensions\Functions\3.0.13760\workers\powershell
2020-06-12T22:11:33.763 [Information] Initializing Warmup Extension.
2020-06-12T22:11:33.837 [Information] Initializing Host. OperationId: 'e389b101-d731-4d8b-b18d-d4187eee2e10'.

2020-06-12T22:11:33.887 [Information] Starting JobHost
2020-06-12T22:11:33.889 [Information] Starting Host (HostId=[removed], InstanceId=5078b9d1-f007-4966-933d-2a5b14b1807c, Version=3.0.13760.0, ProcessId=4908, AppDomainId=1, InDebugMode=True, InDiagnosticMode=False, FunctionsExtensionVersion=~3)

Francisco-Gamino commented 3 years ago

Runspace reuse is by design, and we do not plan to change it. If you still experience issues caused by runspace reuse, please let us know.

Francisco-Gamino commented 3 years ago

@stefanushinardi -- could you please update our docs with this information? Thanks.

mcdonamw commented 2 years ago

I just ran across this issue searching for an explanation of an issue I just experienced with a runspace. This is the first time I've used them and I'm using it with pre-built code from PoshGUI.com.

I have a form, where a button calls a Microsoft.Graph cmdlet requiring the Microsoft.Graph.Identity.SignIns module. This cmdlet is wrapped up in an Async { } code block (per PoshGUI's implementation), and all seems to work fine.

However I added code to handle module checks and install/import the modules, if not loaded already, when the script runs. Adding this code to my script seems to break it. First run works fine, but subsequent runs fail with the Async code hanging and never returning. I have to open new PS consoles to re-run the script successfully again. But after each run I have to repeat closing/reopening a new session.

I've since found that if I remove the import code from the script, it works again as expected. However, once I close the script, if I manually import that module in the existing PSSession, the script breaks again until I unload the module, after which it returns to working.

In short: forcibly loading a needed module within my script (or in the session launching the script) breaks the code running in the runspace that requires that module.

I'm posting here because the initial post (and at least one reply) here mentions something about issues with runspaces reuse and module importing. Can someone detail out what was meant by that so I can identify if that's what I'm running up against?

oising commented 2 years ago

This seems more like a pure PowerShell question -- you should join one of the Discord servers ( https://discord.gg/powershell ) and ask there. This repo is specific to Azure Functions.