Yannik opened this issue 2 weeks ago.
Unfortunately, there is not much we can do here. The process that compiles the code uses csc.exe (called by the C# compiler methods), and the error you see here comes from csc.exe itself, not from any code we control. The typical reason for this error is an AV or other scanning tool that is either deleting the file or, in your case, holding an exclusive lock on it. As we don't control how csc.exe works, we have little sway over the outcome here.
We do provide a way to change the temporary directory used here through the remote_tmp option on the shell plugin. This could potentially be changed to a location that is either trusted by the AV or maybe less likely for it to be scanned and locked during the run.
> We do provide a way to change the temporary directory used here through the remote_tmp option on the shell plugin. This could potentially be changed to a location that is either trusted by the AV or maybe less likely for it to be scanned and locked during the run.
As far as I can see, this directory could simply be used by an attacker as well, creating an attack vector? (Unless the code is signed, which I'm sure it isn't. That said, signing of the temporary code done by the Ansible controller DOES sound like an interesting idea!)
Anyway, wouldn't a retry/backoff mechanism pretty much solve this problem? Since this only occurs once every couple of thousand task executions, it seems very much like unlucky timing.
> As far as I can see, this directory could simply be used by an attacker as well, creating an attack vector?
It's certainly not ideal, but just changing it to another variable instead of the default $env:TEMP might be enough to stop the AV from picking it up.
> That said, signing of the temporary code done by the Ansible controller DOES sound like an interesting idea!
It's certainly something we may look into, but it raises a lot of questions that make it hard to achieve.
> Anyway, wouldn't a retry/backoff mechanism pretty much solve this problem? Since this only occurs once every couple of thousand task executions, it seems very much like unlucky timing.
Not necessarily. In some cases maybe, but in others it could just fail every time. There could also be code outside our control that uses Add-Type rather than our custom Add-CSharpType, where a retry in our code would not help. I prefer not to add a retry mechanism for such a scenario, but I could be convinced otherwise.
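A minimal sketch of the retry/backoff idea, in Python and under stated assumptions: it assumes the lock surfaces as a catchable exception (on Windows, Python typically reports a sharing violation as PermissionError), and retry_with_backoff is a hypothetical helper, not an Ansible API. It also reflects the caveat above: if the lock never clears, every attempt fails and the last error is re-raised, and nothing here helps code that calls Add-Type directly.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (PermissionError,),
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying only on retry_on exceptions with capped exponential backoff.

    Hypothetical helper for illustration; not part of Ansible.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # A lock that never clears still fails: re-raise the last error.
                raise
            # The cap grows 0.5s, 1s, 2s, ... up to max_delay; full jitter picks a
            # random delay below the cap so many hosts don't retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The sleep parameter is injectable so the backoff can be exercised without real delays. Exceptions not listed in retry_on (for example a genuine compile error) propagate on the first attempt.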
One area I also want to look into for the next Ansible version, if I have time, is official support for PowerShell 7.x. This version uses a different compilation mechanism that doesn't require temporary files, because the compilation happens in-process. This could be the solution to this particular problem. I cannot guarantee that it'll be done in the next release though; it's just something that's on my mind.
I am experimenting with remote_tmp now, but I suspect that the AV simply has a look at all new files, no matter which directory they are in.
Seeing that async_dir is set to %USERPROFILE%\.ansible_async, I configured remote_tmp to %USERPROFILE%\.ansible_tmp, kind of expecting the directory to be hidden. It actually isn't, since Windows does not treat dot-prefixed items as hidden; it requires the hidden attribute. Any reason for still using the dot prefix on async_dir? Or are you additionally setting the hidden attribute on that one?
The remote_tmp dir is actually not even getting deleted after task/playbook execution. Is that on purpose?
I have not rolled this out to prod just yet, so I cannot report yet on whether it fixes the errors.
> One area I also want to look into for the next Ansible version, if I have time, is official support for PowerShell 7.x. This version uses a different compilation mechanism that doesn't require temporary files, because the compilation happens in-process. This could be the solution to this particular problem. I cannot guarantee that it'll be done in the next release though; it's just something that's on my mind.
Sounds interesting to have that option! (Even though I don't see us rolling out PowerShell 7.x to all servers in the near future.)
> Any reason for still using the dot prefix on async_dir?
It's to replicate the behaviour on the Linux side, where the dir is ~/.ansible_async and the leading . means hidden. We are not explicitly setting the hidden attribute.
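To illustrate the two conventions, here is a small Python sketch; is_hidden is a hypothetical helper, not something Ansible ships. On Linux a leading dot hides an entry by convention, while on Windows the name is irrelevant and only the FILE_ATTRIBUTE_HIDDEN bit counts, which is why a dot-prefixed directory without that bit is still visible.

```python
import stat


def is_hidden(name: str, file_attributes: int, platform: str) -> bool:
    """Return whether an entry counts as hidden under the given platform's convention.

    Hypothetical helper for illustration.
    """
    if platform == "windows":
        # Windows ignores the name; only the hidden attribute bit matters.
        return bool(file_attributes & stat.FILE_ATTRIBUTE_HIDDEN)
    # POSIX convention: a leading dot hides the entry from ls and file managers.
    return name.startswith(".")
```

On a real Windows host, Python exposes the attribute bits as os.stat(path).st_file_attributes, and the attrib +h command sets the hidden bit; per the reply above, Ansible does not set it on async_dir.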
> The remote_tmp dir is actually not even getting deleted after task/playbook execution. Is that on purpose?
The actual dir isn't; the value is meant to be a location where each module creates its own temp directory. The default is %TEMP%, which means that when a temp directory is needed, it is created inside that dir, and that inner directory is the one that should be cleaned up.
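The pattern described here, a persistent base directory with a short-lived per-module directory inside it, can be sketched in Python with tempfile.mkdtemp. This is an analogy, not the PowerShell code Ansible actually runs: module_tmpdir and the ansible-moduletmp- prefix are illustrative assumptions, and creating the base directory when it is missing is a choice of this sketch.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def module_tmpdir(remote_tmp: str) -> Iterator[str]:
    """Yield a fresh temp dir inside remote_tmp and delete only that inner dir afterwards.

    Hypothetical helper for illustration; not Ansible's implementation.
    """
    # Assumption of this sketch: create the base if missing; it is never deleted.
    os.makedirs(remote_tmp, exist_ok=True)
    path = tempfile.mkdtemp(prefix="ansible-moduletmp-", dir=remote_tmp)
    try:
        yield path
    finally:
        # Clean up the per-module directory even if the module failed.
        shutil.rmtree(path, ignore_errors=True)
```

After the with block exits, the per-module directory is gone while remote_tmp itself remains, which matches the behaviour Yannik observed.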
SUMMARY
With an ever-growing host count (currently 180 Ansible-managed Windows Server 2019/2022 hosts), I am seeing more and more of these errors, breaking our deployment CI/CD pipeline:
I had already reported this here, in an issue with a similar problem that was successfully resolved thanks to @jborean93.
Would be great if it was possible to solve this one too.
ISSUE TYPE
COMPONENT NAME
Unsure
ANSIBLE VERSION
STEPS TO REPRODUCE
Execute any Windows task on enough hosts and you will run into this.