Closed lkt82 closed 4 years ago
Hi @lkt82, thanks for reporting this. This is a similar issue to #574, and we added an additional exception to the try/catch block around Get-AdDomain for System.InvalidOperationException in PR #577 to solve this. This was released in v6.0.0 of the module. Are you 100% sure you are using v6.0.0 of the ActiveDirectoryDsc module, not one of the pre-release versions?
I have just tried to reprovision a new node. I waited until DSC had completed and then ran a manual reboot. The node reports the same error as described.
Output from Get-Module on the node:
Get-Module -ListAvailable -Name ActiveDirectoryDsc
    Directory: C:\Program Files\WindowsPowerShell\Modules
ModuleType Version    Name                 ExportedCommands
---------- -------    ----                 ----------------
Manifest   6.0.0      ActiveDirectoryDsc   Find-DomainController
@lkt82 Can you add verbose messages too, so we can better see what is happening? See https://github.com/dsccommunity/ActiveDirectoryDsc/issues/574#issuecomment-588420838, but the added code should go before line 113, as the first thing in the catch block.
@johlju and @X-Guardian -- I just ran into this when trying to provision a brand new domain/DC in Azure. I triple-checked that I was using the most current ActiveDirectoryDSC (6.0.0) and am still stuck. The DSC gets down to the VM fine, the domain build happens, but the "domain not found" exception is triggered after the first reboot, and kills my ARM template deployment.
One thing I did find is that in MSFT_ADDomain.psm1, starting at the do loop on line 96, whatever exception is being triggered doesn't fall under the 3 types specifically called out. Instead it triggers the generic catch, which calls "New-InvalidOperationException", so it never falls into the retry behavior that I was used to in previous versions of this DSC module. (Azure DCs would always take a few extra minutes to come up after the first install, and the code was able to handle that by retrying several times.) It looks like whatever I'm hitting triggers a System.ArgumentException.
I'm attaching the log showing the verbose output of the error -- it's the same one everyone else is getting ("Server instance not found on the given port") DscExtensionHandler.0.20200405-201642.txt
Hi @erictorbenson. Can you add the following verbose messages at line 113 of MSFT_ADDomain.psm1 (within the GetAdDomainUnexpectedError catch block) so that we can see full details of the exception being thrown.
Write-Verbose -Message "ErrorRecord Details: $($_|fl -Force|Out-String)" -Verbose
Write-Verbose -Message "Exception Details: $($_.Exception|fl -Force|Out-String)" -Verbose
Write-Verbose -Message "Inner Exception Details: $($_.Exception.InnerException|fl -Force|Out-String)" -Verbose
Write-Verbose ('Exception Name: ' + $_.Exception.GetType().FullName) -Verbose
Write-Verbose ('Inner Exception Name: ' + $_.Exception.InnerException.GetType().FullName) -Verbose
Done! (And I learned that Publish-AzVMDSCConfiguration will download the latest dependencies instead of using what you have in the folder...after running through a whole build. :-) )
File is attached. Interestingly I found a positional parameter error: "[ERROR] A positional parameter cannot be found that accepts argument '+'." (I replaced the actual domain name with "redactedfqdn")
Any help would be appreciated...everything in the DSC resource looks OK so it might be something it's calling externally?
@erictorbenson It was a bug in the snippet that @X-Guardian provided above that you hit, instead of the actual error we are looking for. It didn't like the + sign in the last two rows :/. Sorry, but can you change to the following snippet and run again? 🙏
I verified that the snippet below will output what we are looking for. The last two rows have parentheses now:
Write-Verbose -Message "ErrorRecord Details: $($_|fl -Force|Out-String)" -Verbose
Write-Verbose -Message "Exception Details: $($_.Exception|fl -Force|Out-String)" -Verbose
Write-Verbose -Message "Inner Exception Details: $($_.Exception.InnerException|fl -Force|Out-String)" -Verbose
Write-Verbose ('Exception Name: ' + $_.Exception.GetType().FullName) -Verbose
Write-Verbose ('Inner Exception Name: ' + $_.Exception.InnerException.GetType().FullName) -Verbose
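For anyone adapting the snippet above: without the parentheses, PowerShell parses the + and the right-hand operand as extra positional arguments to Write-Verbose, which is where the "positional parameter cannot be found" error comes from. The following is a minimal, self-contained sketch of the same diagnostic idea. The function name Get-ExceptionTypeNames is hypothetical (it is not part of ActiveDirectoryDsc), and the null check on InnerException is an addition, on the assumption that the inner exception may be missing.

```powershell
# Hypothetical helper for illustration only; not part of the module.
function Get-ExceptionTypeNames {
    param(
        [Parameter(Mandatory = $true)]
        [System.Management.Automation.ErrorRecord] $ErrorRecord
    )

    # Parentheses-free here because this is an assignment, not a command argument.
    $names = @('Exception Name: ' + $ErrorRecord.Exception.GetType().FullName)

    # Guard against a missing inner exception; calling GetType() on $null would throw.
    if ($null -ne $ErrorRecord.Exception.InnerException) {
        $names += 'Inner Exception Name: ' + $ErrorRecord.Exception.InnerException.GetType().FullName
    }
    else {
        $names += 'Inner Exception Name: (none)'
    }

    return $names
}

try {
    # Simulated failure; the real error comes from Get-ADDomain.
    throw [System.ArgumentException]::new('Server instance not found on the given port.')
}
catch {
    # The string expression is built first, then passed as a single -Message value.
    Get-ExceptionTypeNames -ErrorRecord $_ | ForEach-Object { Write-Verbose -Message $_ -Verbose }
}
```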
No problem, I'm happy that someone is able to look at this because it's holding up a project for me!
I do have another clue to contribute though...it took me 3 tries to reproduce this. It has to have something to do with timing because successful attempts worked flawlessly. Maybe it's a startup-order thing where DSC is starting before the basic AD services? It takes a good 4 or 5 minutes for new DCs to start in Azure (and on prem too) after the first reboot. Sorry about the vague issue reporting...I'm new to DSC and everything I've done so far with it has worked pretty well, so I haven't had time to dig into the internals.
Thanks @erictorbenson. It looks like we need to add System.ArgumentException to the list of retry exceptions in the resource.
If you want to prove this yourself, you just need to add [System.ArgumentException] to the list of exceptions in the catch statement at line 103 of the MSFT_ADDomain.psm1 file in the module.
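To illustrate the pattern being discussed (a typed catch that retries, and a generic catch that fails immediately), here is a rough sketch. It is not the module's actual code: Invoke-WithRetry, its parameters, and the default values are all hypothetical, and the simulated action stands in for the Get-ADDomain call.

```powershell
# Hypothetical helper for illustration only; not the code in MSFT_ADDomain.psm1.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock] $Action,

        [int] $MaxRetries = 5,

        [int] $DelaySeconds = 30
    )

    $attempt = 0
    do {
        try {
            return & $Action
        }
        catch [System.InvalidOperationException], [System.ArgumentException] {
            # Exception types listed here are treated as "not ready yet" and retried.
            $attempt++
            if ($attempt -gt $MaxRetries) {
                throw
            }
            Write-Verbose -Message "Attempt $attempt failed with $($_.Exception.GetType().FullName); retrying."
            Start-Sleep -Seconds $DelaySeconds
        }
        catch {
            # Anything not listed above is rethrown immediately, with no retry.
            throw
        }
    } while ($true)
}

# Usage: the action fails once with ArgumentException, then succeeds.
$script:calls = 0
$result = Invoke-WithRetry -DelaySeconds 0 -Action {
    $script:calls++
    if ($script:calls -lt 2) {
        throw [System.ArgumentException]::new('Server instance not found on the given port.')
    }
    'domain found'
}
```

An exception type that is missing from the typed catch list goes straight to the generic catch, which matches the behavior reported earlier in this thread.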
This will be published as a preview release as soon as the pipeline finishes running.
Update for anyone who finds this later via search...Adding the exception to the list worked! It only needed the one retry.
One thing I did notice is that everything is timing-sensitive after the first reboot. After I moved on from this, other DSC configuration resources I had built (creating a reverse lookup zone in DNS, enabling the AD Recycle Bin, etc.) would also fail with similar "can't find the domain" errors. If you actually watch the DC build in Azure, it'll take several minutes before the "configuring settings" spinning-dots screen goes away. I assume all of the AD cmdlets need something that initializes later.
A "workaround" that let me move on is a simple delay that all the future resources depend on -- in my case it took a 5 minute delay before everything worked correctly. (Obviously this could be a lot more refined, but it does work.)
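Script WaitForDCToSettleDown {
    DependsOn  = "[ADDomain]ADDomain"
    # Fixed delay to give the DC time to finish starting.
    SetScript  = { Start-Sleep -Seconds 300 }
    # GetScript is expected to return a hashtable.
    GetScript  = { @{ Result = 'Get' } }
    # Always reports "not in desired state", so SetScript runs on every pass.
    TestScript = { $false }
}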
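As a sketch of what "more refined" could look like, the Script resource can poll for the domain instead of sleeping for a fixed time. This is only an assumption-laden example, not something tested in this thread: the 300-second deadline and 15-second poll interval are arbitrary, and it assumes the ActiveDirectory module's Get-ADDomain cmdlet is available on the node and starts succeeding once the DC has settled.

```powershell
Script WaitForDCToSettleDown {
    DependsOn  = "[ADDomain]ADDomain"
    GetScript  = { @{ Result = 'WaitForDCToSettleDown' } }
    TestScript = {
        # In desired state only if the local domain already answers.
        try { $null = Get-ADDomain -Current LocalComputer -ErrorAction Stop; $true }
        catch { $false }
    }
    SetScript  = {
        # Poll until the domain answers or the (assumed) deadline passes.
        $deadline = (Get-Date).AddSeconds(300)
        while ((Get-Date) -lt $deadline) {
            try { $null = Get-ADDomain -Current LocalComputer -ErrorAction Stop; return }
            catch { Start-Sleep -Seconds 15 }
        }
        throw 'The domain did not respond within 300 seconds.'
    }
}
```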
@erictorbenson Does it not work to have the WaitForADDomain resource wait for the domain, so that you need the above Script resource on top of the below? 🤔
Or do you mean that WaitForADDomain fails like the others because of a similar issue (or the same one) as this issue discusses?
WaitForADDomain Ready
{
    DomainName = $DomainName
    Credential = $AdminCreds
    DependsOn  = "[PendingReboot]Reboot"
}
Actually I forgot about WaitForADDomain. :-) The first thing that came into my head to fix the problem fast was a Script that sleeps.
I just redid everything and WaitForADDomain works, BUT, I did have to increase the timeout to 300 seconds, otherwise DSC reboots the DC before it gets a chance to respond and you run the risk of never initializing the DC the whole way before the restarts expire.
Here's what the DC build part of my DSC config looks like now...this looks like a solid way to ensure we don't time out on any of the other AD-dependent elements. (The next element in the config needs to depend on [WaitForADDomain]FirstBoot.)
Thanks @johlju @X-Guardian for the fast help for a relative DSC newbie. Hopefully I can contribute at some point.
ADDomain ADDomain {
    Credential                    = $DomainCreds
    SafeModeAdministratorPassword = $DsrmCreds
    DomainNetBiosName             = $netbiosDomainName
    DomainName                    = $DomainName
    ForestMode                    = 'WinThreshold'
    DomainMode                    = 'WinThreshold'
    DatabasePath                  = $ntdsDBPath
    LogPath                       = $ntdsLogPath
    SysvolPath                    = $sysvolPath
    DependsOn                     = @("[WindowsFeature]AD-Domain-Services", "[WindowsFeature]DNSServer", "[Disk]DataDisk")
}
WaitForADDomain FirstBoot {
    DomainName              = $DomainName
    WaitForValidCredentials = $true
    Credential              = $domainAdminCredentials
    PsDscRunAsCredential    = $domainAdminCredentials
    WaitTimeout             = 300
    RestartCount            = 3
    DependsOn               = "[ADDomain]ADDomain"
}
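As an illustration of the "next element" mentioned above, a later AD-dependent resource would chain onto the wait like this. The Recycle Bin resource is only an example picked from the tasks mentioned earlier in the thread; the resource instance name RecycleBin is made up, and reusing $DomainName and $domainAdminCredentials here assumes a single-domain forest where that account has enterprise admin rights.

```powershell
ADOptionalFeature RecycleBin {
    FeatureName                       = 'Recycle Bin Feature'
    # Assumption: the forest root has the same FQDN as the new domain.
    ForestFQDN                        = $DomainName
    EnterpriseAdministratorCredential = $domainAdminCredentials
    # Runs only after WaitForADDomain reports the domain as available.
    DependsOn                         = "[WaitForADDomain]FirstBoot"
}
```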
Details of the scenario you tried and the problem that is occurring
After rebooting a node the state is reported as failed. The problem is that the ADDomain resource fails with "Server instance not found".
Eventually the node will reach the state of Compliant
Verbose logs showing the problem
Suggested solution to the issue
Detect that the node is starting and wait for dependencies to be ready
The DSC configuration that is used to reproduce the issue (as detailed as possible)
The operating system the target node is running
Version and build of PowerShell the target node is running
Version of the DSC module that was used