dsccommunity / xPSDesiredStateConfiguration

DSC resources for configuring common operating systems features, files and settings.
https://dsccommunity.org
MIT License

Nodes not registering in Devices.edb and not pulling configurations #201

Open usrme opened 8 years ago

usrme commented 8 years ago

Greetings,

I've hit a wall with a perplexing issue: I am applying an identical LCM configuration to all my nodes and the LCM applies correctly with no error messages shown, but when I create a configuration, its checksum, and apply it via Update-DscConfiguration -ComputerName (node) -Wait -Verbose (just for testing purposes, automatic pulls within the default time frame don't work either) I get an error message:

The attempt to get the action from server 
https://mypullserver///PSDSCPullServer.svc/Nodes(AgentId='<AgentIDValue>')/GetDscAction 
failed because pullserver is not available or there is no registered node with AgentId <AgentIDValue> on the server.
+ CategoryInfo          : ResourceUnavailable: (root/Microsoft/...gurationManager:String) [], CimException
+ FullyQualifiedErrorId : WebDownloadManagerGetActionNodeConfigurationNotFound,Microsoft.PowerShell.DesiredStateConfiguration.Commands.GetDscActionCommand
+ PSComputerName        : <node>

The pull server itself is healthy and active as I can browse to the PSDSCPullServer.svc endpoint and the Devices.edb file is being populated with RegistrationData and StatusReport entries, but the Devices table remains empty.

When looking at Event Viewer from the node's perspective I am seeing:

Http Client <AgentIDValue> failed for WebReportManager for configuration
The attempt to send status report to the server https://mypullserver/PSDSCPullServer.svc/Nodes(AgentId='<AgentIDValue>')/SendReport returned unexpected response code NotFound.

I have tried completely deleting the pull server and starting from scratch; changing the database provider to OleDb, which didn't even allow LCM initialization (gave errors about the node registration failing); using hard-coded values in scripts that generate .mof files; deleting resource documents via Remove-DscConfigurationDocument -CimSession (node) -Stage Current for Current, Pending, and Previous.

Any help towards solving this would be greatly appreciated; if necessary I can show the scripts and .mof files used and/or answer any questions. All target nodes have WMF5, as does the pull server, and no previous (WMF4) configurations have been applied (that means LCM as well). I am hoping for a solution that does not require manual intervention with target nodes. I know @kamranayub had a very similar problem in issue #110, but his solution did nothing for me.

kamranayub commented 8 years ago

One thing you can try that helped me diagnose my issue was enabling Failed Request Tracing in IIS for the endpoint and then poring over the report. The report showed the HTTP verb being used, which led me to my solution. I think your problem might be similar to mine, where something in IIS is blocking the call and causing it to return a 404.

Are you using SSL? Have you tried removing SSL to isolate the issue? If you remove SSL you could then also take a network capture (Microsoft Message Analyzer is good) and see what is being sent from the node to the pull server.

usrme commented 8 years ago

I enabled Failed Request Tracing, but no log files appear in %SystemDrive%\inetpub\logs\FailedReqLogFiles, which is where IIS says they should be; instead I checked %SystemDrive%\inetpub\logs\LogFiles\W3SVC1\u_ex160819.log and saw the following entries, along with many more like them:

2016-08-19 13:20:37 <pull server IP> POST /PSDSCPullServer.svc/Nodes(AgentId='<node1 AgentID>')/SendReport - 443 PSDSCUser <node1 IP> - - 404 0 0 171
2016-08-19 13:20:37 <pull server IP> POST /PSDSCPullServer.svc/Nodes(AgentId='<node1 AgentID>')/SendReport - 443 PSDSCUser <node1 IP> - - 404 0 0 0
2016-08-19 13:20:42 <pull server IP> POST /PSDSCPullServer.svc/Nodes(AgentId='<node2 AgentID>')/SendReport - 443 PSDSCUser <node2 IP> - - 404 0 0 31
2016-08-19 13:20:42 <pull server IP> POST /PSDSCPullServer.svc/Nodes(AgentId='<node2 AgentID>')/SendReport - 443 PSDSCUser <node2 IP> - - 404 0 0 15

Yes, I am using SSL, but as per your advice I disabled it and LCM initialization worked as before with no issues (none readily visible at least), but looking at Event Viewer from the node's perspective still shows a familiar error:

Http Client <AgentIDValue> failed for WebReportManager for configuration
The attempt to send status report to the server http://mypullserver/PSDSCPullServer.svc/Nodes(AgentId='<AgentIDValue>')/SendReport returned unexpected response code NotFound.

The same goes for manually asking a node to check its configuration via Update-DscConfiguration -ComputerName (node) -Wait -Verbose:

VERBOSE: [<node>]:                            [] Executing Get-Action with configuration 's checksum: .
VERBOSE: [<node>]:                            [] Executing Get-Action with configuration 's checksum failed. 
Please check the availability of pull server.
The attempt to get the action from server 
http://mypullserver///PSDSCPullServer.svc/Nodes(AgentId='<AgentIDValue>')/GetDscAction failed 
because pullserver is not available or there is no registered node with AgentId <AgentIDValue> on the server.
+ CategoryInfo          : ResourceUnavailable: (root/Microsoft/...gurationManager:String) [], CimException
+ FullyQualifiedErrorId : WebDownloadManagerGetActionNodeConfigurationNotFound,Microsoft.PowerShell.DesiredStateConfiguration.Commands.GetDscActionCommand
+ PSComputerName        : <node>

Here's what the %SystemDrive%\inetpub\logs\LogFiles\W3SVC1\u_ex160819.log log file shows:

2016-08-19 13:45:46 <pull server IP> PUT /PSDSCPullServer.svc/Nodes(AgentId='<node1 AgentID>') - 80 PSDSCUser <node1 IP> - - 204 0 0 10050
2016-08-19 13:45:56 <pull server IP> PUT /PSDSCPullServer.svc/Nodes(AgentId='<node1 AgentID>') - 80 PSDSCUser <node1 IP> - - 204 0 0 10046
2016-08-19 13:46:07 <pull server IP> POST /PSDSCPullServer.svc/Nodes(AgentId='<node1 AgentID>')/SendReport - 80 PSDSCUser <node1 IP> - - 404 0 0 10033
2016-08-19 13:46:43 <pull server IP> PUT /PSDSCPullServer.svc/Nodes(AgentId='<node2 AgentID>') - 80 PSDSCUser <node2 IP> - - 204 0 0 10050
2016-08-19 13:46:53 <pull server IP> PUT /PSDSCPullServer.svc/Nodes(AgentId='<node2 AgentID>') - 80 PSDSCUser <node2 IP> - - 204 0 0 10051
2016-08-19 13:47:03 <pull server IP> POST /PSDSCPullServer.svc/Nodes(AgentId='<node2 AgentID>')/SendReport - 80 PSDSCUser <node2 IP> - - 404 0 0 10033
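
To make the pattern in the log lines above easier to see, here is a minimal Python sketch that tallies requests per agent, verb, action and status code. It assumes the field order shown in the log lines above (verb, URI stem, query, port, username, client IP, user agent, referer, status). The IP addresses and agent IDs in SAMPLE_LOG are made-up placeholders, and summarize_pull_server_log is a hypothetical helper name, not part of any DSC tooling.

```python
import re
from collections import Counter

# Matches the request portion of an IIS log line for the pull server endpoint.
# Assumption: fields appear in the same order as in the log lines in this thread.
LINE_RE = re.compile(
    r"(?P<method>GET|PUT|POST|DELETE) "
    r"/PSDSCPullServer\.svc/Nodes\(AgentId='(?P<agent>[^']+)'\)"
    r"(?:/(?P<action>\w+))? "
    r"\S+ (?P<port>\d+) \S+ \S+ \S+ \S+ (?P<status>\d{3}) "
)

# Hypothetical sample data: the IPs and agent IDs are placeholders.
SAMPLE_LOG = """\
2016-08-19 13:45:46 10.0.0.5 PUT /PSDSCPullServer.svc/Nodes(AgentId='AAAA-1111') - 80 PSDSCUser 10.0.0.11 - - 204 0 0 10050
2016-08-19 13:46:07 10.0.0.5 POST /PSDSCPullServer.svc/Nodes(AgentId='AAAA-1111')/SendReport - 80 PSDSCUser 10.0.0.11 - - 404 0 0 10033
2016-08-19 13:46:43 10.0.0.5 PUT /PSDSCPullServer.svc/Nodes(AgentId='BBBB-2222') - 80 PSDSCUser 10.0.0.12 - - 204 0 0 10050
2016-08-19 13:47:03 10.0.0.5 POST /PSDSCPullServer.svc/Nodes(AgentId='BBBB-2222')/SendReport - 80 PSDSCUser 10.0.0.12 - - 404 0 0 10033
"""


def summarize_pull_server_log(log_text):
    """Count requests keyed by (agent ID, HTTP verb, action, status code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip header lines and unrelated requests
        key = (
            match.group("agent"),
            match.group("method"),
            match.group("action") or "",  # empty when the node itself is addressed
            int(match.group("status")),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, count in sorted(summarize_pull_server_log(SAMPLE_LOG).items()):
        print(key, count)
```

On the sample data, each agent's PUT returns 204 while its SendReport POST returns 404, which is the same pattern as in the log excerpt above.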

I ran a network capture using Microsoft Message Analyzer for a bit and captured the 404 with the following payload:

{"odata.error":{"code":"AgentId {0} is not found in the list of registered agents.","message":{"lang":"en-US","value":"AgentId <node1 AgentID> is not found in the list of registered agents."},"innererror":{"message":"AgentId <node1 AgentID> is not found in the list of registered agents.","type":"System.ArgumentException","stacktrace":""},"MODATA.Exception.ErrorRecord":{"odata.type":"MODATA.Exception.DataServiceException","ErrorCode":"AgentId {0} is not found in the list of registered agents.","MessageLanguage":"en-US","StatusCode":404,"Message":"AgentId <node1 AgentID> is not found in the list of registered agents.","Data":[],"InnerException":{"Message":"AgentId <node1> is not found in the list of registered agents.","Data":[],"InnerException":null,"TargetSite":null,"StackTrace":null,"HelpLink":null,"Source":null,"HResult":-2147024809},"TargetSite":null,"StackTrace":"
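
For anyone capturing similar responses, the readable part of the payload sits under odata.error, then message, then value. The following Python sketch pulls it out. Note that the payload pasted above is cut off, so it would not parse as-is; SAMPLE_BODY is a trimmed-down stand-in with the same nesting, its agent ID is a placeholder, and odata_error_message is a hypothetical helper name.

```python
import json


def odata_error_message(body):
    """Return the human-readable message from an OData error response body."""
    payload = json.loads(body)
    return payload["odata.error"]["message"]["value"]


# Hypothetical, trimmed-down body mirroring the nesting of the captured payload.
SAMPLE_BODY = json.dumps({
    "odata.error": {
        "code": "AgentId {0} is not found in the list of registered agents.",
        "message": {
            "lang": "en-US",
            "value": "AgentId AAAA-1111 is not found in the list of registered agents.",
        },
    }
})

if __name__ == "__main__":
    print(odata_error_message(SAMPLE_BODY))
```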

At the moment it seems that SSL is not at fault here as the error messages are for the most part identical for both.

kamranayub commented 8 years ago

What's your PSDSC web.config look like? And could you try opening the ESENT database in a viewer like ESEDatabaseView?

I would just confirm the agent ID is actually registered in the DB.

usrme commented 8 years ago

I can post the web.config tomorrow. As for the ESENT database, I mentioned in my original post that both the RegistrationData and StatusReport tables are being populated with meaningful entries, but the Devices table isn't, and that is where I would assume the node/agent ID entries need to be for the nodes to be able to pull down configurations.

megamorf commented 8 years ago

I've run into the same issue. @grayzu responded to me on Twitter regarding this problem. From what I know, it's fixed in WMF 5.1, which isn't yet available for Server 2012 (R2).

usrme commented 8 years ago

Here is the web.config:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <configSections>
    <section name="managementOdata" type="Microsoft.Management.Odata.Core.DSConfiguration, Microsoft.Management.OData, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL" />
  </configSections>
  <managementOdata schemaFileName="PSDSCPullServer.mof" resourceMappingFileName="PSDSCPullServer.xml">
    <customAuthorization type="Microsoft.Powershell.DesiredStateConfiguration.PullServer.Authorization" assembly="Microsoft.Powershell.DesiredStateConfiguration.Service.dll" />
    <operationManager type="Microsoft.Powershell.DesiredStateConfiguration.PullServer.OperationManager" assembly="Microsoft.Powershell.DesiredStateConfiguration.Service.dll" />
    <quota userSchemaCacheTimeoutSec="600" />
    <commandInvocation enabled="false" />
    <wcfDataServicesConfig>
    </wcfDataServicesConfig>
  </managementOdata>
  <appSettings>
    <add key="MaxConcurrentRequests" value="10000" />
    <add key="MaxRequestsPerTimeslot" value="10000" />
    <add key="TimeslotSize" value="1" />
    <add key="dbprovider" value="ESENT" />
    <add key="dbconnectionstr" value="C:\Program Files\WindowsPowerShell\DscService\Devices.edb" />
    <add key="ConfigurationPath" value="C:\DSC\Configuration" />
    <add key="ModulePath" value="C:\DSC\Modules" />
    <add key="RegistrationKeyPath" value="C:\DSC" />
  </appSettings>
  <system.web>
    <compilation debug="false" targetFramework="4.0" />
  </system.web>
  <system.serviceModel>
    <behaviors>
      <serviceBehaviors>
        <behavior>
          <serviceAuthorization serviceAuthorizationManagerType="Microsoft.Management.Odata.Core.CustomAuthorizationManager, Microsoft.Management.OData, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
          <serviceDebug includeExceptionDetailInFaults="true" />
          <serviceMetadata httpGetEnabled="true" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    <serviceHostingEnvironment aspNetCompatibilityEnabled="true" />
  </system.serviceModel>
  <system.webServer>
    <modules>
      <remove name="ServiceModel" />
      <remove name="WebDAVModule" />
      <remove name="AuthenticationModule" />
      <add type="Microsoft.Powershell.DesiredStateConfiguration.PullServer.AuthenticationPlugin, Microsoft.Powershell.DesiredStateConfiguration.Service" name="AuthenticationModule" />
      <add name="IISSelfSignedCertModule(32bit)" />
    </modules>
    <handlers>
      <remove name="WebDAV" />
      <remove name="xoml-Integrated" />
      <remove name="rules-Integrated" />
      <remove name="svc-ISAPI-2.0-64" />
      <remove name="svc-ISAPI-2.0" />
      <remove name="svc-Integrated" />
    </handlers>
    <security>
      <authentication>
        <anonymousAuthentication enabled="true" />
        <basicAuthentication enabled="false" />
        <windowsAuthentication enabled="false" />
      </authentication>
    </security>
    <staticContent>
      <clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="1.00:00:00" />
    </staticContent>
    <directoryBrowse enabled="false" />
  </system.webServer>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="microsoft.isam.esent.interop" publicKeyToken="31bf3856ad364e35" />
        <bindingRedirect oldVersion="10.0.0.0" newVersion="6.3.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

@megamorf Do you have any more details on the issue? Also, if you know, when is WMF 5.1 coming?

usrme commented 8 years ago

An update: a majority of the hosts get registered under the Devices table in Devices.edb, but even those that are registered there are not able to pull down configurations reliably; 3/7 hosts work as expected and 6/7 are actually registered.

Funnily enough, one of the hosts, which does pull down configurations, is not registered under the Devices table...

Any updates from the DSC team would be much appreciated!

kamranayub commented 8 years ago

So I spoke too soon, because I'm running into this again (same exact error) with nodes that used to work in July and haven't since July 8. I can also get them to register successfully, but none of them will pull down configurations :-1:

Also I'm not sure this issue is the same one--since the error it references is "Instance name already in use"

update: Looks like I am seeing the same thing, where one node will work and others won't, even though it says they successfully registered with the pull server. I am able to get the nodes working if I re-register multiple times (one at a time).

update 2: ALRIGHT. I have it working now with OleDb provider to avoid concurrency issues with ESENT and I'll keep it that way until WMF 5.1 or I switch to Azure Automation, whichever comes first.

  1. Update web.config dbprovider and dbconnectionstr:

    <add key="dbprovider" value="System.Data.OleDb" />
    <add key="dbconnectionstr" value="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Program Files\WindowsPowerShell\DscService\Devices.mdb;" />
  2. Copy over fresh Devices.mdb from: C:\Windows\System32\WindowsPowerShell\v1.0\Modules\PSDesiredStateConfiguration\PullServer\Devices.mdb to C:\Program Files\WindowsPowerShell\DscService\Devices.mdb
  3. Restart app pool(s)
  4. Re-register all nodes

After that, I was able to pull configurations consistently across all my nodes.
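
The web.config change from step 1 above can be sketched in Python as follows. This is only an illustration of the two appSettings edits, not an official tool: switch_to_oledb is a hypothetical helper, SAMPLE_CONFIG is a cut-down stand-in for the real file, and the key names and values are taken from this thread. Note that ElementTree drops the XML declaration and any comments when it re-serializes, so a real file would need a backup first.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the pull server web.config (only appSettings shown).
SAMPLE_CONFIG = r"""<configuration>
  <appSettings>
    <add key="dbprovider" value="ESENT" />
    <add key="dbconnectionstr" value="C:\Program Files\WindowsPowerShell\DscService\Devices.edb" />
  </appSettings>
</configuration>"""

MDB_PATH = r"C:\Program Files\WindowsPowerShell\DscService\Devices.mdb"


def switch_to_oledb(web_config_xml, mdb_path):
    """Return web.config text with dbprovider/dbconnectionstr set for OleDb."""
    root = ET.fromstring(web_config_xml)
    app_settings = root.find("appSettings")
    if app_settings is None:
        raise ValueError("no <appSettings> section found")

    wanted = {
        "dbprovider": "System.Data.OleDb",
        "dbconnectionstr": "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=%s;" % mdb_path,
    }
    for key, value in wanted.items():
        node = app_settings.find("./add[@key='%s']" % key)
        if node is None:
            # Key missing entirely: add it rather than fail.
            node = ET.SubElement(app_settings, "add", {"key": key})
        node.set("value", value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(switch_to_oledb(SAMPLE_CONFIG, MDB_PATH))
```

Steps 2 to 4 (copying Devices.mdb, restarting the app pool, re-registering nodes) still have to be done on the server itself.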

peterschen commented 8 years ago

I am seeing the same issues on build 14300.1045. After updating the dbconnectionstr, registration and subsequent pulls work fine.

jr-ge commented 8 years ago

@kamranayub Your solution of switching to the OleDb provider worked great for me. I'm definitely going to have to do this for any new pull server I make from now on.

Can this feature be added: to choose which Db provider to use?

I believe the current Pull Server config automatically chooses based on your OS. Would be great to be able to just set this at creation.

I also like the OleDb provider better because it doesn't fill up the DSCService directory with logs like ESENT does

kamranayub commented 8 years ago

Yeah, it definitely feels like the ESENT provider was hastily added in the newest releases without proper production workloads and testing. I am using the OleDb provider on a high-availability file share with two load-balanced pull servers and it's rocking without issues.

usrme commented 8 years ago

I managed to get registration working with ESENT after setting up the pull server numerous times and deleting all possible traces of DSC from the nodes, including the registry keys for agent IDs. When I first tried to switch to OleDb I got initialization errors when trying to register nodes to it, but I didn't follow the steps you laid out in your second update, @kamranayub, so I will have to see whether I can switch next time around.

At the moment the nodes are registered properly and are checking and pulling configurations as expected, so I see no pressing reason to switch to OleDb right now. I have run into issues where the nodes stay in "consistency check" mode for an inordinate amount of time, but that seems unrelated to the issue at hand.

On a side note, I would love to see more technical documentation on DSC, as I've learned more about the bits and bobs of DSC from reading the comments on issues and bugs than I have from the official MSDN documentation.

ArieHein commented 8 years ago

I've seen issues like this on the transition from WMF5 Preview to RTM. This was later fixed in the second release of RTM together with a new xPSDSC. If you search UserVoice, powershell.org and, IIRC, even this repo, you'll see an issue from back then where installing on Server Core, which has no UI, would fail because OLEDB had no GUI DLLs to depend on. I haven't seen issues with it on my boxes since.

WMF 5.1 should address the ESENT issues; as I remember, some of the DSC folks dealing with the earlier reported bugs had fixes for them.

All that said, I don't think OLEDB is the right way. It's more of a backward-compatibility option, as it can't run on Nano Server due to dependencies on GUI elements of Win32. Not that I'm a big fan of ESENT, but the pull server needs a DB and there aren't many options available. The registry is obviously not a solution as it's Windows-only; perhaps with SQL on Linux there might be a possible switch.

It's true that using OLEDB means we can query the DB with lots of tools and methods that barely exist in the ESENT case, but I don't think we're supposed to be doing that in the first place. Anyone who has dealt with databases knows that the best way to query the DB is through the direct API (which you get through the reporting server and REST API), not something behind the scenes, as you risk locking issues and then a performance drop.

There's still a desire for the pull server to go open source, which was already raised on UserVoice and was more or less declined by MS due to reliability concerns (which are partly true). If the schema were open, I would even consider using a JSON-based NoSQL solution instead.

I don't think allowing us to choose a DB provider is a good thing, as it would also increase the complexity and the number of tests needed to support the various providers. My 0.5 cents ;)

kamranayub commented 8 years ago

Are they using ESENT for Azure Automation though? I doubt it... They must be using something more robust. I'd love a SQL provider, for example. It would make things more complex, but at least it would be better for HA scenarios (which you'd want in production).

ArieHein commented 8 years ago

The "PullServer" on Azure DSC works differently and I think it's safe to assume it uses a different metadata store. I wouldn't be surprised if it's using a JSON-based NoSQL solution (maybe via DocumentDB).

MisterTK commented 7 years ago

@kamranayub Switching to the OleDb provider solved my pull server issues, thank you! I was beating my head against the wall. I now have a pull server with partial configs working.

dotps1 commented 7 years ago

I have updated my web.config to use the .mdb, and when I try to use Set-DscLocalConfigurationManager I get this error:

Registration of the Dsc Agent with the server https://mypullserver:8080/PSDSCPullServer.svc failed. The underlying error
is: The attempt to register Dsc Agent with AgentId 7B44FB6F-2119-11E7-B5DA-9CB6D0DE32BC with the server
https://mypullserver:8080/PSDSCPullServer.svc/Nodes(AgentId='7B44FB6F-2119-11E7-B5DA-9CB6D0DE32BC') returned unexpected
response code InternalServerError. .

This agent was already registered, so is there something else I need to do to "re-register"? Thanks for any help!

kamranayub commented 7 years ago

You shouldn't need to. What helps diagnose these the best is enabling Failed Request Tracing on your IIS site and looking at that. It will usually include the OData error message or other information to hint at issues. In one case, the source node had updated to WMF 5.1 and the Pull Server was WMF 5.0 so they were incompatible. In another case, my web server had disabled HTTP PUT/DELETE verbs so the registration wasn't working. Good luck!

dotps1 commented 7 years ago

Mostly adding this in case someone else is following along. When I reset the app pools I noticed this in Event Viewer:

w3wp (3408) CreateRepositoryInstance: An attempt to create the folder "C:\Windows\SysWoW64\inetsrv\Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\" failed with system error 123 (0x0000007b): "The filename, directory name, or volume label syntax is incorrect. ".  The create folder operation will fail with error -1022 (0xfffffc02).

I am running on Server 2016 Core, so maybe that is the issue? A missing assembly or something.
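
The event above shows an OLE DB connection string being used as a folder name. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that dbprovider was still set to ESENT while dbconnectionstr already held the OleDb string. The Python sketch below checks the two appSettings keys against each other; check_db_settings and MISMATCHED_CONFIG are hypothetical names and sample data for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical example of a half-switched config: provider and string disagree.
MISMATCHED_CONFIG = r"""<configuration>
  <appSettings>
    <add key="dbprovider" value="ESENT" />
    <add key="dbconnectionstr" value="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Program Files\WindowsPowerShell\DscService\Devices.mdb;" />
  </appSettings>
</configuration>"""


def check_db_settings(web_config_xml: str) -> List[str]:
    """Return a list of mismatches between dbprovider and dbconnectionstr."""
    root = ET.fromstring(web_config_xml)
    settings = {
        node.get("key"): node.get("value", "")
        for node in root.findall("./appSettings/add")
    }
    provider = settings.get("dbprovider", "")
    conn = settings.get("dbconnectionstr", "")
    # An OLE DB connection string starts with "Provider=...", a file path does not.
    looks_like_oledb = conn.lstrip().lower().startswith("provider=")

    problems = []
    if provider == "ESENT" and looks_like_oledb:
        problems.append(
            "dbprovider is ESENT but dbconnectionstr is an OLE DB connection string"
        )
    if provider == "System.Data.OleDb" and not looks_like_oledb:
        problems.append(
            "dbprovider is System.Data.OleDb but dbconnectionstr is not an OLE DB connection string"
        )
    return problems


if __name__ == "__main__":
    for problem in check_db_settings(MISMATCHED_CONFIG):
        print(problem)
```

An empty result only means the two keys agree with each other; it says nothing about whether the OleDb provider itself is usable on a Server Core installation.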