Graylog2 / collector-sidecar

Manage log collectors through Graylog
https://www.graylog.org/
Other

Collectors show "failing" state although they are working as expected #181

Closed kb-elmo closed 7 years ago

kb-elmo commented 7 years ago

Problem description

Collectors randomly show the error "Nxlog: Could not start service: An instance of the service is already running." There is no standalone NXLog instance running, and the logs are still received by Graylog. It's just annoying that the collectors show the "failing" icon in the overview.

Steps to reproduce the problem

  1. Install Graylog Collector Sidecar with the NXLog backend on Windows (the Windows version doesn't matter; 7, 8 and 10 all show the same problem).
  2. Collectors will randomly show the "failing" state in the overview.

Environment

logserver

mariussturm commented 7 years ago

Hi, could you please provide the log output of one of the failing NXLog instances as well as the sidecar log itself?

kb-elmo commented 7 years ago

NXLog:

2017-07-03 00:00:00 INFO LogFile C:\Program Files (x86)\nxlog\data\nxlog.log reopened

Sidecar:

time="2017-06-30T15:26:03+02:00" level=info msg="Stopping signal distributor"
time="2017-06-30T15:26:03+02:00" level=info msg="[nxlog] Stopping"
time="2017-06-30T15:26:03+02:00" level=error msg="[nxlog] Failed to connect to service manager: A system shutdown is in progress."
time="2017-06-30T15:26:03+02:00" level=error msg="[nxlog] Failed to connect to service manager: A system shutdown is in progress."
time="2017-06-30T15:30:33+02:00" level=info msg="Starting signal distributor"
time="2017-06-30T15:30:33+02:00" level=info msg="[nxlog] Stopping"
time="2017-06-30T15:30:33+02:00" level=error msg="[nxlog] Could not send stop control: The service has not been started."
time="2017-06-30T15:30:43+02:00" level=info msg="[nxlog] Configuration change detected, rewriting configuration file."
time="2017-06-30T15:31:02+02:00" level=info msg="[nxlog] Starting (svc driver)"
time="2017-06-30T15:31:02+02:00" level=info msg="[nxlog] Stopping"
time="2017-06-30T15:31:02+02:00" level=error msg="[nxlog] Could not send stop control: The requested control is not valid for this service."
time="2017-06-30T15:31:02+02:00" level=info msg="[nxlog] Starting (svc driver)"
time="2017-06-30T15:31:02+02:00" level=error msg="[nxlog] Could not start service: An instance of the service is already running."
time="2017-06-30T15:40:37+02:00" level=info msg="Stopping signal distributor"
time="2017-06-30T15:40:37+02:00" level=info msg="[nxlog] Stopping"
time="2017-06-30T15:40:37+02:00" level=error msg="[nxlog] Failed to connect to service manager: A system shutdown is in progress."
time="2017-06-30T15:40:37+02:00" level=error msg="[nxlog] Failed to connect to service manager: A system shutdown is in progress."
time="2017-06-30T15:47:13+02:00" level=info msg="Starting signal distributor"
time="2017-06-30T15:47:14+02:00" level=info msg="[nxlog] Stopping"
time="2017-06-30T15:47:14+02:00" level=error msg="[nxlog] Could not send stop control: The service has not been started."
time="2017-06-30T15:47:25+02:00" level=info msg="[nxlog] Configuration change detected, rewriting configuration file."
time="2017-06-30T15:47:44+02:00" level=info msg="[nxlog] Starting (svc driver)"
time="2017-06-30T15:47:44+02:00" level=info msg="[nxlog] Stopping"
time="2017-06-30T15:47:44+02:00" level=error msg="[nxlog] Could not send stop control: The requested control is not valid for this service."
time="2017-06-30T15:47:44+02:00" level=info msg="[nxlog] Starting (svc driver)"
time="2017-06-30T15:47:44+02:00" level=error msg="[nxlog] Could not start service: An instance of the service is already running."
time="2017-06-30T19:13:18+02:00" level=error msg="[UpdateRegistration] Failed to report collector status to server: Put https://<server-url>/api/plugins/org.graylog.plugins.collector/collectors/fc6ce39a-3b7c-4a6c-823c-526f7eb4d2d5: read tcp <client-ip>:49301-><server-ip>:443: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond."
time="2017-06-30T19:13:20+02:00" level=error msg="[RequestConfiguration] Fetching configuration failed: Get https://<server-url>/api/plugins/org.graylog.plugins.collector/fc6ce39a-3b7c-4a6c-823c-526f7eb4d2d5?tags=%5B%22updateserver%22%5D: dial tcp <server-ip>:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond."

It looks like the collector tries to restart NXLog, fails to stop the service first, and then tries to start a new instance while the old one is still running.
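The suspected sequence (a start, an immediate stop that is rejected, then a second start) can be modeled in Go, the language the sidecar is written in. This is a minimal, hypothetical sketch, not the sidecar's actual code: `fakeService`, `waitFor` and `waitingRestart` are illustrative names, and the fake only assumes that the Windows service manager completes start and stop requests asynchronously through pending states, which would explain the "requested control is not valid" and "already running" errors logged at 15:31:02 and 15:47:44.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// State models the subset of Windows service states that matter here.
type State int

const (
	Stopped State = iota
	StartPending
	Running
	StopPending
)

// Errors worded after the Windows messages in the sidecar log above.
var errAlreadyRunning = errors.New("an instance of the service is already running")
var errInvalidControl = errors.New("the requested control is not valid for this service")

// fakeService is a hypothetical stand-in for a Windows service handle.
// A pending state settles only after `delay` further Status() calls, and
// (simplified) every state other than Running rejects a stop control.
type fakeService struct {
	state       State
	delay, left int
}

func (f *fakeService) Start() error {
	if f.state != Stopped {
		return errAlreadyRunning
	}
	f.state, f.left = StartPending, f.delay
	return nil
}

func (f *fakeService) Stop() error {
	if f.state != Running {
		return errInvalidControl
	}
	f.state, f.left = StopPending, f.delay
	return nil
}

func (f *fakeService) Status() State {
	switch {
	case f.left > 0:
		f.left--
	case f.state == StartPending:
		f.state = Running
	case f.state == StopPending:
		f.state = Stopped
	}
	return f.state
}

// waitFor polls Status until it reports want, giving up after maxPolls checks.
func waitFor(s *fakeService, want State, maxPolls int) error {
	for i := 0; s.Status() != want; i++ {
		if i == maxPolls {
			return fmt.Errorf("service did not reach state %d", want)
		}
		time.Sleep(10 * time.Millisecond)
	}
	return nil
}

// waitingRestart lets a pending start settle, stops, waits for Stopped, then starts.
func waitingRestart(s *fakeService, maxPolls int) error {
	switch s.Status() {
	case StartPending:
		if err := waitFor(s, Running, maxPolls); err != nil {
			return err
		}
		fallthrough
	case Running:
		if err := s.Stop(); err != nil {
			return err
		}
	}
	if err := waitFor(s, Stopped, maxPolls); err != nil {
		return err
	}
	return s.Start()
}

func main() {
	// Restart right after a start: naive (stop error dropped) vs waiting.
	a := &fakeService{delay: 2}
	_ = a.Start()
	_ = a.Stop()
	fmt.Println("naive:", a.Start())
	b := &fakeService{delay: 2}
	_ = b.Start()
	fmt.Println("waiting:", waitingRestart(b, 50))
}
```

Run as is, it prints `naive: an instance of the service is already running` followed by `waiting: <nil>`. Waiting until the service reports `Running` or `Stopped` before sending the next control is one way to avoid this race; a real implementation would query the Windows service manager instead of the fake.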

mariussturm commented 7 years ago

Yeah, that looks odd. Could you please give me a list of the registered services on this host (PS> Get-Service) or a screenshot of the Services tab in Server Manager?

kb-elmo commented 7 years ago

Status   Name               DisplayName                           
------   ----               -----------                           
Running  AdobeARMservice    Adobe Acrobat Update Service          
Stopped  AeLookupSvc        Application Experience                
Stopped  ALG                Application Layer Gateway Service     
Running  AppHostSvc         Application Host Helper Service       
Stopped  AppIDSvc           Application Identity                  
Running  Appinfo            Application Information               
Running  AppMgmt            Application Management                
Stopped  AppReadiness       App Readiness                         
Stopped  AppXSvc            AppX Deployment Service (AppXSVC)     
Stopped  aspnet_state       ASP.NET State Service                 
Stopped  AudioEndpointBu... Windows Audio Endpoint Builder        
Stopped  Audiosrv           Windows Audio                         
Running  BFE                Base Filtering Engine                 
Running  BITS               Background Intelligent Transfer Ser...
Running  BrokerInfrastru... Background Tasks Infrastructure Ser...
Stopped  Browser            Computer Browser                      
Running  CertPropSvc        Certificate Propagation               
Running  collector-sidecar  Graylog collector sidecar             
Stopped  COMSysApp          COM+ System Application               
Running  CryptSvc           Cryptographic Services                
Running  DcomLaunch         DCOM Server Process Launcher          
Stopped  defragsvc          Optimize drives                       
Stopped  DeviceAssociati... Device Association Service            
Stopped  DeviceInstall      Device Install Service                
Running  Dhcp               DHCP Client                           
Running  DiagTrack          Diagnostics Tracking Service          
Running  Dnscache           DNS Client                            
Stopped  dot3svc            Wired AutoConfig                      
Running  DPS                Diagnostic Policy Service             
Stopped  DsmSvc             Device Setup Manager                  
Stopped  Eaphost            Extensible Authentication Protocol    
Stopped  EFS                Encrypting File System (EFS)          
Running  EventLog           Windows Event Log                     
Running  EventSystem        COM+ Event System                     
Stopped  fdPHost            Function Discovery Provider Host      
Stopped  FDResPub           Function Discovery Resource Publica...
Running  FontCache          Windows Font Cache Service            
Stopped  FontCache3.0.0.0   Windows Presentation Foundation Fon...
Running  gpsvc              Group Policy Client                   
Running  graylog-collect... Graylog collector sidecar - nxlog b...
Stopped  hidserv            Human Interface Device Service        
Stopped  hkmsvc             Health Key and Certificate Management 
Running  IAStorDataMgrSvc   Intel(R) Rapid Storage Technology     
Stopped  IEEtwCollectorS... Internet Explorer ETW Collector Ser...
Running  IISADMIN           IIS Admin Service                     
Stopped  IKEEXT             IKE and AuthIP IPsec Keying Modules   
Running  iphlpsvc           IP Helper                             
Stopped  KeyIso             CNG Key Isolation                     
Stopped  KPSSVC             KDC Proxy Server service (KPS)        
Stopped  KtmRm              KtmRm for Distributed Transaction C...
Running  LanmanServer       Server                                
Running  LanmanWorkstation  Workstation                           
Stopped  lltdsvc            Link-Layer Topology Discovery Mapper  
Running  lmhosts            TCP/IP NetBIOS Helper                 
Running  LSM                Local Session Manager                 
Stopped  MMCSS              Multimedia Class Scheduler            
Stopped  MozillaMaintenance Mozilla Maintenance Service           
Running  MpsSvc             Windows Firewall                      
Running  MSDTC              Distributed Transaction Coordinator   
Stopped  MSiSCSI            Microsoft iSCSI Initiator Service     
Stopped  msiserver          Windows Installer                     
Running  MSSQL$MICROSOFT... Windows Internal Database             
Stopped  napagent           Network Access Protection Agent       
Stopped  NcaSvc             Network Connectivity Assistant        
Stopped  Netlogon           Netlogon                              
Stopped  Netman             Network Connections                   
Running  netprofm           Network List Service                  
Stopped  NetTcpPortSharing  Net.Tcp Port Sharing Service          
Running  NlaSvc             Network Location Awareness            
Running  nscp               NSClient++ (x64)                      
Running  nsi                Network Store Interface Service       
Running  NTP                Network Time Protocol Daemon          
Running  nvsvc              NVIDIA Display Driver Service         
Stopped  PerfHost           Performance Counter DLL Host          
Stopped  pla                Performance Logs & Alerts             
Running  PlugPlay           Plug and Play                         
Running  PolicyAgent        IPsec Policy Agent                    
Running  Power              Power                                 
Stopped  PrintNotify        Printer Extensions and Notifications  
Running  ProfSvc            User Profile Service                  
Stopped  RasAuto            Remote Access Auto Connection Manager 
Stopped  RasMan             Remote Access Connection Manager      
Running  RCCMD              RCCMD                                 
Running  rccmdWebIf         rccmdWebIf                            
Stopped  RemoteAccess       Routing and Remote Access             
Stopped  RemoteRegistry     Remote Registry                       
Running  RpcEptMapper       RPC Endpoint Mapper                   
Stopped  RpcLocator         Remote Procedure Call (RPC) Locator   
Running  RpcSs              Remote Procedure Call (RPC)           
Stopped  RSoPProv           Resultant Set of Policy Provider      
Stopped  sacsvr             Special Administration Console Helper 
Running  SamSs              Security Accounts Manager             
Running  SAVAdminService    Sophos Anti-Virus Statusreporter      
Running  SAVService         Sophos Anti-Virus                     
Stopped  SCardSvr           Smart Card                            
Running  ScDeviceEnum       Smart Card Device Enumeration Service 
Running  Schedule           Task Scheduler                        
Stopped  SCPolicySvc        Smart Card Removal Policy             
Stopped  seclogon           Secondary Logon                       
Running  SENS               System Event Notification Service     
Running  SessionEnv         Remote Desktop Configuration          
Stopped  SharedAccess       Internet Connection Sharing (ICS)     
Running  ShellHWDetection   Shell Hardware Detection              
Stopped  smphost            Microsoft Storage Spaces SMP          
Stopped  SNMPTRAP           SNMP Trap                             
Running  Sophos AutoUpda... Sophos AutoUpdate Service             
Running  Sophos Web Cont... Sophos Web Control Service            
Running  sophossps          Sophos System Protection Service      
Running  Spooler            Print Spooler                         
Stopped  sppsvc             Software Protection                   
Stopped  SSDPSRV            SSDP Discovery                        
Stopped  SstpSvc            Secure Socket Tunneling Protocol Se...
Stopped  svsvc              Spot Verifier                         
Running  swi_filter         Sophos Web Filter                     
Running  swi_service        Sophos Web Intelligence Service       
Stopped  swprv              Microsoft Software Shadow Copy Prov...
Stopped  SysMain            Superfetch                            
Running  SystemEventsBroker System Events Broker                  
Stopped  TapiSrv            Telephony                             
Running  TermService        Remote Desktop Services               
Running  Themes             Themes                                
Stopped  THREADORDER        Thread Ordering Server                
Stopped  TieringEngineSe... Storage Tiers Management              
Running  TrkWks             Distributed Link Tracking Client      
Stopped  TrustedInstaller   Windows Modules Installer             
Running  UALSVC             User Access Logging Service           
Stopped  UI0Detect          Interactive Services Detection        
Running  UmRdpService       Remote Desktop Services UserMode Po...
Stopped  upnphost           UPnP Device Host                      
Stopped  VaultSvc           Credential Manager                    
Stopped  vds                Virtual Disk                          
Stopped  vmicguestinterface Hyper-V Guest Service Interface       
Stopped  vmicheartbeat      Hyper-V Heartbeat Service             
Stopped  vmickvpexchange    Hyper-V Data Exchange Service         
Stopped  vmicrdv            Hyper-V Remote Desktop Virtualizati...
Stopped  vmicshutdown       Hyper-V Guest Shutdown Service        
Stopped  vmictimesync       Hyper-V Time Synchronization Service  
Stopped  vmicvss            Hyper-V Volume Shadow Copy Requestor  
Stopped  VSS                Volume Shadow Copy                    
Stopped  W32Time            Windows Time                          
Stopped  w3logsvc           W3C Logging Service                   
Running  W3SVC              World Wide Web Publishing Service     
Running  WAS                Windows Process Activation Service    
Running  Wcmsvc             Windows Connection Manager            
Stopped  WcsPlugInService   Windows Color System                  
Stopped  WdiServiceHost     Diagnostic Service Host               
Stopped  WdiSystemHost      Diagnostic System Host                
Stopped  Wecsvc             Windows Event Collector               
Stopped  WEPHOSTSVC         Windows Encryption Provider Host Se...
Stopped  wercplsupport      Problem Reports and Solutions Contr...
Stopped  WerSvc             Windows Error Reporting Service       
Running  WIDWriter          Windows Internal Database VSS Writer  
Running  WinHttpAutoProx... WinHTTP Web Proxy Auto-Discovery Se...
Running  Winmgmt            Windows Management Instrumentation    
Running  WinRM              Windows Remote Management (WS-Manag...
Stopped  wmiApSrv           WMI Performance Adapter               
Stopped  WPDBusEnum         Portable Device Enumerator Service    
Stopped  WSService          Windows Store Service (WSService)     
Stopped  WSusCertServer     WSUS Certificate Server               
Running  WsusService        WSUS Service                          
Stopped  wuauserv           Windows Update                        
Stopped  wudfsvc            Windows Driver Foundation - User-mo...

mariussturm commented 7 years ago

That all looks good; there is no second NXLog service. Is there anything more in the logs, such as the Graylog server log or the NXLog log (\Program Files (x86)\nxlog\data\nxlog.log)? Otherwise I need a way to reproduce the issue: the configuration and the steps needed to make the error appear on a freshly installed machine. The information so far doesn't give me a clear picture.

kb-elmo commented 7 years ago

The nxlog.log file only contains the one line I already posted. The generated NXLog configuration looks as follows (some information is redacted due to company policy):

define ROOT C:\Program Files (x86)\nxlog

<Extension gelf>
  Module xm_gelf
</Extension>

Moduledir %ROOT%\modules
CacheDir %ROOT%\data
Pidfile %ROOT%\data\nxlog.pid
SpoolDir %ROOT%\data
LogFile %ROOT%\data\nxlog.log
LogLevel INFO

<Extension logrotate>
    Module  xm_fileop
    <Schedule>
        When    @daily
        Exec    file_cycle('%ROOT%\data\nxlog.log', 7);
    </Schedule>
</Extension>

<Input 59393d5bfb979b6549b31c0f>
    Module im_msvistalog
    PollInterval 1
    SavePos True
    ReadFromLast True
    Query <QueryList>\
    <Query Id="0">\
        <Select Path="Application">*[System/Level&lt;=2]</Select>\
        <Select Path="System">*[System/Level&lt;=3]</Select>\
        <Select Path="Security">*</Select>\
    </Query>\
</QueryList>
</Input>

<Output 59393d4ffb979b6549b31c00>
    Module om_ssl
    Host <server-url>
    Port 12204
    OutputType GELF_TCP
    AllowUntrusted True
    Exec $short_message = $raw_event; # Avoids truncation of the short_message field.
    Exec $gl2_source_collector = 'fc6ce39a-3b7c-4a6c-823c-526f7eb4d2d5';
    Exec $collector_node_id = '<hostname>';
    Exec $Hostname = hostname_fqdn();
</Output>

<Route route-0>
  Path 59393d5bfb979b6549b31c0f => 59393d4ffb979b6549b31c00
</Route>

The Graylog server itself has no special configuration, just the standard collector setup.

mariussturm commented 7 years ago

Is this still a problem with 0.1.4?

mariussturm commented 7 years ago

Closing this for now; feel free to reopen if the error still exists in >= 0.1.4.