DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

(Windows 7) dd-agent fails to start with panic: runtime error: index out of range #2176

Open nicbarker opened 6 years ago

nicbarker commented 6 years ago

Describe what happened: Attempting to start the agent on Windows 7 Service Pack 1 using the command:

"C:\Program Files\Datadog\Datadog Agent\embedded\agent.exe" run

The agent crashes almost immediately after starting.

Describe what you expected: The agent to start normally.

Steps to reproduce the issue: This looks to be specific to this Windows machine; I've got the agent running on another Windows 7 SP1 machine with similar hardware. I've tried purging everything, reinstalling, and re-downloading the installer, with no luck.

Additional environment details (Operating System, Cloud provider, etc): Output from dd-agent on stdout:

2018-08-17 11:04:22 AEST | INFO | (start.go:116 in StartAgent) | Starting Datadog Agent v6.4.2
2018-08-17 11:04:22 AEST | DEBUG | (hostname.go:108 in GetHostname) | Unable to get the hostname from the config file: host name is empty
2018-08-17 11:04:22 AEST | DEBUG | (hostname.go:109 in GetHostname) | Trying to determine a reliable host name automatically...
2018-08-17 11:04:22 AEST | DEBUG | (hostname.go:118 in GetHostname) | GetHostname trying GCE metadata...
2018-08-17 11:04:23 AEST | DEBUG | (hostname.go:125 in GetHostname) | Unable to get hostname from GCE:  unable to retrieve hostname from GCE: Get http://169.254.169.254/computeMetadata/v1/instance/hostname: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018-08-17 11:04:23 AEST | DEBUG | (hostname.go:129 in GetHostname) | GetHostname trying FQDN/`hostname -f`...
2018-08-17 11:04:23 AEST | DEBUG | (hostname.go:134 in GetHostname) | Unable to get FQDN from system:  getSystemFQDN is not implemented on windows
2018-08-17 11:04:23 AEST | DEBUG | (hostname.go:144 in GetHostname) | GetHostname trying os...
2018-08-17 11:04:23 AEST | DEBUG | (hostname.go:158 in GetHostname) | GetHostname trying EC2 metadata...
2018-08-17 11:04:23 AEST | DEBUG | (hostname.go:165 in GetHostname) | EC2 instance ID is not a valid hostname:  host name is empty
2018-08-17 11:04:23 AEST | INFO | (start.go:134 in StartAgent) | Hostname is: Helios-084
2018-08-17 11:04:23 AEST | DEBUG | (common_windows.go:102 in GetViewsPath) | ViewsPath is now %s C:\Program Files\Datadog\Datadog Agent\bin\agent\dist\views
2018-08-17 11:04:23 AEST | INFO | (gui.go:81 in StartGUIServer) | GUI server is listening at 127.0.0.1:5002
2018-08-17 11:04:23 AEST | DEBUG | (start.go:164 in StartAgent) | Starting forwarder
2018-08-17 11:04:23 AEST | INFO | (forwarder.go:153 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://6-4-2-app.agent.datadoghq.com" (1 api key(s))
2018-08-17 11:04:23 AEST | DEBUG | (start.go:166 in StartAgent) | Forwarder started
2018-08-17 11:04:23 AEST | DEBUG | (forwarder_health.go:83 in healthCheckLoop) | Waiting for APIkey validity to be confirmed.
2018-08-17 11:04:23 AEST | DEBUG | (udp.go:68 in NewUDPListener) | dogstatsd-udp: 127.0.0.1:8125 successfully initialized
2018-08-17 11:04:23 AEST | INFO | (udp.go:74 in Listen) | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2018-08-17 11:04:23 AEST | DEBUG | (start.go:181 in StartAgent) | statsd started
2018-08-17 11:04:23 AEST | INFO | (start.go:193 in StartAgent) | logs-agent disabled
2018-08-17 11:04:23 AEST | INFO | (tagger.go:79 in Init) | starting the tagging system
2018-08-17 11:04:23 AEST | DEBUG | (tagger.go:127 in startCollectors) | candidate list empty, stopping detection
2018-08-17 11:04:23 AEST | INFO | (runner.go:92 in NewRunner) | Runner started with 1 workers.
2018-08-17 11:04:23 AEST | DEBUG | (runner.go:226 in work) | Ready to process checks...
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:129 in func1) | Starting scheduler loop...
2018-08-17 11:04:23 AEST | INFO | (collector.go:52 in NewCollector) | Embedding Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
2018-08-17 11:04:23 AEST | DEBUG | (collector.go:53 in NewCollector) | Python Home: C:\Program Files\Datadog\Datadog Agent\embedded
2018-08-17 11:04:23 AEST | DEBUG | (collector.go:54 in NewCollector) | Python path: ['C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\python27.zip', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\DLLs', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\plat-win', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\lib-tk', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\site-packages', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\site-packages\\setuptools-28.8.0.post20180809-py2.7.egg', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\site-packages\\win32', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\site-packages\\win32\\lib', 'C:\\Program Files\\Datadog\\Datadog Agent\\embedded\\lib\\site-packages\\Pythonwin', 'C:\\Program Files\\Datadog\\Datadog Agent\\bin\\agent\\dist', 'C:\\Program Files\\Datadog\\Datadog Agent\\checks.d', 'C:\\Program Files\\Datadog\\Datadog Agent\\bin\\agent\\dist\\checks.d', 'c:\\programdata\\datadog\\checks.d']
2018-08-17 11:04:23 AEST | DEBUG | (collector.go:63 in NewCollector) | Collector up and running!
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:54 in InitCheckScheduler) | Added Python Check Loader to Check Scheduler
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:54 in InitCheckScheduler) | Added Core Check Loader to Check Scheduler
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:54 in InitCheckScheduler) | Added JMX Check Loader to Check Scheduler
2018-08-17 11:04:23 AEST | INFO | (file.go:69 in Collect) | File Configuration Provider: searching for configuration files at: c:\programdata\datadog\conf.d
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\activemq.d\metrics.yaml
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\cassandra.d\metrics.yaml
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\cpu.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\disk.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\file_handle.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\io.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\kafka.d\metrics.yaml
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\memory.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\network.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\ntp.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\solr.d\metrics.yaml
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\tomcat.d\metrics.yaml
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\uptime.d\conf.yaml.default
2018-08-17 11:04:23 AEST | DEBUG | (file.go:191 in collectEntry) | Found valid configuration in file: c:\programdata\datadog\conf.d\winproc.d\conf.yaml.default
2018-08-17 11:04:23 AEST | INFO | (file.go:69 in Collect) | File Configuration Provider: searching for configuration files at: C:\Program Files\Datadog\Datadog Agent\bin\agent\dist\conf.d
2018-08-17 11:04:23 AEST | WARN | (file.go:73 in Collect) | Skipping, open C:\Program Files\Datadog\Datadog Agent\bin\agent\dist\conf.d: The system cannot find the file specified.
2018-08-17 11:04:23 AEST | DEBUG | (loader.go:89 in Load) | Unable to load python module - datadog_checks.cpu: No module named cpu
2018-08-17 11:04:23 AEST | DEBUG | (loader.go:89 in Load) | Unable to load python module - cpu: No module named cpu
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:146 in getChecks) | Python Check Loader: unable to load the check 'cpu': No module named cpu
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:137 in getChecks) | Core Check Loader: successfully loaded check 'cpu'
2018-08-17 11:04:23 AEST | WARN | (check.go:250 in Configure) | could not get a check instance with the new api: __init__() takes at least 4 arguments (4 given)
2018-08-17 11:04:23 AEST | WARN | (check.go:251 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
2018-08-17 11:04:23 AEST | WARN | (check.go:276 in Configure) | passing `agentConfig` to the constructor is deprecated, please use the `get_config` function from the 'datadog_agent' package (disk).
2018-08-17 11:04:23 AEST | DEBUG | (check.go:278 in Configure) | python check configure done disk
2018-08-17 11:04:23 AEST | DEBUG | (loader.go:169 in Load) | python loader: done loading check disk (version 1.2.0)
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:137 in getChecks) | Python Check Loader: successfully loaded check 'disk'
2018-08-17 11:04:23 AEST | DEBUG | (loader.go:89 in Load) | Unable to load python module - datadog_checks.file_handle: No module named file_handle
2018-08-17 11:04:23 AEST | DEBUG | (loader.go:89 in Load) | Unable to load python module - file_handle: No module named file_handle
2018-08-17 11:04:23 AEST | DEBUG | (scheduler.go:146 in getChecks) | Python Check Loader: unable to load the check 'file_handle': No module named file_handle
2018-08-17 11:04:24 AEST | WARN | (start.go:273 in StopAgent) | Some components were unhealthy: [ad-configresolver tagger forwarder aggregator dogstatsd-main ad-autoconfig]
2018-08-17 11:04:24 AEST | DEBUG | (scheduler.go:171 in Stop) | Waiting for the scheduler to shutdown
2018-08-17 11:04:24 AEST | DEBUG | (scheduler.go:144 in func1) | Exited Scheduler loop, shutting down queues...
2018-08-17 11:04:24 AEST | DEBUG | (scheduler.go:185 in stopQueues) | Stopping 0 queue(s)
2018-08-17 11:04:24 AEST | INFO | (runner.go:149 in Stop) | Runner is shutting down...
2018-08-17 11:04:24 AEST | DEBUG | (runner.go:320 in work) | Finished processing checks.
2018-08-17 11:04:24 AEST | INFO | (domain_forwarder.go:185 in Stop) | domainForwarder stopped
2018-08-17 11:04:24 AEST | INFO | (start.go:294 in StopAgent) | See ya!

Output from agent on stderr:

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/DataDog/datadog-agent/pkg/util/winutil/pdhutil.makeCounterSetIndexes(0x30, 0x118e640)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/winutil/pdhutil/pdhcounter.go:69 +0x3cf
github.com/DataDog/datadog-agent/pkg/util/winutil/pdhutil.getCounterIndexList(0x1204398, 0x7, 0x0, 0x0, 0x0, 0x3, 0xc04203cf70)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/winutil/pdhutil/pdhcounter.go:243 +0x41
github.com/DataDog/datadog-agent/pkg/util/winutil/pdhutil.GetCounterSet(0x1204398, 0x7, 0x12094dd, 0xc, 0x1203189, 0x6, 0x0, 0x0, 0x0, 0x0)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/winutil/pdhutil/pdhcounter.go:82 +0x9b
github.com/DataDog/datadog-agent/pkg/collector/corechecks/system.(*fhCheck).Configure(0xc0424fe680, 0xc0423a2f28, 0x3, 0x8, 0x0, 0x0, 0x0, 0x58, 0xc042531980)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/corechecks/system/file_handles_windows.go:47 +0x74
github.com/DataDog/datadog-agent/pkg/collector/corechecks.(*GoCheckLoader).Load(0x1a6e330, 0xc042028870, 0xb, 0xc0424abe80, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/corechecks/loader.go:74 +0x237
github.com/DataDog/datadog-agent/pkg/collector.(*CheckScheduler).getChecks(0xc042150a40, 0xc042028870, 0xb, 0xc0424abe80, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/scheduler.go:135 +0x10c
github.com/DataDog/datadog-agent/pkg/collector.(*CheckScheduler).GetChecksFromConfigs(0xc042150a40, 0xc042449500, 0x9, 0x11, 0x1, 0x0, 0x0, 0x0)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/scheduler.go:182 +0x226
github.com/DataDog/datadog-agent/pkg/collector.(*CheckScheduler).Schedule(0xc042150a40, 0xc042449500, 0x9, 0x11)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/scheduler.go:61 +0x73
github.com/DataDog/datadog-agent/pkg/autodiscovery/scheduler.(*MetaScheduler).Schedule(0xc0421540f8, 0xc042449500, 0x9, 0x11)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/scheduler/meta.go:36 +0xbe
github.com/DataDog/datadog-agent/pkg/autodiscovery.(*AutoConfig).schedule(0xc0421c6240, 0xc042449500, 0x9, 0x11)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/autoconfig.go:216 +0x59
github.com/DataDog/datadog-agent/pkg/autodiscovery.(*AutoConfig).LoadAndRun(0xc0421c6240)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/autoconfig.go:161 +0x40
github.com/DataDog/datadog-agent/cmd/agent/common.StartAutoConfig()
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/cmd/agent/common/autoconfig.go:103 +0x44
github.com/DataDog/datadog-agent/cmd/agent/app.StartAgent(0x10, 0x124b028)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/cmd/agent/app/start.go:199 +0xbfb
github.com/DataDog/datadog-agent/cmd/agent/app.run(0x1a35d20, 0x1a6e330, 0x0, 0x0, 0x0, 0x0)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/cmd/agent/app/run.go:84 +0x122
github.com/DataDog/datadog-agent/vendor/github.com/spf13/cobra.(*Command).execute(0x1a35d20, 0x1a6e330, 0x0, 0x0, 0x1a35d20, 0x1a6e330)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/vendor/github.com/spf13/cobra/command.go:762 +0x46f
github.com/DataDog/datadog-agent/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1a31aa0, 0xc0421487e0, 0x0, 0xc04223df18)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/vendor/github.com/spf13/cobra/command.go:852 +0x311
github.com/DataDog/datadog-agent/vendor/github.com/spf13/cobra.(*Command).Execute(0x1a31aa0, 0x124b640, 0xc04223df78)
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/vendor/github.com/spf13/cobra/command.go:800 +0x32
main.main()
    C:/gitlab-ci/builds/6fa684a8/0/DataDog/datadog-agent/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/cmd/agent/main_windows.go:48 +0x6a

derekwbrown commented 6 years ago

While it definitely shouldn't panic as a result, it looks like the Windows performance counter database is out of whack. Can you confirm this is a US English version of Windows (or, if not, what the locale is)?
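To illustrate the "shouldn't panic" point, here is a minimal sketch of a bounds-checked parser. It assumes the counter name table arrives as a flat list of alternating index and name strings, which is how Windows stores it in the registry. `parseCounterPairs` is a hypothetical helper written for illustration, not the agent's actual `makeCounterSetIndexes`, and the sample data is made up. A list with an odd number of entries, one plausible symptom of a damaged table, yields an error rather than an out-of-range index.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCounterPairs is a hypothetical helper, not the agent's real code.
// It assumes list holds alternating "index", "name" strings and builds a
// map from counter name to every index registered under that name.
func parseCounterPairs(list []string) (map[string][]int, error) {
	// An odd length means the final index has no name. Reject the list
	// up front instead of reading list[i+1] past the end of the slice.
	if len(list)%2 != 0 {
		return nil, fmt.Errorf("counter list has odd length %d; name table may be corrupt", len(list))
	}
	indexes := make(map[string][]int)
	for i := 0; i < len(list); i += 2 {
		idx, err := strconv.Atoi(list[i])
		if err != nil {
			return nil, fmt.Errorf("entry %d: %q is not a numeric index: %v", i, list[i], err)
		}
		name := list[i+1]
		indexes[name] = append(indexes[name], idx)
	}
	return indexes, nil
}

func main() {
	// Illustrative sample data only.
	good := []string{"2", "System", "4", "Memory"}
	indexes, err := parseCounterPairs(good)
	fmt.Println(indexes, err)

	bad := []string{"2", "System", "4"} // trailing index with no name
	_, err = parseCounterPairs(bad)
	fmt.Println("error:", err)
}
```

With a check like this, the caller could log the error and skip the affected check instead of taking down the whole agent.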

nicbarker commented 6 years ago

Hi @derekwbrown, thanks for the prompt reply! The locale on the machine seems to be set to English (Canada).

derekwbrown commented 6 years ago

For the ones that are working, are they en-CA? Or something else?

nicbarker commented 6 years ago

Yep, it appears that they're English (Canada) as well.

derekwbrown commented 6 years ago

On the machine in question, it looks like the performance counter data configuration is awry. This link contains instructions on how to repair it: https://support.microsoft.com/en-us/help/300956/how-to-manually-rebuild-performance-counter-library-values

nicbarker commented 6 years ago

Thanks for the instructions, I'll test this out on Monday and get back to you with results.