DataDog / dd-agent

Getting Spark check errors from time to time #3766

Open Am1rr3zA opened 6 years ago

Am1rr3zA commented 6 years ago

I have installed dd-agent (version 6) on one of our EC2 instances and changed the Spark conf so that I can monitor my EMR cluster:

init_config:
instances:
  - spark_url: http://10.0.0.1:8088
    cluster_name: AnalyticDataDog-Test
    spark_cluster_mode: spark_yarn_mode
    tags:
      - instance:Test
      - cluster:Analytics
      - env:EMR

After I restarted dd-agent, everything worked fine and I started getting Spark metrics, but after a couple of minutes I began getting this error:

ERROR | (runner.go:277 in work) | Error running check spark: 
[
  {
    "message": "Expecting value: line 1 column 1 (char 0)",
    "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/base.py\", line 294, in run\n    self.check(copy.deepcopy(self.instances[0]))\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/spark/spark.py\", line 153, in check\n    spark_apps = self._get_running_apps(instance, requests_config)\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/spark/spark.py\", line 260, in _get_running_apps\n    return self._get_spark_app_ids(running_apps, requests_config, tags)\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/spark/spark.py\", line 428, in _get_spark_app_ids\n    SPARK_SERVICE_CHECK, requests_config, tags)\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/spark/spark.py\", line 635, in _rest_request_to_json\n    response_json = response.json()\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/models.py\", line 896, in json\n    return complexjson.loads(self.text, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/simplejson/__init__.py\", line 505, in loads\n    return _default_decoder.decode(s)\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/simplejson/decoder.py\", line 370, in decode\n    obj, end = self.raw_decode(s)\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/simplejson/decoder.py\", line 400, in raw_decode\n    return self.scan_once(s, idx=_w(s, idx).end())\nJSONDecodeError: Expecting value: line 1 column 1 (char 0)\n"
  }
]

When I check the Datadog service status with "service datadog-agent status", I get:

● datadog-agent.service - "Datadog Agent"
   Loaded: loaded (/lib/systemd/system/datadog-agent.service; enabled)
   Active: active (running) since Wed 2018-07-18 17:53:21 UTC; 1h 37min ago
 Main PID: 18098 (agent)
   CGroup: /system.slice/datadog-agent.service
           └─18098 /opt/datadog-agent/bin/agent/agent start -p /opt/datadog-agent/run/agent.pid

Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:309 in work) | Done running check memory
Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:246 in work) | Running check network
Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:309 in work) | Done running check network
Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:246 in work) | Running check ntp
Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:309 in work) | Done running check ntp
Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:246 in work) | Running check uptime
Jul 18 19:29:08 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:08 UTC | INFO | (runner.go:309 in work) | Done running check uptime
Jul 18 19:29:37 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:37 UTC | INFO | (transaction.go:121 in Process) | Successfully posted payload to "https://6-3-3-app.agent.datadoghq.com/api/v1/series?api_key=*************************5c3b2"
Jul 18 19:29:59 ip-10-0-0-209 agent[18098]: 2018-07-18 19:29:59 UTC | ERROR | (runner.go:277 in work) | Error running check spark: [{"message": "Expecting value: line 1 column 1 (char 0)", "traceback": "Traceback (most recent call ...checks/base.py\",
Jul 18 19:30:10 ip-10-0-0-209 agent[18098]: 2018-07-18 19:30:10 UTC | ERROR | (runner.go:277 in work) | Error running check spark: [{"message": "500 Server Error: Connection refused (Connection refused) for url: http://ip-10-0-0-17...back": "Traceback
Hint: Some lines were ellipsized, use -l to show in full.

SiddChugh commented 1 year ago

Hey, I am getting the same error as well. I was wondering if you got around it?
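
A note on the message itself: "Expecting value: line 1 column 1 (char 0)" is what a JSON decoder raises when the body it receives is empty or does not start with JSON (for example, an HTML error page). The traceback above shows it coming from response.json() inside the check's _rest_request_to_json. The sketch below reproduces the message with Python's standard json module rather than the simplejson build bundled with the agent, and returns a short preview of the body so a log line would show what actually came back. The function name parse_rest_body and the sample bodies are hypothetical, not part of the Datadog check.

```python
import json


def parse_rest_body(text):
    """Return (parsed, error): the parsed JSON on success, else None plus a diagnostic."""
    try:
        return json.loads(text), None
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        # Keep a short preview of the body so the log shows what was received.
        preview = text[:80].replace("\n", " ")
        return None, "{} (body starts with: {!r})".format(exc, preview)


if __name__ == "__main__":
    # Hypothetical bodies: an empty response, an HTML error page, and valid JSON.
    for body in ("", "<html><body>Error 500</body></html>", '[{"id": "app-1"}]'):
        print(parse_rest_body(body))
```

If the preview turns out to be HTML or empty, the endpoint being polled is probably not returning the REST payload at that moment, which would be consistent with the "500 Server Error: Connection refused" line in the status output. That is an inference from the logs, not a confirmed diagnosis.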