DataDog / dd-agent

Datadog Agent Version 5
https://docs.datadoghq.com/

Errors spawned by agent.py on Alpine linux #3665

Open dirkmoors opened 6 years ago

dirkmoors commented 6 years ago
, data, cpu_m, filter_value=110)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 627, in get_value
    value = to_float(data[legend.index(name)])
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 30, in <lambda>
    to_float = lambda s: float(s.replace(",", "."))
ValueError: invalid literal for float(): 1.00
2018-02-08 20:25:28,542 | ERROR | dd.collector | checks.collector(unix.py:220) | Cannot extract IO statistics
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 139, in check
    io.update(self._parse_linux2(stdout))
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 42, in _parse_linux2
    recentStats = output.split('Device:')[2].split('\n')
IndexError: list index out of range
2018-02-08 20:25:28,592 | INFO | dd.collector | checks.collector(collector.py:830) | gohai file not found
2018-02-08 20:25:28,835 | INFO | dd.collector | checks.collector(collector.py:543) | Finished run #4. Collection time: 4.21s. Emit time: 0.13s
2018-02-08 20:25:46,877 | ERROR | dd.collector | checks.collector(unix.py:781) | Cannot compute CPU stats
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 676, in check
    cpu_metrics[cpu_m] = get_value(headers, data, cpu_m, filter_value=110)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 627, in get_value
    value = to_float(data[legend.index(name)])
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 30, in <lambda>
    to_float = lambda s: float(s.replace(",", "."))
ValueError: invalid literal for float(): 1.05
2018-02-08 20:25:47,906 | ERROR | dd.collector | checks.collector(unix.py:220) | Cannot extract IO statistics
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 139, in check
    io.update(self._parse_linux2(stdout))
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 42, in _parse_linux2
    recentStats = output.split('Device:')[2].split('\n')
IndexError: list index out of range
2018-02-08 20:25:48,144 | INFO | dd.collector | checks.collector(collector.py:543) | Finished run #5. Collection time: 4.17s. Emit time: 0.11s
2018-02-08 20:25:48,150 | INFO | dd.collector | checks.collector(collector.py:546) | First flushes done, next flushes will be logged every 10 flushes.
2018-02-08 20:26:06,186 | ERROR | dd.collector | checks.collector(unix.py:781) | Cannot compute CPU stats
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 676, in check
    cpu_metrics[cpu_m] = get_value(headers, data, cpu_m, filter_value=110)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 627, in get_value
    value = to_float(data[legend.index(name)])
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 30, in <lambda>
    to_float = lambda s: float(s.replace(",", "."))
ValueError: invalid literal for float(): 0.51
2018-02-08 20:26:07,218 | ERROR | dd.collector | checks.collector(unix.py:220) | Cannot extract IO statistics
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 139, in check
    io.update(self._parse_linux2(stdout))
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 42, in _parse_linux2
    recentStats = output.split('Device:')[2].split('\n')
IndexError: list index out of range
2018-02-08 20:26:25,486 | ERROR | dd.collector | checks.collector(unix.py:781) | Cannot compute CPU stats
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 676, in check
    cpu_metrics[cpu_m] = get_value(headers, data, cpu_m, filter_value=110)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 627, in get_value
    value = to_float(data[legend.index(name)])
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 30, in <lambda>
    to_float = lambda s: float(s.replace(",", "."))
ValueError: invalid literal for float(): 0.76
2018-02-08 20:26:26,516 | ERROR | dd.collector | checks.collector(unix.py:220) | Cannot extract IO statistics
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 139, in check
    io.update(self._parse_linux2(stdout))
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 42, in _parse_linux2
    recentStats = output.split('Device:')[2].split('\n')
IndexError: list index out of range
2018-02-08 20:26:44,778 | ERROR | dd.collector | checks.collector(unix.py:781) | Cannot compute CPU stats
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 676, in check
    cpu_metrics[cpu_m] = get_value(headers, data, cpu_m, filter_value=110)
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 627, in get_value
    value = to_float(data[legend.index(name)])
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 30, in <lambda>
    to_float = lambda s: float(s.replace(",", "."))
ValueError: invalid literal for float(): 0.25
2018-02-08 20:26:45,808 | ERROR | dd.collector | checks.collector(unix.py:220) | Cannot extract IO statistics
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 139, in check
    io.update(self._parse_linux2(stdout))
  File "/opt/datadog-agent/agent/checks/system/unix.py", line 42, in _parse_linux2
    recentStats = output.split('Device:')[2].split('\n')
IndexError: list index out of range
2018-02-08 20:26:45,859 | INFO | dd.collector | checks.collector(collector.py:830) | gohai file not found

dirkmoors commented 6 years ago

My guess is that these checks have not been tested on Alpine Linux at all.

First of all, I would rewrite

to_float = lambda s: float(s.replace(",", "."))

to something like this:

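import re

# Compiled once at import time; also accepts integers such as "1".
FLOAT = re.compile(r'[0-9]+(?:\.[0-9]+)?')

def to_float(s):
    # Keep the comma-to-dot handling of the current lambda, then take the
    # first number found, ignoring any stray bytes around it.
    matches = FLOAT.findall(s.replace(",", "."))
    if not matches:
        return -1.0
    return float(matches[0])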

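This is needed because, on Alpine, the output returned by "get_subprocess_output" contains stray bytes attached to the values (they show up as '�'), so a plain float() call fails even though the logged value looks like a valid number.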
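
For illustration, here is a minimal sketch of the failure and of the regex-based workaround. It assumes the stray bytes are non-printable characters attached to the number; the log does not show the exact bytes, so the sample strings below are made up. The names NUMBER_RE and to_float_tolerant are hypothetical and not part of the agent.

```python
import re

# Hypothetical pattern: optional sign, digits, and an optional decimal part
# written with either '.' or ','.
NUMBER_RE = re.compile(r'-?\d+(?:[.,]\d+)?')


def to_float_tolerant(s, default=None):
    """Return the first number found in s, or default if there is none."""
    match = NUMBER_RE.search(s)
    if match is None:
        return default
    return float(match.group(0).replace(',', '.'))


# Made-up samples: a clean value, a comma decimal, a value followed by
# stray bytes, and a cell with no number at all.
samples = ['1.00', '0,51', '1.05\x1b[0m', 'n/a']
for raw in samples:
    try:
        strict = float(raw.replace(',', '.'))  # what the current lambda does
    except ValueError:
        strict = 'ValueError'
    print(repr(raw), strict, to_float_tolerant(raw))
```

Returning None (or another caller-supplied default) instead of -1.0 is a design choice in this sketch: it keeps a missing value from being reported as if it were a real metric.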

Secondly, I would update the "IO._parse_linux2" function to something like this:

class IO(Check):
    ...
    def _parse_linux2(self, output):
        # Matches one whitespace-delimited cell of the iostat table.
        TABLE_RE = re.compile(r'[a-zA-Z0-9%/._-]+')

        iostats = {}
        header = []
        for line in output.split('\n'):
            if line.startswith('Device'):
                # Header row: 'Device' (or 'Device:') followed by the column names.
                header = TABLE_RE.findall(line)
                continue
            if line.strip() == '':
                # A blank line ends the current table.
                header = []
                continue

            # Data row: only accepted while a header is active and the cell
            # counts match. A later report overwrites an earlier one.
            values = TABLE_RE.findall(line)
            if header and len(values) == len(header):
                device = values[0]
                iostats[device] = {}
                for key, value in zip(header[1:], values[1:]):
                    iostats[device][key] = float(value)
        return iostats
    ...

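This should parse the table more robustly, imho, because it no longer depends on the literal 'Device:' marker appearing at least twice in the output.

This approach still does not provide all of the required statistics, but it might be a step in the right direction.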
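
As a rough usage sketch, the same line-by-line idea can be tried outside the agent as a standalone function. The function name parse_iostat_tables, the pattern name CELL_RE, and the sample text are all made up for illustration; the sample only imitates the general shape of iostat output and is not taken from an Alpine host.

```python
import re

# One whitespace-delimited table cell (hypothetical pattern name).
CELL_RE = re.compile(r'[A-Za-z0-9%/._-]+')


def parse_iostat_tables(output):
    """Map each device to its column values, keeping the last report seen."""
    stats = {}
    header = []
    for line in output.splitlines():
        if line.startswith('Device'):
            # Works for both 'Device:' and 'Device' header spellings.
            header = CELL_RE.findall(line)
        elif not line.strip():
            header = []  # a blank line ends the current table
        elif header:
            values = CELL_RE.findall(line)
            if len(values) == len(header):
                stats[values[0]] = {
                    key: float(value)
                    for key, value in zip(header[1:], values[1:])
                }
    return stats


# Made-up sample with two reports; the second one should win.
SAMPLE = """\
Linux 4.9.0 (example-host)

Device:            tps    kB_read/s    kB_wrtn/s
sda               1.50        10.00        20.25

Device:            tps    kB_read/s    kB_wrtn/s
sda               2.00         0.00        40.50
"""

print(parse_iostat_tables(SAMPLE))
```

With a single report, or with a header spelled 'Device' without the colon, this sketch returns whatever rows it finds instead of raising IndexError. A non-numeric cell would still raise ValueError in float(), so real output may need the tolerant conversion discussed earlier.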

Bottom line: please add better support for Alpine Linux :)

farzadanooshah commented 6 years ago

Any update on the reported issue?