DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

kubernetes logging, datadog-agent RunContainerError #3370

Open vasiliyb opened 5 years ago

vasiliyb commented 5 years ago

Output of the info page (if this is a bug)

(Paste the output of the info page here)

Unable to provide the info page output; the pods will not start.

Describe what happened: I am having a problem collecting logs from GKE containers. After following the steps at https://app.datadoghq.com/logs/onboarding/container, the datadog-agent pods crash-loop, and I see the following pod status and logs:

NAME                  READY   STATUS              RESTARTS   AGE
datadog-agent-4kgjf   0/1     RunContainerError   0          3s
datadog-agent-d5wbs   0/1     RunContainerError   0          3s
datadog-agent-khh4b   0/1     RunContainerError   1          3s
vasiliy@Vasiliys-Pro:~/Code/js/CM/empire/datadog/deployment% kubectl logs -f datadog-agent-khh4b
failed to open log file "/var/log/pods/default_datadog-agent-khh4b_2643e896-66e4-11e9-a6de-42010a8e0020/datadog-agent/1.log": open /var/log/pods/default_datadog-agent-khh4b_2643e896-66e4-11e9-a6de-42010a8e0020/datadog-agent/1.log: no such file or directory%

Describe what you expected: datadog-agent pods starting fine

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc):

Latest GKE DaemonSet:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: datadog-agent
spec:
  template:
    metadata:
      labels:
        app: datadog-agent
      name: datadog-agent
    spec:
      serviceAccountName: datadog-agent
      containers:
      - image: datadog/agent:latest
        imagePullPolicy: Always
        name: datadog-agent
        ports:
          - containerPort: 8125
            # Custom metrics via DogStatsD - uncomment this section to enable custom metrics collection
            hostPort: 38125
            name: dogstatsdport
            protocol: UDP
          - containerPort: 8126
            # Trace Collection (APM) - uncomment this section to enable APM
            hostPort: 38126
            name: traceport
            protocol: TCP
        env:
          - name: DD_API_KEY
            value: "xxxxxxxxxx"
          - name: DD_COLLECT_KUBERNETES_EVENTS
            value: "true"
          - name: DD_LEADER_ELECTION
            value: "true"
          - name: KUBERNETES
            value: "yes"
          - name: DD_KUBERNETES_KUBELET_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: DD_APM_ENABLED
            value: "true"
          - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
            value: "true"
          - name: DD_LOGS_ENABLED
            value: "true"
          - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
            value: "true"
          - name: DD_AC_EXCLUDE
            value: "name:datadog-agent"
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
        volumeMounts:
          - name: dockersocket
            mountPath: /var/run/docker.sock
          - name: procdir
            mountPath: /host/proc
            readOnly: true
          - name: cgroups
            mountPath: /host/sys/fs/cgroup
            readOnly: true
          - name: pointerdir
            mountPath: /opt/datadog-agent/run
        livenessProbe:
          exec:
            command:
            - ./probe.sh
          initialDelaySeconds: 15
          periodSeconds: 5
      volumes:
        - hostPath:
            path: /var/run/docker.sock
          name: dockersocket
        - hostPath:
            path: /proc
          name: procdir
        - hostPath:
            path: /sys/fs/cgroup
          name: cgroups
        - hostPath:
            path: /opt/datadog-agent/run
          name: pointerdir
vasiliyb commented 5 years ago

Support has been unresponsive for days... 👎

CharlyF commented 5 years ago

@vasiliyb I think this is tied to this issue: https://github.com/rancher/charts/issues/24#issuecomment-415692699. Could you try using a mount directory other than /opt/datadog-agent/run, as described in that issue?

cc @DylanLovesCoffee

DylanLovesCoffee commented 5 years ago

@vasiliyb Could you create a volume and a volumeMount for /var/log/pods in your Datadog DaemonSet and let us know the result in the ticket?

jtrh commented 5 years ago

I fixed the 'failed to open log file "/var/log/pods/ (...)' error with the following configuration:

# In spec.template.spec.containers[0].volumeMounts.
- name: logpath
  mountPath: /var/log/pods
- name: dockercontainers
  mountPath: /var/lib/docker/containers
  readOnly: true
# In spec.template.spec.volumes.
- hostPath:
    path: /var/log/pods
  name: logpath
- hostPath:
    path: /var/lib/docker/containers
  name: dockercontainers

The problem is that the files in /var/log/pods/*/*/*.log are actually symbolic links that point to /var/lib/docker/containers/*/*.log, so mounting only /var/log/pods is not enough.

I fixed the read-only file system problem that occurs when attempting to mount the host path /opt/datadog-agent/run by replacing /opt/datadog-agent/run with /var/lib/datadog-agent/run in spec.template.spec.volumes[].hostPath.path (ref: https://github.com/DataDog/datadog-agent/issues/3370#issuecomment-487438501, https://github.com/rancher/charts/issues/24#issuecomment-415692699).
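
A minimal sketch of that second change, for anyone applying it to the DaemonSet from the original report. It assumes the volume keeps the name pointerdir used there. Leaving the container-side mountPath at /opt/datadog-agent/run is also an assumption, based on the change being described only for the volumes entry:

```yaml
# In spec.template.spec.containers[0].volumeMounts.
# Assumption: the path inside the container stays as it was.
- name: pointerdir
  mountPath: /opt/datadog-agent/run
# In spec.template.spec.volumes.
# Only the host-side path moves, away from the read-only /opt location.
- hostPath:
    path: /var/lib/datadog-agent/run
  name: pointerdir
```

The volume name is what ties the two entries together, so the hostPath can change without touching the path the agent sees inside the container.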

Olofguard commented 5 years ago

@jtrh thanks! worked perfectly.

ajacquemot commented 5 years ago

Hi @vasiliyb

Thanks for submitting your issue and providing a solution.

We recently updated the Datadog DaemonSet to include /var/lib/docker/containers by default and updated the documentation accordingly, see:

Feel free to close the issue if you think your problem is solved.

Cheers

daparthi001 commented 4 years ago

How do I get logs from a different file path, for logs that are written to /opt//log?