HDFGroup / hsds

Cloud-native, service-based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Reduce level of "no app label" message and "/info" request in the logs #117

Closed: bilalshaikh42 closed this issue 2 years ago

bilalshaikh42 commented 2 years ago

When reading through the logs (set to WARN), we are having trouble finding meaningful messages due to the prevalence of

WARN> _k8sGetPodIPs - no app label

on the head node and

REQ> GET: /info [10.20.3.17:6101]
WARN> _k8sGetPodIPs - no app label

on the data nodes.

Would it be possible to reduce the level of these logs to debug so that we can more easily read through the logs?

jreadey commented 2 years ago

@bilalshaikh42 - it seems like you are using the configuration with a head container, service node, and data node containers in each pod, correct?

If that's the case, the getPodIps logic shouldn't be invoked. See: https://github.com/HDFGroup/hsds/blob/master/hsds/basenode.py#L271. The idea here is that each HSDS pod is independent, so there's no need for pod-to-pod connections. If you are seeing the warnings from invoking the k8s API, it seems like there is a misconfiguration somewhere.
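
For illustration, a minimal sketch of the guard being described, not the actual basenode.py code linked above; the HEAD_PORT variable and dn port 6101 are taken from the deployment discussed in this thread. When HEAD_PORT is unset, the dn container is reached over localhost, so no Kubernetes API lookup, and hence no _k8sGetPodIPs warning, should occur.

```python
# Illustrative sketch only, not the actual HSDS basenode.py code.
# When HEAD_PORT is unset, the sn and dn containers share a pod and the
# dn is reachable over localhost, so no Kubernetes API lookup is needed.
import os

DN_PORT = 6101  # data node port used elsewhere in this thread


def get_dn_urls():
    head_port = os.environ.get("HEAD_PORT")
    if not head_port:
        # Self-contained pod: no pod-to-pod discovery, no _k8sGetPodIPs call
        return [f"http://127.0.0.1:{DN_PORT}"]
    # With a head node configured, peer discovery via the k8s API would
    # happen here (the code path that emits the "no app label" warnings).
    raise NotImplementedError("k8s pod discovery is not part of this sketch")


if __name__ == "__main__":
    print(get_dn_urls())
```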

bilalshaikh42 commented 2 years ago

Here is our deployment. I believe this was basically copied and pasted from one of the examples given, but that may have been before the full change to the sn and dn containers not needing to communicate.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hsds
  name: hsds
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hsds
  template:
    metadata:
      labels:
        app: hsds
    spec:
      containers:
        - name: sn
          image: hdfgroup/hsds:v0.7beta8
          imagePullPolicy: Always
          ports:
            - containerPort: 5101
          env:
            - name: NODE_TYPE
              value: sn
            - name: HEAD_PORT
              value: null # no head container
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-auth-keys
                  key: aws_access_key_id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-auth-keys
                  key: aws_secret_access_key
            - name: AWS_REGION
              value: us-east-1
            - name: BUCKET_NAME
              value: biosimdev
            - name: LOG_LEVEL
              value: WARN
          volumeMounts:
            ....
        - name: dn
          image: hdfgroup/hsds:v0.7beta8
          lifecycle:
            preStop:
              exec:
                command:
                  [
                    "/usr/bin/curl",
                    "-X",
                    "DELETE",
                    "http://127.0.0.1:6101/prestop",
                  ]
          imagePullPolicy: Always
          ports:
            - containerPort: 6101
          env:
            - name: NODE_TYPE
              value: dn
            - name: HEAD_PORT
              value: null # no head container
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-auth-keys
                  key: aws_access_key_id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-auth-keys
                  key: aws_secret_access_key

          volumeMounts:
            ...
      volumes:
...
bilalshaikh42 commented 2 years ago

One other change that may actually be relevant is that I have restricted the service account to a particular namespace, since we are running isolated deployments:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pods-list
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pods-list
subjects:
  - kind: ServiceAccount
    name: default

roleRef:
  kind: Role
  name: pods-list
  apiGroup: rbac.authorization.k8s.io
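
For reference, here is a minimal sketch (using the standard kubernetes Python client; not anything HSDS itself ships) of how one could confirm from inside a pod that this Role/RoleBinding grants the one permission HSDS needs, namely listing pods in its own namespace:

```python
# Sketch: verify from inside a pod that the bound service account can
# list pods in its own namespace (the permission granted above).
# Uses the standard kubernetes Python client; not part of HSDS itself.
from kubernetes import client, config

config.load_incluster_config()  # authenticate with the pod's service account

# The namespace is mounted into every pod by default
with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
    namespace = f.read().strip()

review = client.V1SelfSubjectAccessReview(
    spec=client.V1SelfSubjectAccessReviewSpec(
        resource_attributes=client.V1ResourceAttributes(
            namespace=namespace, verb="list", resource="pods"
        )
    )
)
result = client.AuthorizationV1Api().create_self_subject_access_review(review)
print("can list pods:", result.status.allowed)
```
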
jreadey commented 2 years ago

Your deployment YAML looks fine. I tried with the same image on my Kubernetes cluster but didn't see the warnings.

My guess is that you have some other pods running in the same namespace that don't have an "app" label. The _k8sGetPodIPs code is seeing those, hence the warnings in the log. Not having an app label is perfectly legitimate, so I'll change the warning log to a debug log. I'll update this issue when a new image is available on Docker Hub.
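
For illustration, a rough sketch of the kind of change described, not the actual HSDS source (the function name and label default here are hypothetical): the pod-listing loop skips pods without an "app" label and logs them at debug level rather than warn, so WARN-level output stays readable.

```python
# Rough sketch of the described change, not the actual HSDS source:
# pods without an "app" label are logged at debug level instead of warn.
import logging

from kubernetes import client, config

log = logging.getLogger("hsds")


def get_pod_ips(namespace: str, app_label: str = "hsds"):
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    pod_ips = []
    for pod in v1.list_namespaced_pod(namespace).items:
        labels = pod.metadata.labels or {}
        if "app" not in labels:
            # previously a WARN; unlabeled pods are legitimate, so use debug
            log.debug(f"get_pod_ips - no app label for pod {pod.metadata.name}")
            continue
        if labels["app"] == app_label and pod.status.pod_ip:
            pod_ips.append(pod.status.pod_ip)
    return pod_ips
```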

bilalshaikh42 commented 2 years ago

> My guess is that you have some other pods running in the same namespace that don't have an "app" label. The _k8sGetPodIPs code is seeing those, hence the warnings in the log. Not having an app label is perfectly legitimate, so I'll change the warning log to a debug log. I'll update this issue when a new image is available on Docker Hub.

Yup, that is the case. Thanks!

jreadey commented 2 years ago

@bilalshaikh42 - are you still seeing these? If not, I'll close the issue.

bilalshaikh42 commented 2 years ago

Nope, seems to be resolved!