Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

None of bytesRead* metrics goes up when Alluxio FUSE can successfully read data #13865

Open sg-c opened 3 years ago

sg-c commented 3 years ago

Alluxio Version: 2.6.0-RC2

Describe the bug

None of the following metrics go up when Alluxio FUSE successfully reads data:

    "Cluster.BytesReadDirectThroughput": {
      "value": 0
    },
    "Cluster.BytesReadDomainThroughput": {
      "value": 0
    },
    "Cluster.BytesReadLocalThroughput": {
      "value": 0
    },
    "Cluster.BytesReadRemoteThroughput": {
      "value": 0
    },
    "Cluster.BytesReadUfsThroughput": {
      "value": 161781413  # after all data are loaded into Alluxio, this number doesn't go up anymore
    },

To Reproduce

  1. Deploy Alluxio on K8s (the YAML files are shown below).
  2. Run distributedLoad with replication so that each worker holds a replica of the target file.
  3. Read the Alluxio-cached data.
  4. Open the web UI at http://localhost:30009/metrics/json/ to view the metrics.
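
Step 4 can also be done from a script instead of the browser. The following is a minimal sketch, assuming the endpoint returns a JSON document in which each metric name maps to an object with a `value` field (as in the excerpt above). The helper names `bytes_read_metrics` and `fetch_metrics` are made up for illustration and are not part of Alluxio.

```python
import json
import urllib.request

# URL taken from step 4 of the reproduction steps.
METRICS_URL = "http://localhost:30009/metrics/json/"


def bytes_read_metrics(node, prefix="Cluster.BytesRead"):
    """Recursively collect {name: value} for metrics whose name starts with prefix.

    The walk does not assume where in the JSON document the metrics live;
    it only assumes each metric looks like {"value": <number>}.
    """
    found = {}
    if isinstance(node, dict):
        for key, child in node.items():
            if key.startswith(prefix) and isinstance(child, dict) and "value" in child:
                found[key] = child["value"]
            else:
                found.update(bytes_read_metrics(child, prefix))
    elif isinstance(node, list):
        for child in node:
            found.update(bytes_read_metrics(child, prefix))
    return found


def fetch_metrics(url=METRICS_URL):
    """Download and parse the metrics JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `bytes_read_metrics(fetch_metrics())` once before and once after the read in step 3 gives two snapshots that can be compared directly.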

Expected behavior

At least ONE of the bytesRead* metrics goes up after the read is done.
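
The expectation can be written as a check over two metric snapshots, one taken before the read and one after. This is only a sketch: the function names are hypothetical, and the sample values are the ones quoted in this report.

```python
def increased_metrics(before, after):
    """Return the names whose value in `after` is strictly greater than in `before`.

    Metrics missing from `before` are treated as having started at 0.
    """
    return sorted(
        name for name, value in after.items() if value > before.get(name, 0)
    )


def read_was_counted(before, after):
    """True if at least one bytesRead* metric went up between the snapshots."""
    return len(increased_metrics(before, after)) > 0


# Values from the report: nothing changes after the cached data is read again.
before = {
    "Cluster.BytesReadDirectThroughput": 0,
    "Cluster.BytesReadDomainThroughput": 0,
    "Cluster.BytesReadLocalThroughput": 0,
    "Cluster.BytesReadRemoteThroughput": 0,
    "Cluster.BytesReadUfsThroughput": 161781413,
}
after = dict(before)

print(read_was_counted(before, after))  # False, which is the reported bug
```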

Urgency

Urgent because without correct metrics, we are shooting in the dark.

Additional context


Source: alluxio/templates/config/alluxio-conf.yaml

#
# The Alluxio Open Foundation licenses this work under the Apache License, version 2.0
# (the "License"). You may not use this work except in compliance with the License, which is
# available at www.apache.org/licenses/LICENSE-2.0
#
# This software is distributed on an "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied, as more fully set forth in the License.
#
# See the NOTICE file distributed with this work for information regarding copyright ownership.
#

------alluxio-configmap.yaml------

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    "helm.sh/hook": "pre-install"
    "helm.sh/hook-delete-policy": before-hook-creation
  name: alluxio-config
  labels:
    name: alluxio-config
    app: alluxio
    chart: alluxio-0.6.13
    release: alluxio
    heritage: Helm
data:
  ALLUXIO_JAVA_OPTS: |-
    -Dalluxio.license.file=/secrets/alluxio-license/license.json -Dalluxio.hub.agent.rpc.hostname=${ALLUXIO_HUB_AGENT_RPC_HOSTNAME} -Dalluxio.master.hostname=alluxio-master-0 -Dalluxio.master.journal.type=UFS -Dalluxio.master.journal.folder=/journal -Dalluxio.fuse.logging.threshold=1000ms -Dalluxio.hub.manager.rpc.hostname=alluxio-master-0 -Dalluxio.hub.manager.web.login.password=alluxio -Dalluxio.hub.manager.web.login.username=alluxio -Dalluxio.security.stale.channel.purge.interval=365d -Dalluxio.user.block.master.client.pool.gc.threshold=1h -Dalluxio.user.block.master.client.pool.size.max=1024 -Dalluxio.user.block.read.metrics.enabled=false -Dalluxio.user.block.worker.client.pool.max=10240 -Dalluxio.user.file.master.client.pool.size.max=1024 -Dalluxio.user.file.passive.cache.enabled=false -Dalluxio.user.metadata.cache.enabled=true -Dalluxio.user.metadata.cache.expiration.time=2h -Dalluxio.user.metadata.cache.max.size=20000 -Dalluxio.user.metrics.collection.enabled=true -Dalluxio.user.short.circuit.enabled=true -Dalluxio.user.update.file.accesstime.disabled=true -Dalluxio.worker.block.master.client.pool.size=1024 -Dalluxio.worker.data.server.domain.socket.as.uuid=false -Dalluxio.worker.fuse.enabled=true -Dalluxio.worker.fuse.mount.alluxio.path=/ -Dalluxio.worker.fuse.mount.options=allow_other,kernel_cache,max_read=131072,attr_timeout=7200,entry_timeout=7200 -Dalluxio.worker.fuse.mount.point=/tmp/alluxio-fuse -Dalluxio.worker.network.reader.buffer.size=32MB -Dalluxio.worker.tieredstore.block.locks=100000
  ALLUXIO_MASTER_JAVA_OPTS: |-
    -Dalluxio.master.hostname=${ALLUXIO_MASTER_HOSTNAME} -Xms32G -Xmx32G -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
  ALLUXIO_JOB_MASTER_JAVA_OPTS: |-
    -Dalluxio.master.hostname=${ALLUXIO_MASTER_HOSTNAME} -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
  ALLUXIO_WORKER_JAVA_OPTS: |-
    -Dalluxio.worker.hostname=${ALLUXIO_WORKER_HOSTNAME} -Dalluxio.worker.rpc.port=29999 -Dalluxio.worker.web.port=30000 -Dalluxio.worker.secure.rpc.port=29997 -Dalluxio.worker.container.hostname=${ALLUXIO_WORKER_CONTAINER_HOSTNAME} -Dalluxio.worker.ramdisk.size=6G -Dalluxio.worker.tieredstore.levels=1 -Dalluxio.worker.tieredstore.level0.alias=MEM -Dalluxio.worker.tieredstore.level0.dirs.mediumtype=MEM -Dalluxio.worker.tieredstore.level0.dirs.path=/dev/shm -Dalluxio.worker.tieredstore.level0.dirs.quota=8G -Dalluxio.worker.tieredstore.level0.watermark.high.ratio=0.95 -Dalluxio.worker.tieredstore.level0.watermark.low.ratio=0.7 -Xmx20G -Xms20G -XX:MaxDirectMemorySize=18g -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
  ALLUXIO_JOB_WORKER_JAVA_OPTS: |-
    -Dalluxio.worker.hostname=${ALLUXIO_WORKER_HOSTNAME} -Dalluxio.job.worker.rpc.port=30001 -Dalluxio.job.worker.data.port=30002 -Dalluxio.job.worker.web.port=30003 -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
  ALLUXIO_FUSE_JAVA_OPTS: |-
    -Dalluxio.user.hostname=${ALLUXIO_CLIENT_HOSTNAME} -XX:MaxDirectMemorySize=2g
  ALLUXIO_WORKER_TIEREDSTORE_LEVEL0_DIRS_PATH: /dev/shm
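
Because the option string in this ConfigMap is long, a small script can help pick out the metrics-related properties when reviewing it. The sketch below uses a shortened excerpt of `ALLUXIO_JAVA_OPTS` from this report and a hypothetical helper, `java_properties`. It only lists what is configured; whether any of these settings explains the zero counters is not established in this report.

```python
def java_properties(opts):
    """Parse '-Dname=value' tokens from a JVM option string into a dict.

    Tokens that are not -D system properties (for example -Xmx20G) are skipped.
    Only the first '=' separates name and value, so values such as
    'allow_other,max_read=131072' are kept intact.
    """
    props = {}
    for token in opts.split():
        if token.startswith("-D") and "=" in token:
            name, value = token[2:].split("=", 1)
            props[name] = value
    return props


# Excerpt of ALLUXIO_JAVA_OPTS from the ConfigMap in this report.
opts = (
    "-Dalluxio.user.block.read.metrics.enabled=false "
    "-Dalluxio.user.metrics.collection.enabled=true "
    "-Dalluxio.user.short.circuit.enabled=true "
    "-Dalluxio.worker.fuse.enabled=true "
    "-Xmx20G"
)

metrics_props = {
    name: value for name, value in java_properties(opts).items() if "metrics" in name
}
for name, value in sorted(metrics_props.items()):
    print(f"{name}={value}")
```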

------alluxio-master-service.yaml------

Source: alluxio/templates/master/service.yaml

apiVersion: v1
kind: Service
metadata:
  name: alluxio-master-0
  labels:
    app: alluxio
    chart: alluxio-0.6.13
    release: alluxio
    heritage: Helm
    role: alluxio-master
spec:
  ports:

------alluxio-master-statefulset.yaml------

Source: alluxio/templates/master/statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alluxio-master
  labels:
    name: alluxio-master
    app: alluxio
    chart: alluxio-0.6.13
    release: alluxio
    heritage: Helm
    role: alluxio-master
spec:
  selector:
    matchLabels:
      app: alluxio
      role: alluxio-master
      name: alluxio-master
  serviceName: alluxio-master
  replicas: 1
  template:
    metadata:
      labels:
        name: alluxio-master
        app: alluxio
        chart: alluxio-0.6.13
        release: alluxio
        heritage: Helm
        role: alluxio-master
    spec:
      hostNetwork: false
      dnsPolicy: ClusterFirst
      nodeSelector:
      securityContext:
        fsGroup: 0
      initContainers:

------alluxio-worker-daemonset.yaml------

Source: alluxio/templates/worker/daemonset.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alluxio-worker
  labels:
    app: alluxio
    chart: alluxio-0.6.13
    release: alluxio
    heritage: Helm
    role: alluxio-worker
spec:
  selector:
    matchLabels:
      app: alluxio
      release: alluxio
      role: alluxio-worker
  template:
    metadata:
      labels:
        app: alluxio
        chart: alluxio-0.6.13
        release: alluxio
        heritage: Helm
        role: alluxio-worker
    spec:
      hostNetwork: false
      hostPID: false
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 0
      nodeSelector:
      containers:
        - name: alluxio-worker
          image: alluxio/alluxio:2.6.0-RC2
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsUser: 0
            runAsGroup: 0
            privileged: true
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              cpu: 2
              memory: 6G
            requests:
              cpu: 2
              memory: 6G
          command: ["tini", "--", "/entrypoint.sh"]
          args:
            - worker-only
            - --no-format
          env:
          - name: ALLUXIO_WORKER_HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: ALLUXIO_WORKER_CONTAINER_HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: ALLUXIO_CLIENT_HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          envFrom:
          - configMapRef:
              name: alluxio-config
          readinessProbe:
            tcpSocket:
              port: rpc
          livenessProbe:
            tcpSocket:
              port: rpc
            initialDelaySeconds: 15
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 2
          ports:
          - containerPort: 29999
            name: rpc
          - containerPort: 30000
            name: web
          volumeMounts:
            - name: alluxio-fuse-mount
              mountPath: /tmp/alluxio-fuse
              mountPropagation: Bidirectional
            - mountPath: /dev/shm
              name: mem
            - name: "nfs-pvc"
              mountPath: "/mnt/nfs"
        - name: alluxio-job-worker
          image: alluxio/alluxio:2.6.0-RC2
          securityContext:
            runAsUser: 0
            runAsGroup: 0
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 1
              memory: 1G
            requests:
              cpu: 1
              memory: 1G
          command: ["tini", "--", "/entrypoint.sh"]
          args:
            - job-worker
          env:
          - name: ALLUXIO_WORKER_HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: ALLUXIO_WORKER_CONTAINER_HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: ALLUXIO_CLIENT_HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          envFrom:
          - configMapRef:
              name: alluxio-config
          readinessProbe:
            tcpSocket:
              port: job-rpc
          livenessProbe:
            tcpSocket:
              port: job-rpc
            initialDelaySeconds: 15
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 2
          ports:
          - containerPort: 30001
            name: job-rpc
          - containerPort: 30002
            name: job-data
          - containerPort: 30003
            name: job-web
          volumeMounts:
            - mountPath: /dev/shm
              name: mem
            - name: "nfs-pvc"
              mountPath: "/mnt/nfs"
      restartPolicy: Always
      volumes:
        - name: mem
          emptyDir:
            medium: "Memory"
            sizeLimit: 8G
        - name: "nfs-pvc"
          persistentVolumeClaim:
            claimName: "nfs-pvc"
        - name: alluxio-fuse-mount
          hostPath:
            path: /tmp/alluxio-fuse
            type: DirectoryOrCreate

yuzhu commented 3 years ago

@LuQQiu any updates?

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.