Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

alluxio.exception.FileIncompleteException occurred when DataX exports data from Alluxio to ClickHouse #14886

Open DirkJia opened 2 years ago

DirkJia commented 2 years ago

Alluxio Version: 2.3.0

Describe the bug: This issue occurs when DataX exports data from Alluxio to ClickHouse.

-21 10:02:12,391 INFO [main] [.a.s.i.CommonActionServiceImpl] 2022-01-21 10:02:12.391 [job-0] INFO  AbstractFileSystem - Creating Alluxio configuration from Hadoop configuration {}, uri configuration {alluxio.zookeeper.address=null, alluxio.zookeeper.enabled=false, alluxio.master.hostname=alluxio-master-0.default.svc.cluster.local, alluxio.master.rpc.addresses=null, alluxio.master.embedded.journal.addresses=null, alluxio.master.rpc.port=19998}
2022-01-21 10:02:122022-01-21 10:02:12,492 INFO [main] [.a.s.i.CommonActionServiceImpl] 2022-01-21 10:02:12.492 [job-0] INFO  AbstractFileSystem - Initializing filesystem with connect details alluxio-master-0.default.svc.cluster.local:19998
2022-01-21 10:02:122022-01-21 10:02:12,603 INFO [main] [.a.s.i.CommonActionServiceImpl] 2022-01-21 10:02:12.602 [job-0] INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=pipeline-10259809.pipeline-task-2166210-loop-3-9f88s, rack=null)
2022-01-21 10:02:122022-01-21 10:02:12,825 INFO [main] [.a.s.i.CommonActionServiceImpl] 2022-01-21 10:02:12.824 [job-0] INFO  NettyUtils - EPOLL_MODE is available
2022-01-21 10:02:152022-01-21 10:02:15,014 INFO [main] [.a.s.i.CommonActionServiceImpl] 2022-01-21 10:02:15.013 [job-0] ERROR HdfsReader$Job - Failed to check the type of file [alluxio://alluxio-master-0.default.svc.cluster.local:19998/cdp/ads_stor_item_sku_parcel_paid_mix_df_test/pt=2022-01-20/part-00011-711d1977-cd70-46ef-bc2a-852b09121bcc-c000]. Five file formats are currently supported: ORC, SEQUENCE, RCFile, TEXT, CSV. Please check that your file type and the file are correct.
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl] 2022-01-21 10:02:15.017 [job-0] ERROR JobContainer - Exception when job run
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl] com.alibaba.datax.common.exception.DataXException: Code:[HdfsReader-10], Description:[Error reading file].  - Failed to check the type of file [alluxio://alluxio-master-0.default.svc.cluster.local:19998/cdp/ads_stor_item_sku_parcel_paid_mix_df_test/pt=2022-01-20/part-00011-711d1977-cd70-46ef-bc2a-852b09121bcc-c000]. Five file formats are currently supported: ORC, SEQUENCE, RCFile, TEXT, CSV. Please check that your file type and the file are correct. - java.io.IOException: alluxio.exception.FileIncompleteException: Cannot read from /cdp/ads_stor_item_sku_parcel_paid_mix_df_test/pt=2022-01-20/part-00011-711d1977-cd70-46ef-bc2a-852b09121bcc-c000 because it is incomplete
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at alluxio.hadoop.HdfsFileInputStream.<init>(HdfsFileInputStream.java:65)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at alluxio.hadoop.AbstractFileSystem.open(AbstractFileSystem.java:627)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.checkHdfsFileType(DFSUtil.java:712)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.addSourceFileByType(DFSUtil.java:204)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getHDFSAllFilesNORegex(DFSUtil.java:191)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getHDFSAllFiles(DFSUtil.java:160)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getAllFiles(DFSUtil.java:131)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader$Job.prepare(HdfsReader.java:170)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.core.job.JobContainer.prepareJobReader(JobContainer.java:715)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:308)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:115)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.core.Engine.start(Engine.java:92)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.core.Engine.entry(Engine.java:171)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at com.alibaba.datax.core.Engine.main(Engine.java:204)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl] Caused by: alluxio.exception.FileIncompleteException: Cannot read from /cdp/ads_stor_item_sku_parcel_paid_mix_df_test/pt=2022-01-20/part-00011-711d1977-cd70-46ef-bc2a-852b09121bcc-c000 because it is incomplete
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at alluxio.client.file.BaseFileSystem.openFile(BaseFileSystem.java:351)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at alluxio.client.file.BaseFileSystem.openFile(BaseFileSystem.java:339)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at alluxio.client.file.FileSystem.openFile(FileSystem.java:422)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     at alluxio.hadoop.HdfsFileInputStream.<init>(HdfsFileInputStream.java:60)
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl]     ... 14 more
2022-01-21 10:02:152022-01-21 10:02:15,017 INFO [main] [.a.s.i.CommonActionServiceImpl] 

Then I looked at this file in Alluxio; it looks like this: image. Next I looked at the file in the UFS (MinIO): the file is complete, and I can cat it: image.

To Reproduce: The issue appears sporadically.

Expected behavior: Alluxio data can be read normally by DataX.

Urgency: low

Are you planning to fix it: I have no idea how to fix this right now.

Additional context: The file did not change while DataX was exporting the data. Everything runs in k8s. The Spark table config is: image. The Alluxio StatefulSet and ConfigMap are: image

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"app":"alluxio","name":"alluxio-master","role":"alluxio-master"},"name":"alluxio-master","namespace":"default"},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"alluxio","name":"alluxio-master","role":"alluxio-master"}},"serviceName":"alluxio-master","template":{"metadata":{"labels":{"app":"alluxio","name":"alluxio-master","role":"alluxio-master"}},"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"dice/alluxio","operator":"Exists"}]}]}}},"containers":[{"args":["master-only","--no-format"],"command":["/entrypoint.sh"],"env":[{"name":"ALLUXIO_MASTER_HOSTNAME","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}}],"envFrom":[{"configMapRef":{"name":"alluxio-config"}}],"image":"registry.cn-hangzhou.aliyuncs.com/dice-third-party/alluxio:2.3.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"exec":{"command":["alluxio-monitor.sh","master"]},"failureThreshold":2,"initialDelaySeconds":15,"periodSeconds":30,"timeoutSeconds":5},"name":"alluxio-master","ports":[{"containerPort":19998,"name":"rpc"},{"containerPort":19999,"name":"web"}],"readinessProbe":{"exec":{"command":["alluxio-monitor.sh","master"]}},"resources":{"limits":{"cpu":4,"memory":"8G"},"requests":{"cpu":1,"memory":"1G"}},"securityContext":{"runAsGroup":1000,"runAsUser":1000},"volumeMounts":null},{"args":["job-master"],"command":["/entrypoint.sh"],"env":[{"name":"ALLUXIO_MASTER_HOSTNAME","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}}],"envFrom":[{"configMapRef":{"name":"alluxio-config"}}],"image":"registry.cn-hangzhou.aliyuncs.com/dice-third-party/alluxio:2.3.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"exec":{"command":["alluxio-monitor.sh","job_master"]},"failureThreshold":2,"initialDelaySeconds":15,"periodSeconds":30,"timeoutSeconds":5},"name":"alluxio-job-master","ports":[{"containerPort":20001,"name":"job-rpc"}
      ,{"containerPort":20002,"name":"job-web"}],"readinessProbe":{"exec":{"command":["alluxio-monitor.sh","job_master"]}},"resources":{"limits":{"cpu":4,"memory":"8G"},"requests":{"cpu":1,"memory":"1G"}},"securityContext":{"runAsGroup":1000,"runAsUser":1000},"volumeMounts":null}],"dnsPolicy":"ClusterFirst","hostNetwork":false,"imagePullSecrets":[{"name":"aliyun-registry"}],"initContainers":null,"nodeSelector":null,"restartPolicy":"Always","securityContext":{"fsGroup":1000},"volumes":null}},"volumeClaimTemplates":null}}
  creationTimestamp: "2021-11-12T05:53:52Z"
  generation: 1
  labels:
    app: alluxio
    name: alluxio-master
    role: alluxio-master
  name: alluxio-master
  namespace: default
  resourceVersion: "1870170888"
  uid: 1311f4bf-541b-4078-bea9-9047ad00b6af
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: alluxio
      name: alluxio-master
      role: alluxio-master
  serviceName: alluxio-master
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: alluxio
        name: alluxio-master
        role: alluxio-master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dice/alluxio
                operator: Exists
      containers:
      - args:
        - master-only
        - --no-format
        command:
        - /entrypoint.sh
        env:
        - name: ALLUXIO_MASTER_HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        envFrom:
        - configMapRef:
            name: alluxio-config
        image: registry.cn-hangzhou.aliyuncs.com/dice-third-party/alluxio:2.3.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - alluxio-monitor.sh
            - master
          failureThreshold: 2
          initialDelaySeconds: 15
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: alluxio-master
        ports:
        - containerPort: 19998
          name: rpc
          protocol: TCP
        - containerPort: 19999
          name: web
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - alluxio-monitor.sh
            - master
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "4"
            memory: 8G
          requests:
            cpu: "1"
            memory: 1G
        securityContext:
          runAsGroup: 1000
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - job-master
        command:
        - /entrypoint.sh
        env:
        - name: ALLUXIO_MASTER_HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        envFrom:
        - configMapRef:
            name: alluxio-config
        image: registry.cn-hangzhou.aliyuncs.com/dice-third-party/alluxio:2.3.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - alluxio-monitor.sh
            - job_master
          failureThreshold: 2
          initialDelaySeconds: 15
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: alluxio-job-master
        ports:
        - containerPort: 20001
          name: job-rpc
          protocol: TCP
        - containerPort: 20002
          name: job-web
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - alluxio-monitor.sh
            - job_master
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "4"
            memory: 8G
          requests:
            cpu: "1"
            memory: 1G
        securityContext:
          runAsGroup: 1000
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: aliyun-registry
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 3
  currentRevision: alluxio-master-6767fb997b
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updateRevision: alluxio-master-6767fb997b
  updatedReplicas: 3

apc999 commented 2 years ago

Can you try running alluxio fs stat /cdp/ads_stor_item_sku_parcel_paid_mix_df_test/pt=2022-01-20/part-00011-711d1977-cd70-46ef-bc2a-852b09121bcc-c000 and check the status of this file?
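
If more than one file is affected, the paths to check could be collected from the DataX job log instead of being copied by hand. The following is only a rough sketch: it assumes the log contains the message "Cannot read from <path> because it is incomplete" exactly as in the stack trace in this issue, and the helper names incomplete_paths and stat_commands are made up for illustration. It only builds the alluxio fs stat command lines and does not run them.

```python
import re
import shlex

# Message format taken from the stack trace in this issue:
# "FileIncompleteException: Cannot read from <path> because it is incomplete"
# Assumption: paths contain no whitespace.
INCOMPLETE_RE = re.compile(
    r"FileIncompleteException: Cannot read from (\S+) because it is incomplete"
)


def incomplete_paths(log_lines):
    """Return the distinct paths reported as incomplete, in first-seen order."""
    seen = []
    for line in log_lines:
        for path in INCOMPLETE_RE.findall(line):
            if path not in seen:
                seen.append(path)
    return seen


def stat_commands(paths):
    """Build one `alluxio fs stat` command line per path (nothing is executed)."""
    return ["alluxio fs stat " + shlex.quote(p) for p in paths]
```

For example, stat_commands(incomplete_paths(open("job.log"))) would give one command per affected file, where job.log is a placeholder name for a saved copy of the job output. The same path usually appears twice in a trace (once in the wrapping IOException and once in the "Caused by" line), which is why duplicates are dropped.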

DirkJia commented 2 years ago

Sorry for the late reply. This file has already been deleted. Since I mounted an SSD, the problem occurs only with low probability.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.