fluid-cloudnative / fluid

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
https://fluid-cloudnative.github.io/
Apache License 2.0

[FEATURES] Implement Self-Reporting Progress Feature to Enhance Fluid Data Operation Monitoring and Management #4192

Status: Open (opened by cheyang 6 days ago)


Background

In the Fluid project, data operations (DataProcess, DataLoad, DataMigrate) are core functionality, covering data processing, cache preheating, and data migration respectively. To monitor and manage these operations better, Fluid needs a self-reporting progress feature that surfaces the progress of each data operation in its status.

Objectives

  1. Design a general mechanism that presents progress uniformly across Fluid's data operation CRDs. The mechanism should resemble Argo's Self-Reporting Progress: the user's workload writes progress updates to a file inside its container, and Fluid's controller reflects them in the CRD status.
  2. Implement a proof-of-concept solution using DataProcess.
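A uniform mechanism implies a single, shared representation of the N/M value that every data operation CRD understands. The following is a minimal sketch in Go of how such a value could be parsed and validated; the type `Progress` and the functions `ParseProgress` and `Percent` are hypothetical names, not part of Fluid's API, and rejecting values where N > M is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Progress is a hypothetical representation of an "N/M" progress report:
// Done is the completed amount of work, Total is the total amount of work.
type Progress struct {
	Done  int64
	Total int64
}

// ParseProgress parses values such as "30/100". Surrounding whitespace,
// including the trailing newline that `echo` writes, is ignored.
func ParseProgress(s string) (Progress, error) {
	parts := strings.Split(strings.TrimSpace(s), "/")
	if len(parts) != 2 {
		return Progress{}, fmt.Errorf("progress %q is not in N/M format", s)
	}
	done, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return Progress{}, fmt.Errorf("invalid N in %q: %w", s, err)
	}
	total, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return Progress{}, fmt.Errorf("invalid M in %q: %w", s, err)
	}
	// Assumption: M must be positive and 0 <= N <= M.
	if total <= 0 || done < 0 || done > total {
		return Progress{}, fmt.Errorf("progress %q is out of range", s)
	}
	return Progress{Done: done, Total: total}, nil
}

// String renders the progress back into the N/M annotation format.
func (p Progress) String() string {
	return fmt.Sprintf("%d/%d", p.Done, p.Total)
}

// Percent returns the completion ratio as a percentage, e.g. for a UI.
func (p Progress) Percent() float64 {
	return float64(p.Done) * 100 / float64(p.Total)
}

func main() {
	p, err := ParseProgress("30/100\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(p, p.Percent()) // prints "30/100 30"
}
```

Keeping N and M as separate integers (rather than storing only a percentage) preserves the original units, so a DataLoad could report files and a DataMigrate could report bytes under the same format.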

Feature Requirements

  1. Progress Reporting Mechanism:

    • Each data operation task should be capable of generating progress reports during execution.
    • The progress reports should follow an N/M format, where N is the completed amount of work and M is the total amount of work.
  2. Environment Variable Configuration:

    • Define an environment variable FLUID_PROGRESS_FILE, which specifies the location of the progress report file.
  3. Progress Report File:

    • The data operation task must periodically update the FLUID_PROGRESS_FILE during execution to report the current progress.
  4. Executor Reading Mechanism:

    • The executor should periodically (e.g., every 3 seconds) check the FLUID_PROGRESS_FILE to get the latest progress information.
  5. Progress Annotation:

    • When a task starts, its metadata should carry an initial progress annotation, such as fluid.io/data-progress: 0/100.
  6. Progress Update:

    • Whenever FLUID_PROGRESS_FILE changes, the executor should update the task's annotation to reflect the latest progress.
  7. Progress Display:

    • The monitoring system should be able to read the task's annotations and display the real-time progress of each data operation task on the user interface.

Example Code

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataProcess
metadata:
  name: train-flow-step1
spec:
  dataset:
    name: jfsdemo
    namespace: default
  processor:
    # Proposed: initial progress annotation for the processor pod.
    metadata:
      annotations:
        fluid.io/data-progress: 0/100
    script:
      image: nginx
      imageTag: latest
      command: ["bash"]
      source: |
        # Report 10/100, 20/100, ..., 100/100, one step every 10 seconds.
        for i in $(seq 1 10); do
          sleep 10
          echo "$((i*10))/100" > "$FLUID_PROGRESS_FILE"
        done
```

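This example provides a basic framework: the script writes 10/100, 20/100, ..., 100/100 to $FLUID_PROGRESS_FILE every 10 seconds, and the executor is expected to pick up each new value and update the fluid.io/data-progress annotation accordingly.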