[FEATURES] Implement Self-Reporting Progress Feature to Enhance Fluid Data Operation Monitoring and Management

Background

In the Fluid project, data operations (DataProcess, DataLoad, DataMigrate) are core functionality, covering data processing, preheating, and migration. To make these operations easier to monitor and manage, Fluid needs a self-reporting progress feature that surfaces the work progress of each data operation in its status.

Objectives

Design a general mechanism to uniformly present progress in Fluid's data operation CRDs. This mechanism should be similar to self-reporting progress in Argo Workflows: the task writes its progress updates to a file within the container, and Fluid's controller reflects them in the status of the CRD.
Implement a proof-of-concept solution using DataProcess.

Feature Requirements

Progress Reporting Mechanism:
- Each data operation task should be capable of generating progress reports during execution.
- The progress reports should follow an N/M format, where N is the completed amount of work and M is the total amount of work.
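A minimal sketch of how an N/M value could be validated before it is used. The function name `parse_progress` is hypothetical, and the rules that M must be positive and that N must not exceed M are assumptions, since the requirement only fixes the N/M shape.

```shell
#!/bin/sh
# Hypothetical helper: validate a progress string of the form N/M.
# Prints "N M" and returns 0 when the value is valid; returns 1 otherwise.
parse_progress() {
  value=$1
  # The value must contain a slash separating the two parts.
  case "$value" in
    */*) ;;
    *) return 1 ;;
  esac
  done_part=${value%%/*}   # text before the first slash
  total_part=${value#*/}   # text after the first slash
  # Both parts must be non-empty and consist of decimal digits only.
  case "$done_part" in ''|*[!0-9]*) return 1 ;; esac
  case "$total_part" in ''|*[!0-9]*) return 1 ;; esac
  # Assumed rules: M is positive and N does not exceed M.
  [ "$total_part" -gt 0 ] || return 1
  [ "$done_part" -le "$total_part" ] || return 1
  printf '%s %s\n' "$done_part" "$total_part"
}
```

For example, `parse_progress 30/100` prints `30 100`, while `parse_progress 30`, `parse_progress a/b`, and `parse_progress 120/100` all return a non-zero status.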
Environment Variable Configuration:
- Define an environment variable FLUID_PROGRESS_FILE, which specifies the location of the progress report file.
Progress Report File:
- The data operation task must periodically update the file named by FLUID_PROGRESS_FILE during execution to report the current progress.
Executor Reading Mechanism:
- The executor should periodically (e.g., every 3 seconds) read the file named by FLUID_PROGRESS_FILE to get the latest progress information.
Progress Annotation:
- Upon task initiation, the task's metadata should set an initial progress annotation, such as fluid.io/data-progress: 0/100.
Progress Update:
- If the content of the progress file has changed, the executor should update the task's annotation to reflect the latest progress.
Progress Display:
- The monitoring system should be able to read the task's annotations and display the real-time progress of each data operation task on the user interface.
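The executor-side requirements above (periodic read, change detection, annotation update) could be sketched roughly as follows. This is an illustration only: Fluid's controller is written in Go and nothing here reflects an existing Fluid API. The names `poll_progress_once`, `publish_progress`, `watch_progress`, `LAST_PROGRESS`, and `POLL_INTERVAL` are all hypothetical, and the default publisher only prints the annotation it would set.

```shell
#!/bin/sh
# Last progress value that was published (hypothetical state variable).
LAST_PROGRESS=""

# Placeholder publisher: prints the annotation instead of calling the
# Kubernetes API. A real executor would replace this function.
publish_progress() {
  printf 'fluid.io/data-progress: %s\n' "$1"
}

# Read the progress file once and publish its value if it changed.
poll_progress_once() {
  progress_file=$1
  # Nothing to do until the task has created a non-empty file.
  [ -s "$progress_file" ] || return 0
  current=""
  # Use the first line only; tolerate a missing trailing newline.
  read -r current < "$progress_file" || [ -n "$current" ] || return 0
  # Silently ignore anything that is not digits/digits.
  case "$current" in
    */*) ;;
    *) return 0 ;;
  esac
  done_part=${current%%/*}
  total_part=${current#*/}
  case "$done_part" in ''|*[!0-9]*) return 0 ;; esac
  case "$total_part" in ''|*[!0-9]*) return 0 ;; esac
  # Skip the update when the value has not changed since the last poll.
  if [ "$current" = "$LAST_PROGRESS" ]; then
    return 0
  fi
  publish_progress "$current" || return 1
  LAST_PROGRESS=$current
}

# Poll forever, every POLL_INTERVAL seconds (3 by default, per the
# requirement above).
watch_progress() {
  while :; do
    poll_progress_once "$1"
    sleep "${POLL_INTERVAL:-3}"
  done
}
```

Comparing against the last published value keeps the number of annotation writes proportional to the number of progress changes rather than to the number of polls. Malformed content is ignored here instead of being treated as an error, which is a design assumption rather than something the requirements specify.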

Example Code

apiVersion: data.fluid.io/v1alpha1
kind: DataProcess
metadata:
  name: train-flow-step1
spec:
  dataset:
    name: jfsdemo
    namespace: default
  processor:
    metadata:
      annotations:
        fluid.io/data-progress: 0/100
    script:
      image: nginx
      imageTag: latest
      command: ["bash"]
      script: |
        # Report 10/100, 20/100, ..., 100/100, one step every 10 seconds.
        for i in $(seq 1 10); do
          sleep 10
          echo "$((i * 10))/100" > "$FLUID_PROGRESS_FILE"
        done

This example provides a basic framework.
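The loop in the example writes the progress file with a plain redirect, which truncates the file before the new value is written. A task script could instead go through a small helper that writes to a temporary file and renames it, so that a reader polling the file does not observe an empty or partial value. The sketch below is one possible shape for such a helper. The name `report_progress` is hypothetical, and it assumes the directory containing the progress file is writable by the task.

```shell
#!/bin/sh
# Hypothetical helper for task scripts: publish "N/M" to the progress file.
report_progress() {
  done_units=$1
  total_units=$2
  # FLUID_PROGRESS_FILE is the variable proposed in this issue. If it is
  # unset or empty, reporting is skipped so the script still runs elsewhere.
  [ -n "${FLUID_PROGRESS_FILE:-}" ] || return 0
  # Write to a temporary file next to the target, then rename it over the
  # target. A rename within one directory replaces the file in a single step.
  tmp_file="${FLUID_PROGRESS_FILE}.tmp.$$"
  printf '%s/%s\n' "$done_units" "$total_units" > "$tmp_file" &&
    mv -f "$tmp_file" "$FLUID_PROGRESS_FILE"
}
```

With this helper, the body of the example loop would become `report_progress "$((i * 10))" 100`.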

- Design a self-reporting progress feature for Fluid and provide a design document following the Fluid Design Workflow.
- Implement a proof-of-concept solution using DataProcess.
- Add documentation and a demo to support the usage of self-reporting progress for DataProcess in Fluid.

fluid-cloudnative / fluid

[FEATURES] Implement Self-Reporting Progress Feature to Enhance Fluid Data Operation Monitoring and Management #4192
