dfinity / ICRC

Repository to ICRC proposals
Apache License 2.0

Canister Workload Processing Pipeline Standard #48

Open panindustrial-dev opened 5 months ago

panindustrial-dev commented 5 months ago

Note: We are submitting this issue to reserve an ICRC number. The following is incomplete and may change (see Todo).

Abstract

The proposed Internet Computer Request for Comments (ICRC) defines a standard for processing workloads between canisters on the DFINITY Internet Computer. It specifies a scalable workflow in which a user submits a processing request to a canister, uploads the data to be processed, and then retrieves the results. The proposal provides a structured approach to managing and transferring data among canisters, with a focus on efficient data handling, scalability, and interoperability.

By specifying types, methods, and data flow in detail, this standard aims to establish a common framework for interaction between canisters and so extend what the Internet Computer ecosystem can do. A clear guideline for data processing and transfer should streamline development, reduce complexity, and encourage more sophisticated, interconnected dApps on the Internet Computer.

Motivation

The motivation behind this ICRC proposal stems from the growing need for a standardized, efficient mechanism to handle and process data across different canisters on the Internet Computer platform. As the ecosystem expands, the interaction between various canisters becomes increasingly complex and pivotal for the development of advanced decentralized applications (dApps). This complexity highlights the necessity for a unified protocol that addresses several key aspects:

  1. Interoperability: With a multitude of canisters operating on the Internet Computer, ensuring seamless data exchange and processing capabilities is crucial. This standard aims to foster interoperability, allowing canisters to effectively communicate and collaborate regardless of their underlying implementation details.

  2. Scalability: As dApps grow in complexity and size, the ability to process large volumes of data efficiently becomes vital. This proposal introduces a scalable approach to handle extensive data workloads, enabling canisters to manage large datasets effectively and maintain high performance.

  3. Security and Reliability: Ensuring the integrity and security of data during processing and transfer is a key concern. The proposed standard incorporates mechanisms for secure data handling, including provisions for data chunking and validation, thereby enhancing the overall reliability of inter-canister interactions.

  4. Developer Experience: By providing a clear and comprehensive standard, this ICRC aims to simplify the development process for dApp creators. A unified framework reduces the learning curve and development overhead, allowing developers to focus on building innovative solutions rather than dealing with compatibility and data management intricacies.

  5. Good Shape: The standard aims to be "shaped like the Internet Computer." While the IC provides powerful features, it also has technical limitations. The standard specifically addresses the size limit on messages between canisters (about 2 MB) and the cycle limit inherent to each round of execution. This standard, and the supporting components built around it, can significantly reduce how much developers need to consider these limitations, and will hopefully provide components that 'just work'.

Specification

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Intended Use

The standard for processing workloads between canisters on the DFINITY Internet Computer is designed for versatility and scalability, and targets the following primary uses within the Internet Computer ecosystem:

1. Inter-canister Communication

2. Mono-Canister Processing

3. Ingress from Outside the Internet Computer

Logic Flow of Data Processing in ProcessActor

In the proposed standard, the ProcessActor implements a multi-step flow for processing data, intended in particular for workloads larger than 2 MB. Each step handles a specific part of data processing or retrieval. The logic flow is explained below, accompanied by a UML sequence diagram illustrating the interactions between the components.

1. Initiate Processing with process Method

2. Optional: Push Data Using pushChunk

3. Optional: Manually Process with singleStep

4. Optional: Check Processing Status

5. Optional: Retrieve Large Data Sets
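The five steps above amount to a client-side dispatch on the ProcessResponse returned by process. The following TypeScript sketch is not part of the proposal: it mirrors the Motoko variants as a hypothetical tagged union (a `kind` field, simplified payloads) and maps each one to the call a client would plausibly make next. The variant names come from the ProcessResponse type; the mapping, including the treatment of #Local, is an assumption.

```typescript
// Hypothetical TypeScript mirror of the Motoko ProcessResponse variants.
// Motoko `?T` options are modeled as `T | null`; payload fields are simplified.
type PipeInstanceID = number;

type ProcessResponse =
  | { kind: "DataIncluded"; payload: unknown[] | null }
  | { kind: "Local"; id: number } // `id` is a hypothetical name for the Nat payload
  | { kind: "IntakeNeeded"; pipeInstanceID: PipeInstanceID; chunkMap: boolean[] }
  | { kind: "OuttakeNeeded"; pipeInstanceID: PipeInstanceID }
  | { kind: "StepProcess"; pipeInstanceID: PipeInstanceID }
  | { kind: "Assigned"; pipeInstanceID: PipeInstanceID; canister: string };

type NextAction =
  | "done"        // results are already in hand
  | "pushChunk"   // upload the chunks still marked false in chunkMap
  | "singleStep"  // drive processing manually, one step per call
  | "getChunk"    // download a result too large for one response
  | "checkStatus" // poll getProcessingStatus
  | "redirect";   // continue with the canister named in #Assigned

// Assumed mapping from response variant to the client's next call.
function nextAction(response: ProcessResponse | null): NextAction {
  // The standard says a ProcessResponse MUST NOT be null, so null is a protocol error.
  if (response === null) {
    throw new Error("protocol violation: ProcessResponse was null");
  }
  switch (response.kind) {
    case "DataIncluded":
      return "done";
    case "Local":
      // The draft does not yet describe #Local; this sketch falls back to polling status.
      return "checkStatus";
    case "IntakeNeeded":
      return "pushChunk";
    case "StepProcess":
      return "singleStep";
    case "OuttakeNeeded":
      return "getChunk";
    case "Assigned":
      return "redirect";
  }
}
```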

Important Types

ProcessRequest

  public type ProcessRequest = {
    event: ?Text;  //User may provide an event namespace
    caller: ?Principal;
    expiresAt : ?Int;
    dataConfig: ?DataConfig;
    executionConfig: ?ExecutionConfig;
    responseConfig: ?ResponseConfig;
  };

Structure of ProcessRequest

The ProcessRequest type is the input to the ProcessActor's process method and a central part of the data processing workflow. It encapsulates the parameters and configuration needed to initiate and handle a processing task. Its fields are:

1. event
2. caller
3. expiresAt

4. dataConfig

5. executionConfig

  public type ExecutionConfig = {
    #OnLoad;
    #Manual;
  };

6. responseConfig

7. createdAt

Usage

The ProcessRequest is used to encapsulate all necessary information for processing a task. When a client or another canister makes a processing request, it constructs a ProcessRequest instance with the appropriate fields and sends it to the ProcessActor. The ProcessActor then interprets this request and proceeds with the processing according to the specified configurations.

Example

Here's a hypothetical example of a ProcessRequest:

import Principal "mo:base/Principal";

let request : ProcessRequest = {
    event = ?"image-processing";
    caller = ?Principal.fromText("aaaaa-aa");
    expiresAt = ?1625097600000;
    dataConfig = ?#Push(?{ permission = null; totalChunks = 10 }); // data arrives in 10 pushed chunks
    executionConfig = ?#OnLoad;
    responseConfig = ?#Include;
};

This request might represent an image processing task, where the data is sent in chunks, processed upon arrival, and the response is expected to be included directly in the processing result.
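The "processed upon arrival" behavior of #OnLoad, contrasted with #Manual, can be sketched as intake bookkeeping for one pipe instance. The TypeScript below is a hypothetical, simplified model (the IntakeState shape and both functions are illustrative names, not part of the draft): a #Push request with totalChunks = n starts with an all-false chunk map, and processing starts automatically only under #OnLoad once every chunk has arrived.

```typescript
// Hypothetical, simplified model of one pipe instance's intake state.
type ExecutionConfig = "OnLoad" | "Manual";

interface IntakeState {
  totalChunks: number;
  chunkMap: boolean[]; // chunkMap[i] becomes true once chunk i has arrived
  executionConfig: ExecutionConfig;
}

// A #Push request with totalChunks = n starts with an all-false chunk map.
function newIntake(totalChunks: number, executionConfig: ExecutionConfig): IntakeState {
  return { totalChunks, chunkMap: new Array<boolean>(totalChunks).fill(false), executionConfig };
}

// Records the arrival of chunk `index` and reports whether processing should start now.
function recordChunk(state: IntakeState, index: number): boolean {
  if (!Number.isInteger(index) || index < 0 || index >= state.totalChunks) {
    throw new RangeError(`chunk index ${index} is out of range`);
  }
  state.chunkMap[index] = true;
  const complete = state.chunkMap.every((received) => received);
  // #OnLoad: start as soon as the last chunk arrives.
  // #Manual: never start automatically; the client drives work through singleStep.
  return complete && state.executionConfig === "OnLoad";
}
```

Chunks may arrive in any order; only the final missing chunk triggers processing.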

ProcessResponse

  public type ProcessResponse = ?{
    #DataIncluded: ?{
      payload: [CandyTypes.AddressedChunk];
    };
    #Local : Nat;
    #IntakeNeeded: ?{
      pipeInstanceID: PipeInstanceID;
      currentChunks: Nat;
      totalChunks: Nat;
      chunkMap: [Bool];
    };
    #OuttakeNeeded: ?{
      pipeInstanceID: PipeInstanceID;
    };
    #StepProcess: ?{
      pipeInstanceID: PipeInstanceID;
      status: ProcessType;
    };
    #Assigned: {
      pipeInstanceID: PipeInstanceID;
      canister: Principal; // This process has been assigned to another canister for processing.
    };
  };

The ProcessResponse type conveys the status and results of a data processing request. It is a variant type, so it takes a different form depending on the processing stage or outcome. Each variant is described below.

Structure of ProcessResponse

ProcessResponse is an optional type (denoted by ?), so it can technically be null. The option wrapper exists for future upgradeability; a ProcessResponse MUST NOT be null.

1. #DataIncluded
2. #Local
3. #IntakeNeeded
4. #OuttakeNeeded
5. #StepProcess
6. #Assigned

Usage

The ProcessResponse is used by the ProcessActor to communicate the status, results, and further action required for a processing request. Depending on the processing task's nature, size, and complexity, the appropriate variant of ProcessResponse is returned to the caller.

Example

Here's an example of a ProcessResponse indicating that more data chunks are needed:

?#IntakeNeeded(?{
    pipeInstanceID = 12345;
    currentChunks = 3;
    totalChunks = 10;
    chunkMap = [true, true, true, false, false, false, false, false, false, false];
})

This response indicates that the process identified by pipeInstanceID = 12345 has received 3 of the 10 expected data chunks; the chunkMap shows that the first three chunks have arrived and the remaining seven are still missing.
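A client receiving #IntakeNeeded can derive which chunks to push next from chunkMap. The TypeScript helper below is a hypothetical client-side sketch, not part of the proposed interface; it also checks that, in this example, currentChunks matches the number of true entries.

```typescript
// Returns the indices whose chunkMap entry is false, i.e. chunks still to upload.
function missingChunks(chunkMap: boolean[]): number[] {
  const missing: number[] = [];
  chunkMap.forEach((received, index) => {
    if (!received) missing.push(index);
  });
  return missing;
}

// The chunkMap from the #IntakeNeeded example.
const exampleMap = [true, true, true, false, false, false, false, false, false, false];
// Counting true entries reproduces currentChunks = 3 from the example.
const receivedCount = exampleMap.filter((received) => received).length;
```

For the example, missingChunks(exampleMap) yields the indices 3 through 9, which are the chunks a client would send next with pushChunk.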

Methods

Each method in the ProcessActor plays a specific role:

1. process Method

2. pushChunk Method

  public type ChunkPush = {
    pipeInstanceID: PipeInstanceID;
    chunk: ChunkDetail;
  };
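On the client side, pushChunk implies splitting a payload that exceeds the roughly 2 MB message limit into indexed chunks. The TypeScript sketch below is hypothetical: it uses raw bytes (Uint8Array) as a stand-in for [CandyTypes.AddressedChunk], and maxChunkBytes is an illustrative parameter that should leave headroom below the limit for message overhead.

```typescript
type PipeInstanceID = number;

// Simplified stand-in for ChunkDetail: raw bytes replace [CandyTypes.AddressedChunk].
interface ChunkDetail {
  data: Uint8Array;
  index: number;
}

interface ChunkPush {
  pipeInstanceID: PipeInstanceID;
  chunk: ChunkDetail;
}

// Splits `payload` into consecutive slices of at most `maxChunkBytes` bytes,
// numbered from 0, ready to be sent one per pushChunk call.
function toChunkPushes(pipeInstanceID: PipeInstanceID, payload: Uint8Array, maxChunkBytes: number): ChunkPush[] {
  if (!Number.isInteger(maxChunkBytes) || maxChunkBytes <= 0) {
    throw new RangeError("maxChunkBytes must be a positive integer");
  }
  const pushes: ChunkPush[] = [];
  for (let offset = 0, index = 0; offset < payload.length; offset += maxChunkBytes, index++) {
    // subarray clamps the end to payload.length, so the last chunk may be shorter.
    pushes.push({ pipeInstanceID, chunk: { data: payload.subarray(offset, offset + maxChunkBytes), index } });
  }
  return pushes;
}
```

The length of the returned array is presumably the value a client would declare as totalChunks in its #Push dataConfig.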

3. getPushStatus Method

4. getProcessingStatus Method

  public type ProcessingStatusRequest = {
    pipeInstanceID: PipeInstanceID;
  };

5. singleStep Method

For sequential flows, the step will typically be null. For flows with parallel execution, the canister can initiate multiple calls, each operating on a different step of the process. This is useful when the size of an operation is known in advance and can be calculated deterministically.
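When the total work size is known in advance, the step ranges for parallel singleStep calls can be computed deterministically. The draft does not define singleStep's parameters, so the TypeScript sketch below is hypothetical: it only shows how totalItems units of work might be divided into step-indexed ranges of at most itemsPerStep units each.

```typescript
interface StepRange {
  step: number;  // step index a parallel singleStep call would operate on
  start: number; // first item in the step (inclusive)
  end: number;   // last item in the step (exclusive)
}

// Deterministically partitions totalItems into ceil(totalItems / itemsPerStep) steps.
function planSteps(totalItems: number, itemsPerStep: number): StepRange[] {
  if (!Number.isInteger(itemsPerStep) || itemsPerStep <= 0) {
    throw new RangeError("itemsPerStep must be a positive integer");
  }
  const stepCount = Math.ceil(totalItems / itemsPerStep);
  return Array.from({ length: stepCount }, (_, step) => ({
    step,
    start: step * itemsPerStep,
    end: Math.min((step + 1) * itemsPerStep, totalItems),
  }));
}
```

A caller might then dispatch one singleStep call per range concurrently (for example with Promise.all); because the ranges depend only on the inputs, every participant computes the same plan.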

6. getChunk Method

  public type ChunkGet = {
    chunkID: Nat;
    chunkSize: Nat;
    pipeInstanceID: PipeInstanceID;
  };

The chunkSize should be the same in every call so that the retrieved chunks together form the full dataset.

  public type ChunkDetail = {
    data : [CandyTypes.AddressedChunk];
    index : Nat;
  };

  public type ChunkResponse = {
    #Chunk: {
      chunk : ChunkDetail;
      totalChunks : Nat;
    };
    #Error: ProcessError;
  };

Note that the client is responsible for determining whether, and when, it has received all of the response chunks.
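Because the client decides when it has everything, it needs to track which chunk IDs have arrived. The TypeScript sketch below is one hypothetical way to do that: it assumes ChunkDetail.index equals the requested chunkID, replaces [CandyTypes.AddressedChunk] with raw bytes, and models ProcessError as a plain message. A client would call getChunk for each ID in missing(), with a constant chunkSize, until isComplete() returns true.

```typescript
// Simplified mirror of ChunkResponse; raw bytes stand in for [CandyTypes.AddressedChunk].
type ChunkResponse =
  | { kind: "Chunk"; chunk: { data: Uint8Array; index: number }; totalChunks: number }
  | { kind: "Error"; message: string };

// Hypothetical client-side helper that tracks which result chunks have arrived.
class ChunkCollector {
  private readonly received = new Map<number, Uint8Array>();
  private totalChunks: number | null = null;

  // Stores one getChunk response; an #Error response is surfaced to the caller.
  add(response: ChunkResponse): void {
    if (response.kind === "Error") {
      throw new Error(`getChunk failed: ${response.message}`);
    }
    this.totalChunks = response.totalChunks;
    this.received.set(response.chunk.index, response.chunk.data);
  }

  // Chunk IDs still to request; before the first response only chunk 0 is known to exist.
  missing(): number[] {
    if (this.totalChunks === null) return [0];
    const ids: number[] = [];
    for (let id = 0; id < this.totalChunks; id++) {
      if (!this.received.has(id)) ids.push(id);
    }
    return ids;
  }

  isComplete(): boolean {
    return this.totalChunks !== null && this.missing().length === 0;
  }

  // Concatenates the chunks in index order once all of them have arrived.
  assemble(): Uint8Array {
    if (!this.isComplete()) throw new Error("not all chunks have been received");
    const parts: Uint8Array[] = [];
    let length = 0;
    for (let id = 0; id < (this.totalChunks as number); id++) {
      const part = this.received.get(id) as Uint8Array;
      parts.push(part);
      length += part.length;
    }
    const out = new Uint8Array(length);
    let offset = 0;
    for (const part of parts) {
      out.set(part, offset);
      offset += part.length;
    }
    return out;
  }
}
```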


Security Considerations

The proposed standard pays particular attention to how processes are accessed and managed. While it provides a framework for data processing and transfer, it deliberately takes an agnostic stance toward security at the process level, which leaves implementations free to adapt to different contexts. The key security considerations are:

1. Agnosticism at the Process Level

2. Configurable Access Control
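As one example of configurable access control, an implementation might record the principal that created each pipe instance and reject pushChunk, getChunk, or status calls from anyone else. The TypeScript sketch below is hypothetical: it represents principals as text, and the owner-only policy is an illustration rather than a mechanism the draft mandates.

```typescript
type PipeInstanceID = number;
type PrincipalText = string; // textual principal, e.g. "aaaaa-aa"

// Hypothetical owner-only policy: only the principal that created a pipe instance may use it.
class PipeAccessControl {
  private readonly owners = new Map<PipeInstanceID, PrincipalText>();

  // Records the creator of a pipe instance; IDs cannot be re-registered.
  register(pipeInstanceID: PipeInstanceID, owner: PrincipalText): void {
    if (this.owners.has(pipeInstanceID)) {
      throw new Error(`pipe instance ${pipeInstanceID} is already registered`);
    }
    this.owners.set(pipeInstanceID, owner);
  }

  // Returns true only when the caller owns an existing pipe instance.
  canAccess(pipeInstanceID: PipeInstanceID, caller: PrincipalText): boolean {
    return this.owners.get(pipeInstanceID) === caller;
  }
}
```

Other policies (allowlists, role checks, or open access) could replace canAccess without changing the rest of the processing flow, which is the flexibility the agnostic stance is meant to preserve.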

Todo:

The following features should be considered during the ICRC review process: