SJTU-IPADS / reef

REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
Apache License 2.0

non-idempotent kernels discussion #9

Closed utkusaglm closed 10 months ago

utkusaglm commented 10 months ago

Hello,

Regarding the non-idempotent kernel discussion in the paper you wrote:

We leave the incorporation of this technique to future work until we actually encounter non-idempotent DNN kernels.

Have you encountered any so far, or is the situation still the same?

Thanks.

francis0407 commented 10 months ago

Hi @utkusaglm, Thank you for your interest in our work.

Similar to many other idempotence-based systems, REEF requires programmers to annotate the idempotency of GPU kernels. For instance, we have verified that all GPU kernels in this repository (generated by TVM) are idempotent; thus, we have not implemented a mechanism to handle non-idempotent kernels.

However, when we attempted to extend REEF to support other DL frameworks, such as PyTorch and TensorRT, we encountered non-idempotent cases in these frameworks.

We have identified two major types of non-idempotent cases:

  1. Type 1 (non-idempotent kernel): In this scenario, the GPU kernel itself can only perform an in-place update, as demonstrated by the following code:

    def vecInc(A):
       A[i] = A[i] + 1

    It is relatively straightforward for a programmer to statically identify this kind of GPU kernel by checking whether a pointer-typed parameter is both read and written. To the best of my knowledge, TVM never generates this type of non-idempotent kernel (see the discussion). However, other frameworks, such as PyTorch and TensorRT, do generate such kernels.

  2. Type 2 (non-idempotent instances): In this case, the GPU kernel can have both idempotent and non-idempotent instances depending on its launch arguments. As exemplified by the following example, an instance can be non-idempotent if its input and output buffers are the same.

    def vecInc(A, B):
       A[i] = B[i] + 1
    
    vecInc(a, b) # idempotent instance, a = b + 1
    vecInc(a, a) # non-idempotent instance, a = a + 1

    This type of GPU kernel is prevalent (more than 80%) in DL frameworks, and we refer to them as conditionally-idempotent kernels. Furthermore, we have encountered real-world applications that launch both idempotent and non-idempotent instances of the same GPU kernel.
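To make the distinction concrete, here is a minimal host-side sketch (hypothetical, not REEF's actual analysis tool) of classifying a kernel instance by its launch arguments: an instance of a conditionally-idempotent kernel is non-idempotent when an output buffer aliases one of its input buffers. Real GPU buffers would be compared by device pointer ranges; Python object identity stands in for that here.

```python
def is_idempotent_instance(inputs, outputs):
    """Return True if no output buffer aliases an input buffer.

    Hypothetical check: aliasing is approximated with Python
    object identity instead of device pointer-range overlap.
    """
    return not any(out is inp for out in outputs for inp in inputs)

def vec_inc(a, b):
    """Conditionally-idempotent kernel: a[i] = b[i] + 1."""
    for i in range(len(a)):
        a[i] = b[i] + 1

a = [0.0] * 4
b = [1.0] * 4

# vec_inc(a, b): reads b, writes a -> idempotent instance
print(is_idempotent_instance(inputs=[b], outputs=[a]))  # True

# vec_inc(a, a): reads and writes a -> non-idempotent instance
print(is_idempotent_instance(inputs=[a], outputs=[a]))  # False
```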

Identifying this kind of non-idempotency poses a significant challenge, as it requires checking the arguments of every instantiation of a GPU kernel. However, kernel instances are typically launched dynamically by the DL frameworks, which makes it extremely difficult for programmers to check their idempotency manually.

Therefore, we have recently developed an analysis tool to dynamically validate the idempotency of the instances before they are launched. REEF can utilize this tool to identify the idempotency of GPU kernel instances from other DL frameworks like PyTorch and create memory snapshots for non-idempotent instances to ensure the safety of our reset-based preemption mechanism.
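As a rough illustration of the snapshot idea (a hypothetical host-side sketch under my own assumptions, not REEF's GPU implementation): before launching an instance flagged as non-idempotent, the scheduler copies the buffers the kernel will overwrite, so that after a reset-based preemption the buffers can be restored and the kernel safely re-executed from its original inputs.

```python
import copy

def launch_with_snapshot(kernel, buffers, idempotent):
    """Launch a kernel instance; for a non-idempotent instance,
    snapshot its buffers first so a reset-based preemption can
    restore them before re-execution (hypothetical sketch)."""
    snapshot = None if idempotent else copy.deepcopy(buffers)
    try:
        kernel(*buffers)
    except InterruptedError:  # stands in for a kernel preemption
        if snapshot is not None:
            for dst, src in zip(buffers, snapshot):
                dst[:] = src  # restore pre-launch contents
        kernel(*buffers)  # re-execute from a clean state

calls = {"n": 0}

def flaky_inplace_inc(a):
    """Non-idempotent in-place kernel; the first launch is
    'preempted' halfway through to simulate a reset."""
    calls["n"] += 1
    a[0] += 1
    if calls["n"] == 1:
        raise InterruptedError
    a[1] += 1

a = [1, 2]
launch_with_snapshot(flaky_inplace_inc, [a], idempotent=False)
print(a)  # [2, 3]: the partial first launch was rolled back
```

Without the snapshot, the restarted kernel would see the partially updated buffer (`a[0]` already incremented) and produce `[3, 3]`, which is exactly the hazard that makes reset-based preemption unsafe for non-idempotent instances.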

We plan to open source the idempotency analysis tool and integrate it with REEF this year.

utkusaglm commented 10 months ago

Thank you very much for the detailed answer. It was incredibly helpful.