G-Research / yunikorn-history-server

A service to store and provide historical data for K8S clusters using the Yunikorn scheduler
Apache License 2.0

Investigate: Map existing data in YHS to implement the `allocations` details page #195

Closed sudiptob2 closed 4 days ago

sudiptob2 commented 1 week ago

Context

In the YHS web UI, there is a page that displays detailed information about applications and their allocations. The design includes an Applications page with a table listing all the applications. When you click on an application, it reveals a list of allocations associated with that application. By selecting a specific allocation, you can view its detailed information. See the following two images for reference:

image image

Problem Statement

Not all the fields required in the design are directly available in the YHS database. Some information exists in the YHS DB but may be stored under different names or structures. To implement the allocation details page, we need to map the existing data in the YHS database to the corresponding design fields. Additionally, we must identify any missing fields and determine how to handle them.

sudiptob2 commented 1 week ago

Investigation Result

I did an initial investigation and mapped the data available in YHS to the UI design. Below are some suggestions for how we can use existing fields that don't directly match the design labels. Please note that these are not final solutions; they need to be evaluated before implementation.

Allocation List Page

Currently, we don’t have a state field for allocations. Two options:

  1. Derive state from the requests and allocations fields:

    • If allocations is null → state is pending.
    • If allocations is not null → state is allocated.
      Note: This approach won’t support states like running, failed, success, etc., which are shown in the design.

      Why: In the YuniKorn core API, the requested allocation details arrive in the requests field. Once the requested resource is allocated, the API sends the information in the allocations field. So until the resource is allocated, the allocations field contains no data.

  2. Use application states:

    • We could align allocation states with the existing application states, which are similar to those shown in the design.
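Option 1 can be sketched roughly as below. This is only an illustration, not YHS code: `derive_allocation_state` is a hypothetical helper, and it assumes the application record is available as a dict with an `allocations` key that stays null until resources are allocated. Treating an empty list the same as null is an additional assumption.

```python
from typing import Any, Mapping

PENDING = "pending"
ALLOCATED = "allocated"


def derive_allocation_state(app: Mapping[str, Any]) -> str:
    """Return a coarse state derived from the `allocations` field.

    Assumption: `app` mirrors the application record described above.
    Only two states can be derived this way; running, failed, success,
    etc. are not recoverable from these fields.
    """
    allocations = app.get("allocations")
    # A missing key, null, or an empty list all mean nothing is allocated yet.
    if not allocations:
        return PENDING
    return ALLOCATED
```

For example, a record with `"allocations": None` maps to `pending`, and a record with at least one entry in `allocations` maps to `allocated`.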

Allocations Detail Page

Ambiguous Fields

Needs Requirements Clarification

Missing Fields

dave-gantenbein commented 5 days ago

• User: Use the user field from the application (same for allocations). (This is the user that submitted the job. It is available as the annotation yunikorn.apache.org.user_info on the Pod, injected by the admission controller. We need a fallback if the admission controller is not enabled, i.e. the k8s service account.)
• Name: No name field for allocations. We can use allocationID (UUID) instead. (The name is attached to the Spark job on submission and can be found as the label "spark-name" on the pod; perhaps fall back to ???)
• Application Priority: Kill it.
• Final Status Reported by Application Master: Status of the job (end result of the driver pod, "spark-role=driver").
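The User and Name lookups with their fallbacks might look something like the sketch below. Several things here are assumptions: the `pod` dict is a simplified, hypothetical shape (`annotations`, `labels`, `serviceAccountName` at the top level), the annotation key is copied as written in this thread and should be verified, and the annotation value is assumed to be JSON with a `user` key. Falling back to allocationID for the name is the earlier suggestion from this issue; the final fallback is still open.

```python
import json
from typing import Any, Mapping, Optional

# Key as written in this thread; verify against the admission controller.
USER_INFO_ANNOTATION = "yunikorn.apache.org.user_info"
SPARK_NAME_LABEL = "spark-name"


def resolve_user(pod: Mapping[str, Any]) -> Optional[str]:
    """Prefer the user from the annotation, else the service account."""
    annotations = pod.get("annotations") or {}
    raw = annotations.get(USER_INFO_ANNOTATION)
    if raw:
        try:
            info = json.loads(raw)  # assumed to be JSON like {"user": "..."}
        except (ValueError, TypeError):
            info = None
        if isinstance(info, dict) and info.get("user"):
            return info["user"]
    # Fallback when the admission controller did not inject the annotation.
    return pod.get("serviceAccountName") or None


def resolve_name(pod: Mapping[str, Any], allocation_id: str) -> str:
    """Prefer the "spark-name" label, else the allocation ID (open question)."""
    labels = pod.get("labels") or {}
    return labels.get(SPARK_NAME_LABEL) or allocation_id
```

Returning `None` from `resolve_user` when neither source is present leaves it to the caller to decide what the UI should show.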

Needs Requirements Clarification

• YarnApplicationState: Combine with Final Status into a single field: "status".
• Queue: Kill it; redundant with the parent object.

• Started, Launched, Finished: Use requestTime, allocationTime, and allocationDelay to calculate these fields.
• Log Aggregation Status: Kill it.
• Application Node Label Expression: Kill it.
• AM Container Node Label Expression: Kill it.
• History: Kill it.
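As a starting point for the time fields, the sketch below converts the raw values into datetimes and a delay. It assumes requestTime and allocationTime are Unix timestamps in nanoseconds, which should be checked against the stored data. `allocation_timeline` and `from_unix_nanos` are hypothetical helpers, and how the results map onto Started, Launched, and Finished is not decided here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Union

NANOS_PER_SECOND = 1_000_000_000


def from_unix_nanos(ns: int) -> datetime:
    """Convert a Unix timestamp in nanoseconds to a UTC datetime.

    Precision below one microsecond is dropped.
    """
    seconds, nanos = divmod(ns, NANOS_PER_SECOND)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)


def allocation_timeline(
    request_time_ns: int, allocation_time_ns: int
) -> Dict[str, Union[datetime, timedelta]]:
    """Return the request time, allocation time, and the delay between them."""
    requested = from_unix_nanos(request_time_ns)
    allocated = from_unix_nanos(allocation_time_ns)
    return {
        "requestTime": requested,
        "allocationTime": allocated,
        # Recomputed here; a stored allocationDelay could be used instead.
        "allocationDelay": allocated - requested,
    }
```

None of the three input fields describes when an allocation ended, so a Finished value would need another source.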

Missing Fields

• Application Timeout: Kill it.
• Unmanaged Application: Kill it.
• Application Type: "spark" for now; we can extend it later.