Investigate: Map existing data in YHS to implement the `allocations` details page

sudiptob2 commented 1 week ago

Context

In the YHS web UI, there is a page that displays detailed information about applications and their allocations. The design includes an Applications page with a table listing all the applications. When you click on an application, it reveals a list of allocations associated with that application. By selecting a specific allocation, you can view its detailed information. See the following two images for reference:

Problem Statement

Not all the fields required in the design are directly available in the YHS database. Some information exists in the YHS DB but may be stored under different names or structures. To implement the allocation details page, we need to map the existing data in the YHS database to the corresponding design fields. Additionally, we must identify any missing fields and determine how to handle them.

sudiptob2 commented 1 week ago

Investigation Result

I did an initial investigation and tried to map the data available in YHS to the UI design. Below are some suggestions for using existing fields that don't directly match the design labels. Please note that these are not final solutions; they need to be evaluated before implementation.

Allocation List Page

Currently, we don’t have a state field for allocations. Two options:

Derive state from the requests and allocations fields:
- If allocations is null → status is pending.
- If allocations is not null → status is allocated.
  Note: This approach won’t support states like running, failed, success, etc. which are shown in the design.
  
  Why: In the YuniKorn core API, the requested allocation details arrive in the requests field. Once the requested resource is allocated, the API reports it in the allocations field. So, until the resource is allocated, we won't see any data in the allocations field.
Use application states:
- We could align allocation states with the existing application states, which are similar to those shown in the design.
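
A minimal sketch of the first option, assuming the application object from the YuniKorn core API has been decoded into a dict with an `allocations` key. The function name, the state strings, and the sample payload keys are illustrative, not existing YHS code.

```python
from typing import Any, Mapping


def derive_allocation_state(app: Mapping[str, Any]) -> str:
    """Return a coarse state based only on whether allocations exist.

    Assumption: `app` is the decoded application object, and `allocations`
    is null (None) or absent until the requested resource is allocated.
    """
    if app.get("allocations") is None:
        return "pending"
    return "allocated"


# Hypothetical payloads; "allocationKey" is only a placeholder field.
waiting = {"requests": [{"allocationKey": "a-1"}], "allocations": None}
placed = {"requests": [], "allocations": [{"allocationKey": "a-1"}]}

print(derive_allocation_state(waiting))
print(derive_allocation_state(placed))
```

As noted above, this only distinguishes two states; running, failed, success, etc. would still have to come from somewhere else, such as the application state.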

Allocations Detail Page

Ambiguous Fields

User: Use the user field from the application (same for allocations).
Name: No name field for allocations. We can use allocationID (UUID) instead.
Application Priority:
- Two fields exist: MaxRequestPriority in the application and priority in the allocation. We need to decide which one to use.
Final Status Reported by AM:
- “AM” likely stands for Allocation Manager. No direct field for this. We could derive it by comparing the requests field with the allocations field.
- If allocations is null → not allocated yet, check request details.

Needs Requirements Clarification

YarnApplicationState: The state field from the application might work here.
Queue: Use the queue field from the application (same for allocations).
Started, Launched, Finished:
- Use requestTime, allocationTime, and allocationDelay to calculate these fields.
Log Aggregation Status: Need more details on what this field means.
Application Node Label Expression: allocationTags might provide this info.
AM Container Node Label Expression: Same as above, allocationTags could be useful.
History: Need more details.
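
For Started, Launched, Finished, one possible reading is sketched below. It assumes that requestTime and allocationTime are Unix timestamps in nanoseconds, that allocationDelay is a duration in nanoseconds, and that Started maps to requestTime and Launched to allocationTime. None of this is confirmed by the design, and Finished is left empty because no source field has been identified for it.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Mapping, Optional

NANOS_PER_SECOND = 1_000_000_000


def ns_to_datetime(ns: Optional[int]) -> Optional[datetime]:
    """Convert nanoseconds since the Unix epoch to an aware UTC datetime."""
    if ns is None:
        return None
    seconds, nanos = divmod(ns, NANOS_PER_SECOND)
    # Integer arithmetic avoids float rounding; sub-microsecond digits are dropped.
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)


def allocation_timeline(alloc: Mapping[str, Any]) -> Dict[str, Any]:
    """Map allocation timing fields to the design labels (assumed mapping)."""
    request_ns = alloc.get("requestTime")
    allocation_ns = alloc.get("allocationTime")
    delay_ns = alloc.get("allocationDelay")
    # Assumption: if the delay is absent, it equals allocationTime - requestTime.
    if delay_ns is None and request_ns is not None and allocation_ns is not None:
        delay_ns = allocation_ns - request_ns
    return {
        "started": ns_to_datetime(request_ns),       # assumed: Started = requestTime
        "launched": ns_to_datetime(allocation_ns),   # assumed: Launched = allocationTime
        "finished": None,                            # no known source field yet
        "allocation_delay_seconds": (
            None if delay_ns is None else delay_ns / NANOS_PER_SECOND
        ),
    }


# Hypothetical allocation: requested at t, allocated 2.5 seconds later.
sample = {
    "requestTime": 1_700_000_000 * NANOS_PER_SECOND,
    "allocationTime": 1_700_000_000 * NANOS_PER_SECOND + 2_500_000_000,
}
timeline = allocation_timeline(sample)
print(timeline["started"], timeline["launched"], timeline["allocation_delay_seconds"])
```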

Missing Fields

Application Timeout: No existing field for this.
Unmanaged Application: No existing field for this.
Application Type: No field or doc reference found for Application Type.

dave-gantenbein commented 5 days ago

User: Use the user field from the application (same for allocations). (User that submitted the job, available as an annotation on the Pod, yunikorn.apache.org.user_info, injected by the admission controller; needs a fallback if the admission controller is not enabled, i.e. the k8s service account)
Name: No name field for allocations. We can use allocationID (UUID) instead. (Name is attached to the Spark job on submission and can be found as a label on the pod, "spark-name"; perhaps fall back to ???)
Application Priority: Kill it.
Final Status Reported by Application Master: status of the job (end result of the driver pod "spark-role=driver")

Needs Requirements Clarification

YarnApplicationState: combine with Final Status into a single field: "status"
Queue: kill it, redundant with the parent object

Started, Launched, Finished: Use requestTime, allocationTime, and allocationDelay to calculate these fields.
Log Aggregation Status: kill it
Application Node Label Expression: kill it
AM Container Node Label Expression: kill it
History: kill it

Missing Fields

Application Timeout: kill
Unmanaged Application: kill
Application Type: "spark" for now, we can extend it later.
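
The field decisions in this comment could be sketched roughly as follows. The pod is assumed to be a decoded Kubernetes Pod manifest (a dict with `metadata.annotations`, `metadata.labels`, `spec.serviceAccountName` and `status.phase`). The annotation and label keys are copied from this thread as written and should be checked against a real pod. Treating the annotation value as JSON with a `user` key, falling back to allocationID for the name, and reading the driver pod phase as the status are all assumptions, not a settled design.

```python
import json
from typing import Any, Dict, Mapping, Optional

USER_INFO_ANNOTATION = "yunikorn.apache.org.user_info"  # as written in the thread; verify
SPARK_NAME_LABEL = "spark-name"
SPARK_ROLE_LABEL = "spark-role"
DEFAULT_APPLICATION_TYPE = "spark"


def resolve_user(pod: Mapping[str, Any]) -> Optional[str]:
    """Prefer the user-info annotation; fall back to the service account."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    raw = annotations.get(USER_INFO_ANNOTATION)
    if raw:
        try:
            parsed = json.loads(raw)  # assumption: value is JSON like {"user": "..."}
        except ValueError:
            return raw  # not JSON: use the raw value as the user name
        if isinstance(parsed, dict) and parsed.get("user"):
            return parsed["user"]
        return raw
    # Admission controller not enabled: fall back to the k8s service account.
    return (pod.get("spec") or {}).get("serviceAccountName")


def resolve_name(pod: Mapping[str, Any], allocation_id: str) -> str:
    """Use the spark-name label, else the allocationID (assumed fallback)."""
    labels = (pod.get("metadata") or {}).get("labels") or {}
    return labels.get(SPARK_NAME_LABEL) or allocation_id


def resolve_status(pod: Mapping[str, Any]) -> Optional[str]:
    """Report the pod phase, but only for the Spark driver pod."""
    labels = (pod.get("metadata") or {}).get("labels") or {}
    if labels.get(SPARK_ROLE_LABEL) != "driver":
        return None
    return (pod.get("status") or {}).get("phase")


def allocation_details(pod: Mapping[str, Any], allocation_id: str) -> Dict[str, Any]:
    return {
        "user": resolve_user(pod),
        "name": resolve_name(pod, allocation_id),
        "status": resolve_status(pod),
        "applicationType": DEFAULT_APPLICATION_TYPE,
    }


# Hypothetical driver pod without the admission-controller annotation.
driver_pod = {
    "metadata": {"labels": {"spark-name": "daily-etl", "spark-role": "driver"}},
    "spec": {"serviceAccountName": "spark-sa"},
    "status": {"phase": "Succeeded"},
}
print(allocation_details(driver_pod, "hypothetical-allocation-id"))
```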

G-Research / yunikorn-history-server