eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0

Create concept for SDKs #376

Closed GabyUnalaq closed 1 month ago

GabyUnalaq commented 1 month ago

Description

The primary objective of this issue is to design a unified and consistent interface for the SDKs before their actual implementation. The interface should be intuitive, easy to use, and provide similar functionalities, regardless of the language used.

Keeping this as a separate issue lets us create the concept without focusing on a specific language; language-specific discussions can then take place in the relevant issues.

This concept will help:

Goals

A concept is ready and the implementation of the SDKs can be started.

Final result

TBD

Summary

To be filled when the final solution is sketched.

GabyUnalaq commented 1 month ago

Initial raw concept:

# Needed for WorkloadSpec
enum RestartPolicy
    Never
    OnFailure
    Always

# Object for configuring the Workload (similar to the Rust implementation)
object WorkloadSpec:
    # Vars
    instance_name: String
    tags: Array<String>
    dependencies: TODO
    restart_policy: RestartPolicy
    runtime: String
    runtime_config: String
    control_interface_access: TODO

    # Functions
    set_####(value: value_type) -> none
    to_str() -> String

# Main object that the user interacts with
object Ankaios:
    # Vars
    logger: Logger

    # Functions:
    set_logger_level(level: TODO) -> none
    get_state(filter: String) -> JsonObject
    run_workload(workload_name: String, workload_spec: WorkloadSpec or HashMap) -> bool
    delete_workload(workload_name: String) -> bool
    get_agents() -> JsonObject

    # Private functions
    write_to_control_interface()
    read_from_control_interface()

GabyUnalaq commented 1 month ago

Updated concept:

# Object for configuring the Workload (similar to the Rust implementation)
object AnkWorkload:
    # Vars
    workload: ank_base.Workload

    # Functions
    set_agent_name(agent_name: str) -> none
    set_restart_policy(policy: str) -> none
    add_dependency(workload_name: str, condition: str) -> none
    add_tag(key: str, value: str) -> none
    set_runtime(runtime: str) -> none
    set_runtime_config(config: str) -> none
    print_workload() -> none

    # Private functions
    get() -> ank_base.Workload

# Complete state object, for handling complete state configuration
object AnkCompleteState:
    # Vars
    complete_state: ank_base.CompleteState

    # Functions
    set_api_version(version: str) -> none
    add_workload(name: str, workload: AnkWorkload) -> none
    print_complete_state() -> none

    # Private functions
    get() -> ank_base.CompleteState

# Request object (it will create an ID at creation, for the request-response schema)
object AnkRequest:
    # Vars
    request: ank_base.Request
    request_type: str

    # Functions
    get_id() -> str
    set_complete_state(complete_state: AnkCompleteState) -> none
    add_mask(mask: str) -> none # Will add the mask to either updateMask or fieldMask
    print_request() -> none

    # Private functions
    get() -> ank_base.Request

# Response object, that the user will interact with
object AnkResponse:
    # Vars
    response: ank_base.Response
    content: any
    content_type: str

    # Functions
    get_request_id() -> str
    check_request_id(request_id) -> bool
    get_content() -> (str, any)  # get content_type and content as well
    print_response() -> none

enum AnkLogLevel:
    FATAL, ERROR, WARN, INFO, DEBUG

# Main object that the user interacts with
object Ankaios:
    # Vars
    logger: Logger
    path: str  # Control interface path
    read: bool  # Toggles the reading of the CI
    responses: list[AnkResponse]  # Stores all responses received

    # Functions
    set_logger_level(level: AnkLogLevel) -> none
    connect() -> none  # Starts a thread that reads the CI
    disconnect() -> none
    get_responses() -> list[AnkResponse]
    send_request(request: AnkRequest, timeout: int) -> AnkResponse
    get_state(timeout: int, field_mask: list[str]) -> AnkCompleteState
    run_workload(workload_name: str, workload: AnkWorkload, update_mask: list) -> none
    delete_workload(workload_name: str) -> none
    get_agents()  # TBD

    # Private functions
    get_response_by_id(request_id: str, timeout: int) -> AnkResponse
    read_from_control_interface() -> none  # Implements the protobuf reading stuff
    write_to_control_interface(request: AnkRequest) -> none   # Implements the protobuf writing stuff
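
The request-response schema above (an ID created with each request, a reader thread started by connect(), and get_response_by_id with a timeout) could be sketched in Python roughly as follows. This is only a minimal sketch under assumptions: ResponseStore, put and demo are hypothetical names that do not come from the concept, and plain dictionaries stand in for the protobuf messages.

```python
import threading
import time
import uuid


class ResponseStore:
    """Hypothetical helper that matches responses to requests by ID."""

    def __init__(self):
        self._responses = {}  # request_id -> response payload
        self._cond = threading.Condition()

    def put(self, request_id, response):
        # Called by the reader thread for every response it reads.
        with self._cond:
            self._responses[request_id] = response
            self._cond.notify_all()

    def get_response_by_id(self, request_id, timeout):
        # Block until the matching response arrives or the timeout (seconds) expires.
        with self._cond:
            arrived = self._cond.wait_for(
                lambda: request_id in self._responses, timeout=timeout
            )
            if not arrived:
                raise TimeoutError(f"no response for request {request_id}")
            return self._responses.pop(request_id)


def demo():
    store = ResponseStore()
    request_id = str(uuid.uuid4())  # the request creates its ID at creation

    def fake_reader():
        # Stands in for the thread that reads the control interface.
        time.sleep(0.05)
        store.put(request_id, {"ok": True})

    reader = threading.Thread(target=fake_reader)
    reader.start()
    response = store.get_response_by_id(request_id, timeout=1.0)
    reader.join()
    return response
```

Using a condition variable lets send_request block without polling, and responses that nobody is waiting for simply stay in the store until get_responses collects them.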

windsource commented 1 month ago

The SDK shall provide an easy way for developers to extend their workloads with access to the Ankaios control interface. An example of such a workload is a fleet connector: on one side it has access to the cloud, from which it receives Ankaios manifests describing workloads or configs to add, update or delete; on the other side it has access to the Ankaios control interface. A developer therefore needs a simple SDK API to pass these manifests to Ankaios without much overhead.

GabyUnalaq commented 1 month ago

After a discussion with @windsource and @krucod3, we arrived at a concept for the usage of the SDKs. This represents the high-level functions that the user interacts with.

# User should create the Ankaios object
ank = Ankaios()

ank.apply_manifest(manifest)  # Apply a manifest file to the Ankaios system
ank.delete_manifest(manifest)  # Delete a manifest file

workload = Workload().config(...).build()
ank.run_workload(workload)  # Used for run and update workload
ank.delete_workload(workload_name)  # Delete workload based on the name (workload_name: str)
workload = ank.get_workload(workload_name)  # workload_name: str

config = {}
ank.set_config(name, config)
config = ank.get_config(name)
ank.delete_config(name)

complete_state = ank.get_state()
agents: list[str] = ank.get_agents()
workload_state: TBD = ank.get_workload_states()

Also, the SDKs should be in separate repositories.
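
The `Workload().config(...).build()` line in the usage sketch above hints at a builder pattern. A minimal Python sketch of that idea is shown below; the field names, the "NEVER" default and the builder method names are assumptions chosen for illustration and are not taken from the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    """Illustrative stand-in for the SDK's Workload class."""

    name: str
    agent_name: str
    runtime: str
    runtime_config: str
    restart_policy: str = "NEVER"  # assumed default value
    tags: dict = field(default_factory=dict)


class WorkloadBuilder:
    """Collects settings step by step and validates them in build()."""

    def __init__(self):
        self._fields = {}
        self._tags = {}

    def _set(self, key, value):
        self._fields[key] = value
        return self  # returning self allows method chaining

    def workload_name(self, name):
        return self._set("name", name)

    def agent_name(self, agent_name):
        return self._set("agent_name", agent_name)

    def runtime(self, runtime):
        return self._set("runtime", runtime)

    def runtime_config(self, config):
        return self._set("runtime_config", config)

    def restart_policy(self, policy):
        return self._set("restart_policy", policy)

    def add_tag(self, key, value):
        self._tags[key] = value
        return self

    def build(self):
        required = ("name", "agent_name", "runtime", "runtime_config")
        missing = [key for key in required if key not in self._fields]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        return Workload(tags=dict(self._tags), **self._fields)


workload = (
    WorkloadBuilder()
    .workload_name("nginx")
    .agent_name("agent_A")
    .runtime("podman")
    .runtime_config("image: docker.io/library/nginx")
    .add_tag("owner", "team-a")
    .build()
)
```

Validating in build() keeps the chained setters trivial and reports all missing fields at once, before anything is sent to the control interface.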

GabyUnalaq commented 1 month ago

Final SDK Concept proposal (done in accordance with the development of the Python SDK):

Workload

The Ankaios SDK should provide a Workload class that lets the user configure and work with ank_base.Workload objects with ease.

Public methods:

WorkloadBuilder

The Ankaios SDK should provide a WorkloadBuilder class that facilitates the creation of a Workload instance.

Public methods:

WorkloadExecutionState

The Ankaios SDK should provide a class that stores the execution state of a workload. It should contain two public fields, one for the state and one for the substate (represented by the enumerations WorkloadState and WorkloadSubState).

WorkloadInstanceName

The Ankaios SDK should provide a class that stores the workload details received from the Ankaios system. It should contain three public fields: the workload name, the agent name and the workload ID.

WorkloadState

The Ankaios SDK should provide a class that represents a workload state, as received from the Ankaios system. It should contain 2 public fields: one for the WorkloadInstanceName and one for the WorkloadExecutionState.
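
As a rough illustration, the three classes described above could be modelled as Python dataclasses. This is a sketch under assumptions: the enumerations are named WorkloadStateEnum and WorkloadSubStateEnum here only to avoid clashing with the WorkloadState class, and their members are illustrative rather than a complete list.

```python
from dataclasses import dataclass
from enum import Enum


class WorkloadStateEnum(Enum):
    # Illustrative subset of states, not the complete list.
    PENDING = "Pending"
    RUNNING = "Running"
    FAILED = "Failed"


class WorkloadSubStateEnum(Enum):
    # Illustrative subset of substates, not the complete list.
    PENDING_INITIAL = "Initial"
    RUNNING_OK = "Ok"
    FAILED_EXEC_FAILED = "ExecFailed"


@dataclass(frozen=True)
class WorkloadInstanceName:
    """Identifies one workload instance as reported by Ankaios."""

    agent_name: str
    workload_name: str
    workload_id: str


@dataclass
class WorkloadExecutionState:
    """State and substate of a workload."""

    state: WorkloadStateEnum
    substate: WorkloadSubStateEnum


@dataclass
class WorkloadState:
    """Pairs the instance name with its execution state."""

    workload_instance_name: WorkloadInstanceName
    execution_state: WorkloadExecutionState


example = WorkloadState(
    WorkloadInstanceName("agent_A", "nginx", "1234"),
    WorkloadExecutionState(
        WorkloadStateEnum.RUNNING, WorkloadSubStateEnum.RUNNING_OK
    ),
)
```

Making WorkloadInstanceName frozen keeps it hashable, so it could serve as a dictionary key when grouping states by instance.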

WorkloadStateCollection

The Ankaios SDK should provide a class that lets the user work with multiple instances of WorkloadState.

Public methods:

Manifest

The Ankaios SDK should provide a Manifest class that lets the user load a manifest file.

Public methods:

CompleteState

The Ankaios SDK should provide a class that lets the user configure and work with ank_base.CompleteState objects with ease.

Public methods:

Request

The Ankaios SDK should provide a class that lets the user configure and work with ank_base.Request objects with ease. There are two types of requests: update_state and get_state.

Public methods:

Response

The Ankaios SDK should provide a class that lets the user configure and work with ank_base.Response objects with ease. There are three types of responses: error, completeState and UpdateStateSuccess.

Public methods:

ResponseEvent

The Ankaios SDK (should?) provide a helper class for working with events related to the request / response across the control interface.

Public methods:

Ankaios

The Ankaios SDK should provide a class that handles the communication with the control interface and offers an easy-to-use interface for all the actions the user might want to perform.

Public methods:

inf17101 commented 1 month ago

When using the set_ methods will it be updated if the item exists?

GabyUnalaq commented 1 month ago

When using the set_ methods will it be updated if the item exists?

Ankaios.set_config / set_config_from_file - not yet implemented
Ankaios.set_logger_level - replaced
CompleteState.set_workload - adds the workload to the list of workloads
Request.set_complete_state - replace the saved complete state
ResponseEvent.set_response - replace, mostly meant for inner usage.

christoph-hamm commented 1 month ago

I am missing a set_state method, taking a state and an update mask. This allows you to do certain things you cannot do with the other methods (e.g. only change a certain tag of a workload, without interfering with other workloads also manipulating the same workload).

Will the get_state command allow providing a field mask?

GabyUnalaq commented 1 month ago

I am missing a set_state method, taking a state and an update mask. This allows you to do certain things you cannot do with the other methods (e.g. only change a certain tag of a workload, without interfering with other workloads also manipulating the same workload).

I could create a set_state, it's easy enough, but the use case you provided is already covered: you first call get_workload to get a Workload object, change its tag, and then send it back with run_workload.

Will the get_state command allow providing a field mask?

This is the signature of get_state: get_state(self, timeout: float = DEFAULT_TIMEOUT, field_mask: list[str] = None) -> CompleteState:
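
To illustrate what a field mask does to the returned state, here is a small Python sketch that filters a nested dictionary by dot-separated paths. It assumes masks of the form "desiredState.workloads.nginx" and uses plain dictionaries in place of CompleteState; apply_field_mask is a hypothetical helper, not part of the SDK, where the filtering is expected to happen on the Ankaios side.

```python
import copy
from typing import Optional


def apply_field_mask(state: dict, field_mask: Optional[list] = None) -> dict:
    """Return only the parts of state selected by the dot-separated masks."""
    if not field_mask:
        return copy.deepcopy(state)  # no mask means the full state

    result = {}
    for mask in field_mask:
        keys = mask.split(".")
        source = state
        for key in keys:
            if not isinstance(source, dict) or key not in source:
                break  # path does not exist, skip this mask
            source = source[key]
        else:
            # Path exists: rebuild the same nesting in the result.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = copy.deepcopy(source)
    return result


state = {
    "desiredState": {
        "workloads": {
            "nginx": {"agent": "agent_A", "runtime": "podman"},
            "db": {"agent": "agent_B", "runtime": "podman"},
        }
    },
    "workloadStates": {},
}

filtered = apply_field_mask(state, ["desiredState.workloads.nginx.agent"])
```

With this mask, only the agent of the nginx workload is kept; masks that point to a non-existent path are ignored in this sketch.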

GabyUnalaq commented 1 month ago

This concept is good enough for the moment. The Python SDK follows the concept and the Rust SDK will do the same. If needed, another issue can be created to modify the concept and/or the SDKs. This issue can be closed now.