eclipse-ankaios / ankaios

Eclipse Ankaios provides workload and container orchestration for automotive High Performance Computing (HPC) software.
https://eclipse-ankaios.github.io/ankaios/
Apache License 2.0
62 stars 23 forks source link

workload names containing a dot fail when calling ank apply #341

Closed sboett-dev closed 1 month ago

sboett-dev commented 3 months ago

Workload names containing a dot work in the Ankaios init state file. But when calling ank apply file.yml, the ank CLI crashes.

Current Behavior

ank --insecure apply test.yml

thread 'main' panicked at ank/src/cli_commands/apply_manifests.rs:99:60:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With backtrace: RUST_BACKTRACE=full ank --insecure apply test.yml

thread 'main' panicked at ank/src/cli_commands/apply_manifests.rs:99:60:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0:           0x7d929c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hefa070d19712f7b2
   1:           0x818638 - core::fmt::write::hb4781d2f656efdf8
   2:           0x7d60b4 - std::io::Write::write_fmt::ha5c18ee5184597ab
   3:           0x7d90a0 - std::sys_common::backtrace::print::h3107d15dfd66456e
   4:           0x7da40c - std::panicking::default_hook::{{closure}}::h9dbd4e1e1cf8eefd
   5:           0x7da0cc - std::panicking::default_hook::h3fc1e403ea888d0b
   6:           0x7da8ec - std::panicking::rust_panic_with_hook::h04fd07d2f7d1a3b9
   7:           0x7da6c4 - std::panicking::begin_panic_handler::{{closure}}::h6333e84bcadb0df3
   8:           0x7d9780 - std::sys_common::backtrace::__rust_end_short_backtrace::h3f77d7423d385ad2
   9:           0x7da464 - rust_begin_unwind
  10:           0x41e0c8 - core::panicking::panic_fmt::h44130785d025a2df
  11:           0x41e148 - core::panicking::panic::h462d608a37dc0f39
  12:           0x41e080 - core::option::unwrap_failed::hf848e62c66d46e54
  13:           0x49591c - ank::cli_commands::apply_manifests::generate_state_obj_and_filter_masks_from_manifests::h830cca5b098c9392
  14:           0x47c880 - ank::main::{{closure}}::h23d12342ef360f8a
  15:           0x4789e4 - tokio::runtime::park::CachedParkThread::block_on::hfe23bfd57f5a7278
  16:           0x486230 - ank::main::h3f7f505734ea8aab
  17:           0x47f948 - std::sys_common::backtrace::__rust_begin_short_backtrace::he19d4207b96c7321
  18:           0x4534ec - std::rt::lang_start::{{closure}}::h460ea328d12dfd89
  19:           0x7ce898 - std::rt::lang_start_internal::hb34ff58a601949c8
  20:           0x486e34 - main

Expected Behavior

The workload is applied.

Steps to Reproduce

Create the test.yml file:

apiVersion: v0.1
workloads:
  nginx.test:
    runtime: podman
    agent: agent_A
    restartPolicy: ALWAYS
    tags:
      - key: owner
        value: Ankaios team
    runtimeConfig: |
      image: docker.io/nginx:latest
      commandOptions: ["-p", "8081:80"]

Then call ank --insecure apply test.yml

Context (Environment)

EWAOL, arm64, AWS EC2 t4g.large systemd 250 (250.5+) podman version 4.3.1-dev Ankaios 0.3.1 and 0.4.0 (earlier versions not tested)

Logs

see above.

Additional Information

In the init state file read by the Ankaios server on startup, dots in workload names do work.

Final result

To be filled by the one closing the issue.

christoph-hamm commented 3 months ago

Hi, we indent to only allow workload names matching [a-zA-Z0-9][a-zA-Z0-9_.-]*. Unfortunately this is currently not documented and not enforced on the startup state.

We cannot allow "." in workload names, as we use it as separator in the filter-, field- and update-mask and also in the workload instance name.

What we should do now is to introduce a new data structure WorkloadName, which enforces the correct naming (also when deserialzing from yaml). This new data structure can then be used in the intern data structures common::objects::State and common::objects::WorkloadSpec. In this way it will also fail for the startup state and the error message for the ank command can be improved.

sboett-dev commented 3 months ago

Thanks for the clarification! Enforcing it would make sense as Ankaios and the ank CLI should treat YAML files exactly the same.

krucod3 commented 2 months ago

@windsource proposed to also restrict the length of the names.

We could also make a suggestion in the documentation for a separator, i.e.: "_", "-"

HorjuRares commented 2 months ago

I am currently working on this.

krucod3 commented 2 months ago

As we are already adding some constraints on the workload names here, we can also limit the length of the names to avoid problems later. The name length is currently limited by the max length of a container name so a max_workload_name_length = max_podman_container_name - 2 (for the two dots) - max_agent_name - max_id_legth. The calculation above shows that we shall limit the agent name too.

To get an initial impression, we can start here by finding out what the limits of other possible runtimes are, e.g., docker, crun, containerd, system processes (native workloads), northstar, or how other orchestrators handle this.

krucod3 commented 1 month ago

All done here. The changes will be released with v0.5.0.