flux-framework / flux-docs

Documentation for the Flux-Framework
https://flux-framework.readthedocs.io/

Adding HTCondor to the list of resource managers #219

Open mtwest2718 opened 1 year ago

mtwest2718 commented 1 year ago

Hi All,

Someone on the CNCF Slack channel pointed me at this page and I was disappointed to see HTCondor not included. Can I request that it be added?

Also, is there a bit more detail on each of the categories, so I could suggest how to fill in each?

Cheers, Matt

vsoch commented 1 year ago

Yes of course! We would love to have that contribution. The table was added recently but is a bit old (it was previously in a PDF that we dug up) and it's likely just an oversight that it's not there.

Also, is there a bit more detail on each of the categories, so I could suggest how to fill in each?

Let us know which categories you would like clarification on and we can do our best! And for some that are a bit opaque we can definitely add a note to that page.

mtwest2718 commented 1 year ago

Let us know which categories you would like clarification on and we can do our best! And for some that are a bit opaque we can definitely add a note to that page.

I am just starting with the multi-user mode piece by piece and TBH, I am having a hard time parsing any of the terms. I can guess what you mean but also have suspicions that the meanings may be very specific.

vsoch commented 1 year ago

If you have specific questions please post them here and we would be happy to clarify any points.

mtwest2718 commented 1 year ago

  1. Multi-user workload management
    • As in more than one user can submit work to a system and the scheduler will allocate resources accordingly?
  2. Full hierarchical resource management
    • ???
  3. Graph-based advanced resource management
    • Are we talking in terms of workflow parent-child dependencies?
  4. Scheduling specialization
    • As in SysAdmins can adjust the weights or algorithm used for resource allocation?
  5. Security: only a small isolated layer running in privileged mode for tighter security
    • Could you clarify this and how you made this assessment? I would imagine most projects would dispute this verdict as unfair and/or incorrect.
  6. Modern command-line interface (cli) design
    • What is considered modern?
  7. Application programming interface (APIs) for job management, job monitoring, resource monitoring, low-level messaging
    • Would it be good to break these out into four different categories?
    • Also, because my low-level system skills are lacking, what do you mean by the last of the 4? Just so I can go check documentation.
  8. Language bindings
    • Why aren't bindings beyond C/C++ sufficient for green?
  9. Bulk job submission
    • Like Job arrays or something else?
  10. High-speed streaming job submission
    • Can you please define what this means?

vsoch commented 1 year ago

Pinging @grondo and @garlick but I'll do my best to give these a first shot.

Multi-user workload management

Multi-user vs. single-user is exactly what it sounds like - akin to Nix, Flux can be installed to serve an entire cluster of users, OR it can be run and controlled by one user in, say, a Docker container. On a multi-user instance, a single user can also spin up a Flux instance that they own entirely. HTCondor is definitely multi-user, and I am not sure about single-user.

Full hierarchical resource management

This means the scheduler understands its resources as a graph from the top level node down to a core or socket - this is a no for HTCondor.
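To make "resources as a graph from the top level down to a core" concrete, here is a minimal Python sketch of a containment hierarchy (cluster, node, socket, core). It is only an illustration: the `Resource` class, `build_cluster`, and the sizes are hypothetical, and this is not how Flux or flux-sched represents resources internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """One vertex in a hypothetical resource hierarchy (not a Flux type)."""
    kind: str                      # e.g. "cluster", "node", "socket", "core"
    name: str
    free: bool = True
    children: list[Resource] = field(default_factory=list)

    def free_count(self, kind: str) -> int:
        """Count free resources of the given kind at or below this vertex."""
        own = 1 if (self.kind == kind and self.free) else 0
        return own + sum(child.free_count(kind) for child in self.children)


def build_cluster(nodes: int, sockets: int, cores: int) -> Resource:
    """Build cluster -> node -> socket -> core with made-up sizes."""
    cluster = Resource("cluster", "cluster0")
    for n in range(nodes):
        node = Resource("node", f"node{n}")
        for s in range(sockets):
            socket = Resource("socket", f"node{n}-socket{s}")
            socket.children = [
                Resource("core", f"node{n}-socket{s}-core{c}")
                for c in range(cores)
            ]
            node.children.append(socket)
        cluster.children.append(node)
    return cluster


cluster = build_cluster(nodes=2, sockets=2, cores=4)
# Mark one core busy, then ask the same question at two levels of the hierarchy.
cluster.children[0].children[0].children[0].free = False
print(cluster.free_count("core"))              # 15 of 16 cores free cluster-wide
print(cluster.children[0].free_count("core"))  # 7 of 8 cores free on node0
```

The point of the sketch is that the same query works at any level of the tree, which is what lets a scheduler reason about a whole cluster or a single socket in the same way.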

Graph-based advanced resource management

It's more than workflow parent-child dependencies - this video gives a good visual: https://youtu.be/YIwt51dyXOE, and there is also flux-sched: https://github.com/flux-framework/flux-sched. This is probably a no for HTCondor but others can chime in.

Scheduling specialization

I'm not totally sure on this one - I'll ask my colleagues! But I think this generally means you can customize policies and the algorithm, e.g.:

• sched-fluxion-qmanager, which manages one or more prioritized job queues with configurable queuing policies (fcfs, easy, conservative, or hybrid).
• sched-fluxion-resource, which matches resource requests to available resources using Fluxion's graph-based matching algorithm.
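As a rough illustration of what a "configurable queuing policy" can mean, the Python sketch below treats a policy as a swappable function over the same queue. The `Job` type and both functions are hypothetical and are not Fluxion code. In particular, `skip_blocked` is a deliberately naive stand-in for backfilling: real EASY and conservative backfilling also make reservations for blocked jobs, which this sketch does not.

```python
from typing import Callable, NamedTuple


class Job(NamedTuple):
    name: str
    cores: int


# A policy takes the queue (in arrival order) and the number of free cores,
# and returns the jobs to start now.
Policy = Callable[[list[Job], int], list[Job]]


def fcfs(queue: list[Job], free_cores: int) -> list[Job]:
    """Strict first-come, first-served: stop at the first job that does not fit."""
    started = []
    for job in queue:
        if job.cores > free_cores:
            break
        started.append(job)
        free_cores -= job.cores
    return started


def skip_blocked(queue: list[Job], free_cores: int) -> list[Job]:
    """Naive backfill: skip jobs that do not fit and keep scanning.

    No reservation is made for a skipped job, so a large job could starve.
    """
    started = []
    for job in queue:
        if job.cores <= free_cores:
            started.append(job)
            free_cores -= job.cores
    return started


queue = [Job("a", 2), Job("b", 8), Job("c", 1)]
for policy in (fcfs, skip_blocked):
    print(policy.__name__, [job.name for job in policy(queue, 4)])
# fcfs ['a']
# skip_blocked ['a', 'c']
```

With 4 free cores, strict FCFS stops behind the 8-core job, while the skipping policy lets the 1-core job run ahead of it. Which behavior is preferable is a site decision, and being able to choose is the kind of specialization described above.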

Security: only a small isolated layer running in privileged mode for tighter security

https://flux-framework.readthedocs.io/en/latest/guides/admin-guide.html?h=security#security

And I'll refer to my colleagues.

I would imagine most projects would dispute this.

Why?

Modern command-line interface (cli) design

We have a design that is more similar to what you might see for Go / Python / Rust command-line clients, e.g.:

$ flux <options> <subcommand>

E.g., flux submit or flux resource list. This is in comparison to, for example, Slurm, which has a separate binary for each command (srun, squeue, etc.).
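For readers unfamiliar with the single-entry-point pattern, here is a small Python sketch using the standard library's `argparse` subparsers. The tool name `mytool` and its `submit` and `status` subcommands are made up for illustration; this is not how the `flux` command itself is implemented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "mytool" is a hypothetical program name used only for this sketch.
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--verbose", action="store_true", help="global option")
    sub = parser.add_subparsers(dest="command", required=True)

    submit = sub.add_parser("submit", help="submit a job")
    submit.add_argument("argv", nargs="+", help="command to run")

    status = sub.add_parser("status", help="show job status")
    status.add_argument("job_id")
    return parser


def main(args: list[str]) -> str:
    """Dispatch on the subcommand and describe what would happen."""
    ns = build_parser().parse_args(args)
    if ns.command == "submit":
        return f"would submit: {' '.join(ns.argv)}"
    return f"would show status of job {ns.job_id}"


print(main(["--verbose", "submit", "hostname"]))  # would submit: hostname
print(main(["status", "42"]))                     # would show status of job 42
```

One binary then owns the global options and help text, and each subcommand adds its own arguments, in contrast to shipping one executable per action.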

Application programming interface (APIs) for job management, job monitoring, resource monitoring, low-level messaging

We could break into categories, but for now they are grouped. I think low level messaging is referring to https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_3.html.

Language bindings

Why aren't bindings beyond C/C++ sufficient for green?

Sorry, there are more than C/C++; the list has:

C, C++, Python, Lua, Rust, Julia, REST (and we also have Go under development)

Bulk job submission

This means submitting jobs in bulk.

High-speed streaming job submission

I know this means what it says - submitting thousands (millions?) of jobs quickly - I'm not sure about how it's implemented.