bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Separate Job spec from system state #3983

Closed: frrist closed this 4 months ago

frrist commented 4 months ago

At present, the system representation of a job and user-submitted data are combined into a single structure, the Job type.

Problem

It feels like we are making this easy for ourselves instead of for our users - or, more likely, making it hard for BOTH ourselves and our users. At present we are munging user-submitted data and system data together across multiple structures - the job and its tasks. Further, we modify the data submitted by users depending on various edge cases. Lastly, we store job state inefficiently when persisting it to disk in the jobstore.

Ways we are making it harder for ourselves:

Ways we are making it hard for our users:

Proposal

Separate the system representation of a job from the data expected from users into two different types. For example, the JobState is used by bacalhau to monitor and track the various system attributes of a job, while the JobSpec is the schema/type users are expected to submit.

type JobState struct {
    // ID is a unique identifier assigned to this job.
    // It helps to distinguish jobs with the same name after they have been deleted and re-created.
    // The ID is generated by the server and should not be set directly by the client.
    ID string `json:"ID"`

    // SystemMeta is used to associate arbitrary metadata with this job.
    SystemMeta map[string]string `json:"Meta"`

    // State is the current state of the job.
    State State[JobStateType] `json:"State"`

    // Version is a per-job monotonically increasing version number that is incremented
    // on each job specification update.
    Version uint64 `json:"Version"`

    // Revision is a per-job monotonically increasing revision number that is incremented
    // on each update to the job's state or specification
    Revision uint64 `json:"Revision"`

    CreateTime int64 `json:"CreateTime"`
    ModifyTime int64 `json:"ModifyTime"`

    // Spec is the normalized submission provided by the user. It may contain transformations
    // made to the original Source submitted by the user.
    Spec JobSpec `json:"Spec"`

    // Source is the original submission provided by the user, as raw JSON bytes.
    Source []byte `json:"Source"`
}

type JobSpec struct {
    // Name is the logical name of the job used to refer to it.
    // Submitting a job with the same name as an existing job will result in an
    // update to the existing job.
    Name string `json:"Name"`

    // Namespace is the namespace this job is running in.
    Namespace string `json:"Namespace"`

    // Type is the type of job this is, e.g. "daemon" or "batch".
    Type string `json:"Type"`

    // Priority defines the scheduling priority of this job.
    Priority int `json:"Priority"`

    // Count is the number of replicas that should be scheduled.
    Count int `json:"Count"`

    // Constraints is a selector which must be true for the compute node to run this job.
    Constraints []*LabelSelectorRequirement `json:"Constraints"`

    // Meta is used to associate arbitrary metadata with this job.
    Meta map[string]string `json:"Meta"`

    // Labels is used to associate arbitrary labels with this job, which can be used
    // for filtering.
    Labels map[string]string `json:"Labels"`

    Tasks []*Task `json:"Tasks"`
}
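
For concreteness, here is a minimal sketch of how a requester node might wrap a submitted JobSpec into a server-owned JobState. The constructor name and the uuid dependency are illustrative assumptions, not part of this proposal:

import (
    "time"

    "github.com/google/uuid"
)

// NewJobState is a hypothetical constructor: the server owns the ID, the
// timestamps, and the version counters, while the user-submitted spec and the
// raw source bytes are stored untouched alongside them.
func NewJobState(spec JobSpec, source []byte) JobState {
    now := time.Now().UnixNano()
    return JobState{
        ID:         uuid.NewString(),
        SystemMeta: map[string]string{},
        Version:    1,
        Revision:   1,
        CreateTime: now,
        ModifyTime: now,
        Spec:       spec,
        Source:     source,
    }
}
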
wdbaruni commented 4 months ago

Thanks, Forrest, for your deep dive. That said, I am not in agreement with most of the issues and concerns you've raised.

First, there is precedent for what we are doing and for how we handle and filter out system-defined fields from job submissions. Nomad's job spec is very close to ours: they interleave system-defined fields in the job spec. Similarly, Kubernetes allows users to define metadata and spec fields when submitting a job, but also returns a status field when the job is queried, and it adds its own metadata; it is standard practice for CRDs to inject additional metadata into the same top-level metadata field. While not recommended, users can submit jobs with a status field, and Kubernetes will simply ignore it.
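
For illustration only, here is the metadata/spec/status split referred to above, expressed as Go structs. The names follow the Kubernetes convention and are not actual Bacalhau or Kubernetes types:

// ObjectMeta is user-owned, but the system (and CRD controllers) may inject
// additional keys into Labels and Annotations.
type ObjectMeta struct {
    Name        string            `json:"name"`
    Namespace   string            `json:"namespace"`
    Labels      map[string]string `json:"labels,omitempty"`
    Annotations map[string]string `json:"annotations,omitempty"`
}

// JobResource groups what the user submits (Metadata, Spec) with what the
// server owns (Status); a Status supplied on submission is simply ignored.
type JobResource struct {
    Metadata ObjectMeta `json:"metadata"`
    Spec     JobSpec    `json:"spec"`
    Status   *JobStatus `json:"status,omitempty"`
}

// JobStatus is populated by the server and returned on queries.
type JobStatus struct {
    State      string `json:"state"`
    Revision   uint64 `json:"revision"`
    ModifyTime int64  `json:"modifyTime"`
}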

Second, we need to think about our API design. Nomad, Kubernetes, and our V2 APIs are RESTful APIs with a unified endpoint for interacting with jobs/pods, where users submit jobs using the PUT method and query using GET. Under your proposal, there would be separate endpoints for submitting and others for querying.

As for your specific concerns:

There are several different methods on the job type that must be called to validate and sanitize it. Some of these methods must only be called on the client side, while others must only be called on the server side. This is poor DevEx.

I don't understand why this is poor DevEx. Again, this is what both Kubernetes and Nomad do, and I don't see the problem. Besides, we allow the requester to set default fields if they are missing from the job, and we benefit from having different validation at the submission layer, where things can be more relaxed, followed by later validation after defaults have been populated.
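
As a hypothetical sketch of those two validation stages, using the JobSpec type from the proposal above purely for illustration (the method names and the specific rules are assumptions):

import "errors"

// ValidateSubmission is the relaxed check run at submission time:
// only fields the user is required to provide are enforced.
func (s *JobSpec) ValidateSubmission() error {
    if len(s.Tasks) == 0 {
        return errors.New("job must define at least one task")
    }
    return nil
}

// ValidateScheduling is the stricter check run server-side, after the
// requester has populated any missing defaults.
func (s *JobSpec) ValidateScheduling() error {
    if s.Namespace == "" {
        return errors.New("namespace must be set before scheduling")
    }
    if s.Count <= 0 {
        return errors.New("count must be positive after defaulting")
    }
    return nil
}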

The meta field of a job has various "reserved" keys that cannot be set by users; if they are set, we delete them: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/models/job.go#L307. This is both poor DevEx and UX.

Again, there is precedent for this in Kubernetes, and it is the main way for CRDs to inject their own state and metadata. I would prefer this to having multiple places to define metadata for the same job.
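
As a sketch of the kind of filtering being discussed (the prefix is the reserved bacalhau.org/ namespace mentioned further down; the function itself is an illustration, not the code behind the link above):

import "strings"

// reservedPrefix marks metadata keys owned by the system; user-supplied keys
// under this prefix are dropped on submission so the system can inject its
// own values there later.
const reservedPrefix = "bacalhau.org/"

func sanitizeMeta(meta map[string]string) map[string]string {
    out := make(map[string]string, len(meta))
    for k, v := range meta {
        if strings.HasPrefix(k, reservedPrefix) {
            continue
        }
        out[k] = v
    }
    return out
}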

We only disallow user submissions with the JobID set if the job type is batch or ops. System and daemon jobs may provide their own JobID on submission: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/models/job.go#L295-L300.

Yeah, we can remove this logic, as it can be confusing. The original intention was to allow users to provide their own job names and be able to update long-running jobs, but not to update batch and ops jobs. Job updates are a bigger discussion, and we have other issues tracking this, including https://github.com/bacalhau-project/bacalhau/issues/3871 and https://github.com/bacalhau-project/bacalhau/issues/3864. Most likely we will allow users to update batch and ops jobs as well.

We inefficiently store the entire job structure, and its tasks(!!), repeatedly to disk when only modifying a small portion of the structure: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/jobstore/boltdb/store.go#L848-L852. The state of a job doesn't need to be in the same structure as the user's submission. It would be significantly more efficient to store the user submission once and make updates to the system state representation separately.

You need to understand the access pattern of our jobs and APIs before concluding whether we are doing things inefficiently. We almost never retrieve the job spec without its state. Previously, when we had the job spec and state as separate objects, we always had to do two queries to the database to fetch both. It is true that we are doing heavier writes, but our access pattern is heavier on the read side. Still, we could benefit from a model closer to Kubernetes, where we have top-level fields for metadata, spec, and state. That would allow job store implementations to optimize their storage for partial updates.
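
A hypothetical sketch of the partial-update layout this would enable: the spec is written once at submission, and only the small state record is rewritten on each transition. The key scheme and the kv interface are assumptions, not the boltdb store's API:

// kv is a stand-in for whatever key/value store backs the job store.
type kv interface {
    Put(key string, value []byte) error
}

// persistSubmission stores the (immutable) spec and the initial state record
// once, at submission time.
func persistSubmission(store kv, jobID string, specJSON, stateJSON []byte) error {
    if err := store.Put("jobs/"+jobID+"/spec", specJSON); err != nil {
        return err
    }
    return store.Put("jobs/"+jobID+"/state", stateJSON)
}

// persistStateUpdate rewrites only the state record on each change, leaving
// the spec and its tasks untouched on disk.
func persistStateUpdate(store kv, jobID string, stateJSON []byte) error {
    return store.Put("jobs/"+jobID+"/state", stateJSON)
}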

Whenever the time comes to make changes to the system state representation of the job or task (e.g. add a new field, remove a field, modify a field's type, etc.), we will need to modify everything from the client to the server to accommodate it. Oh, and let's not forget the various SDKs!

I don't understand this. If we add a new piece of system state, we will need to update the SDKs regardless so they can display the new field. That is true whether those system fields are interleaved in the job model or sit under a top-level state field.

Running bacalhau job describe may return a job spec that is different from what the user actually submitted. For example, we modify the count field depending on the job type a user submitted: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/models/job.go#L166

We allow requester nodes to set job defaults if they are missing, such as timeout, job resources, and publisher. I don't see this as a problem. I am more concerned with understanding how the system handled my job and what fields it assumed when running it than with making sure bacalhau describe returns exactly what I submitted. Also, Kubernetes and Nomad do inject system information into the submitted job, and you will see it when you get the job.
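
For illustration, a sketch of requester-side defaulting in the spirit described here, written against the JobSpec type from the proposal above. The real code defaults fields such as timeout, resources, and publisher; these particular rules and the method name are assumptions:

// ApplyDefaults fills in fields the user omitted. It runs on the requester
// before scheduling, which is one reason describe can differ from the
// original submission.
func (s *JobSpec) ApplyDefaults() {
    if s.Namespace == "" {
        s.Namespace = "default"
    }
    if s.Type == "" {
        s.Type = "batch"
    }
    if s.Count == 0 {
        s.Count = 1
    }
}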

Running bacalhau job describe --output yaml | bacalhau job run and getting several warning messages back about fields being ignored is confusing. As a user I would expect this to "just work" without warnings.

First, Kubernetes ignores those fields without a warning; we can remove the warnings in the future if they are painful. Second, this model of re-submitting jobs may not be useful or common anyway. Third, users differ, and we need to collect more feedback from them before making drastic changes to our APIs. Saying "as a user I would expect this to just work without warnings" is a big assumption.

Users must carry the cognitive overhead of knowing which fields they can set and which they can't when submitting jobs. This is called out plainly in our documentation: https://expanso.gitbook.io/bacalhau-docs/setting-up/jobs/job#server-generated-parameters.

I think you are overcomplicating this. Having reserved prefixes for metadata doesn't sound like something painful for users to work with, especially as we follow a pattern where we prefix all of those reserved metadata keys with bacalhau.org/.

wdbaruni commented 4 months ago

My overall feedback is that I am sure our APIs will go through another iteration at some point, but let's do that after we have more users and have collected more feedback, so we can make informed decisions. The current APIs are functional, have precedent in Nomad, and I don't see them causing us problems for at least two years.

We have already opted to fix a lot of technical debt this quarter, and we need to move forward and deliver new features rather than getting stuck in this phase. When we have more capacity and more users, we will explore not only the structure of our API models, but also the need for separate APIs and CLIs for the different job types, as we've discussed before.

frrist commented 4 months ago

Closing due to the controversial nature of this change, as discussed here and synchronously.