cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0
327 stars 93 forks source link

GraphQL: updating arrays with delta updates #6307

Open oliver-sanders opened 4 weeks ago

oliver-sanders commented 4 weeks ago

Data stores maintained by GraphQL updates (i.e. cylc-ui) are not presently able to synchronise some array fields.

This has been discussed before (can't find the issue dammit). We have previously worked around the issue by repeating all elements of updated arrays in the returned deltas.

Example: Outputs

This subscription tracks task outputs:

subscription {
  deltas {
    updated {
      workflow {
        taskProxies {
          id
          outputs {
            label
            satisfied
          }
        }
      }
    }
  }
}

The corresponding added subscription, would initially yield a list of all outputs, i.e:

[
    {'label': 'submitted', 'satisfied': false},
    {'label': 'started', 'satisfied': false},
    {'label': 'succeeded', 'satisfied': false},
    {'label': 'failed', 'satisfied': false},
    {'label': 'submit-failed', 'satisfied': false},
    {'label': 'expired', 'satisfied': false}
]

Whenever there is a change, e.g. when the first job submits, we will receive only the element that was updated e.g:

[
    {'label': 'submitted', 'satisfied': true},
]

And so on:

[
    {'label': 'started', 'satisfied': true},
]
[
    {'label': 'succeeded', 'satisfied': true},
]

This makes sense, we are pushing only the minimal information down the wire.

But how is the remote data store supposed to work with these deltas?

The remote data store needs to be able to associate the updated element back to the corresponding element in the store in order to update it.

In this case it's easy to do by inspection, the label is the unique key representing the object. The remote data store should look through the outputs until it finds one with the same label, and update the satisfied field to match the update. Although this would require the remote data store to know that label is the unique key for outputs arrays which makes updates complicated.

Example: Prerequisites

However, it's not always so easy. Take the prerequisites field for example:

subscription {
  deltas {
    updated {
      workflow {
        taskProxies {
          id
          prerequisites {
            satisfied
            expression
            conditions {
              taskId
              message
              exprAlias
              satisfied
            }
          }
        }
      }
    }
  }
}

There is no unique key here. The expression field (which has values like c0 & c1 or c0 | c1) is not unique. The conditions array isn't unique either since the same conditions could be arranged into different prerequisite expressions.

The only way we can get a unique key is to form a composite of the following fields:

expression
conditions {
    taskId
    message
    exprAlias
}

Problem: Cylc UI

At present, the Cylc UI data store uses Object.assign(data_store, update) to update the data. This will overwrite each key in data_store with the value in update.

The result is that the full list of task outputs that we received in the added delta, will be completely overwritten by the single output we received in the updated delta.

Solution 1: Repeat all array elements in delta updates

One option would be to just repeat all elements in the update.

This isn't ideal efficiency wise, but it is a quick & dirty fix that would allow us to overcome this hurdle quickly without getting too involved in protocols and update functions.

I.E, rather than returning only the updated item in a delta update:

[
    {'label': 'submitted', 'satisfied': true},
]

We would return the whole array:

[
    {'label': 'submitted', 'satisfied': true},  // the updated element
    {'label': 'started', 'satisfied': false},
    {'label': 'succeeded', 'satisfied': false},
    {'label': 'failed', 'satisfied': false},
    {'label': 'submit-failed', 'satisfied': false},
    {'label': 'expired', 'satisfied': false}
]

Whenever there is a change, e.g. when the first job submits, we will receive only the element that was updated e.g:

Solution 2: Replace arrays with objects

We could replace arrays with Objects where the array index is used as the key.

For example, rather than doing this:

[
    {'label': 'submitted', 'satisfied': false},
    {'label': 'started', 'satisfied': false},
    {'label': 'succeeded', 'satisfied': false},
    ...
]

We would do this:

{
    0: {'label': 'submitted', 'satisfied': false},
    1: {'label': 'started', 'satisfied': false},
    2: {'label': 'succeeded', 'satisfied': false},
    ...
}

The downside to this is that it would create a backwards incompatible change. The only alternative to breaking compatibility would be to create a duplicate field for the new object.

Another downside is that there would be no obvious mechanism for removing elements. We can add them, we can update them, but the only way to remove them is to prune the containing object and (re)send and added delta (essentially destroy and rebuild that part of the store).

Solution 3: Develop an array update protocol

The most sophisticated solution would be to enable to UI to update these arrays intelligently by providing a standard "key" for the update algorithm to use for mapping the update back to the original object in the data store.

E.G. we could standardise on using a field called id:

[  // added delta
    {'id': 'submitted', 'label': 'submitted', 'satisfied': false},
    {'id': 'submitted', 'label': 'started', 'satisfied': false},
    {'id': 'submitted', 'label': 'succeeded', 'satisfied': false},
    ...
]

So that updates can be easily mapped onto data store elements:

[  // updated delta
    {'id': 'submitted', 'satisfied': true},
    ...
]

This protocol would need to include a mechanism for "pruning" an element, e.g:

[  // updated delta
    {'id': 'submitted', '_cylc_remove': true}  // special remove flag
    ...
]
oliver-sanders commented 4 weeks ago

Sadly this is impacting the Info view (https://github.com/cylc/cylc-ui/pull/1886) which requires both prerequisites and outputs. It will also want to use several other fields in due course.

I'm tempted to lead towards solution (1) at the moment. It's quick and dirty, but it should be enough to get us moving again. We could chalk up solution (3) as a possible efficiency enhancement in the future.

Whatever solution we pick, we need to apply it globally rather than waiting for bug reports and changing the fields one at a time.

MetRonnie commented 2 weeks ago

I've tested on 8.3.x and master, and in the networks panel of devtools I see all the prerequisites listed in every update, even if only one of them changed