offline data: tasks & jobs in the n-x & n-cycle windows

oliver-sanders commented 2 years ago

Currently the UI can access the "live" data streamed directly from the workflow. The UI needs to be able to access the "offline" data, e.g. information about older tasks/jobs no longer in the n=1 window.

The UI needs to be able to access the offline data e.g. for:

Displaying historical tasks/jobs (cylc review style) in all views.
Workflow analysis.

Assuming that the client is already subscribed to the relevant data, this only needs to be a one-off query since any subsequent alterations to this data will be covered by the existing subscription. However, for technical reasons we may need to implement this as a subscription to help the UI to know when the data can be removed from the store.

The current thinking is that "live" data will be controlled by the n=x window (i.e. n=0, n=1, ...) and the "offline" data will be controlled by the n=cycle window (i.e. n=20000101, n=20000102, ...).

To implement this we need to add resolvers which handle the n=cycle window by scraping the required info from the public database. Note the public database can get locked so we would need retries with sleeps between them.
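For illustration, the "retries with sleeps" idea could look something like the sketch below. It is not taken from the Cylc codebase: the function name `query_with_retries` and its parameters are hypothetical, and it only assumes that the public database is an SQLite file readable with Python's standard `sqlite3` module.

```python
import sqlite3
import time


def query_with_retries(db_path, sql, params=(), retries=5, delay=0.5,
                       timeout=1.0):
    """Run a query, retrying with a sleep if the database is locked.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    for attempt in range(1, retries + 1):
        try:
            conn = sqlite3.connect(db_path, timeout=timeout)
            try:
                return conn.execute(sql, params).fetchall()
            finally:
                conn.close()
        except sqlite3.OperationalError as exc:
            # Only retry on lock errors, and give up after the last attempt.
            if 'locked' not in str(exc).lower() or attempt == retries:
                raise
            time.sleep(delay)
```

A resolver would call this once per requested cycle and surface the final `sqlite3.OperationalError` to the client if the database stays locked.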

Pull requests welcome!

hjoliver commented 2 years ago

The current thinking is that "live" data will be controlled by the n=x window (i.e. n=0, n=1, ...) and the "offline" data will be controlled by the n=cycle window (i.e. n=20000101, n=20000102, ...).

Not sure this should be the default. For big workflows it will massively increase the number of objects in the DOM.

oliver-sanders commented 2 years ago

I'm not sure what you mean by that?

The n=1 window is the "default". If the operator would like to see historical data, they will have to request it, per cycle. The UIS/UI should only hold the data which the user has requested at all times.

If the operator would like to see one cycle worth of tasks, this will increase the number of objects in the DOM, but this is unavoidable.

hjoliver commented 2 years ago

I meant, perhaps historical data should not be full-cycle based by default when applied to our "live" workflow views (graph, tree, etc.), because that may overwhelm the browser.

Other options, off the top of my head:

use a high n value?
n-window around a requested task (it can sometimes be useful to see the logs of preceding tasks, when diagnosing problems)?
by default, filter out succeeded tasks?

But also:

retain a full-blown cylc review style log view, for when you really want quick access to ALL of the job logs at once. (I think cylc review is quite good, and shouldn't necessarily be ditched in favour of integration with the live workflow views)

oliver-sanders commented 2 years ago

Yes cylc review is quite good which is why I think it should be integrated into the new UI by the addition of this data interface (n=cycle previously discussed right?).

The ability to view offline data in the GUI must be one of the most asked for features in Cylc, having to switch between two orthogonal interfaces for live OR live+historical data is pretty pointless.

I think for the historical data to be useful it would have to be per-cycle, which is the natural pagination system for Cylc. The browser display will still be restricted by inheritance (tree view) or pagination (table view). We can provide additional visualisation grouping e.g. for parametrisation to further throttle this. Given that the tree view can happily handle multiple cycles worth of nasty workflows (e.g. the complex workflow) this should be within capabilities. If we are worried, node limits or grouping controls (tree-level pagination) could be implemented to prevent the most complex workflows from overloading the browser. However, I expect most workflows / real-world uses would never approach this limit.

Browser limits need investigation, but at present this seems perfectly possible?

hjoliver commented 2 years ago

(n=cycle previously discussed right?).

Yes, but I'm not sure we decided it would be the default even for historical data. And even if we did, we'd have to revisit that if performance was a problem. (Note, I'm concerned about specific views like tree and graph - it's clearly not a problem for paginated views a la table or the old cylc review).

The ability to view offline data in the GUI must be one of the most asked for features in Cylc, having to switch between two orthogonal interfaces for live OR live+historical data is pretty pointless.

I'm not suggesting that log view should not be integrated into all workflow views, or that we should not provide n=cycle as an option, just querying it as the default if that would be a problem for large workflows.

Worst case scenario, if performance does prove to be an issue, might be n-window with integrated log view, plus n=cycle as an option (to be used with caution), plus an additional separate cylc review style view if you need to do serious log browsing.

Given that the tree view can happily handle multiple cycles worth of nasty workflows (e.g. the complex workflow) this should be within capabilities. ... Browser limits need investigation, but at present this seems perfectly possible?

But that's only since spawn-on-demand made the n-window restriction feasible/natural, right? Which is typically a massive reduction in tasks to display. Prior to that the tree view was struggling, hence our aborted experiments with the "infinite tree" model.

oliver-sanders commented 2 years ago

default even for historical data ... Note, I'm concerned about specific views like tree and graph

Not quite sure what you mean by "default" here. Each view should filter its portion of the data store individually, e.g. you should be able to open a tree view n=2 and a graph view n=1 (or even n=x around a specified task).

So requesting one cycle of historical data wouldn't mean that every view would show this "by default". The ability to display historical data would need to be enabled on a view-by-view basis, we probably wouldn't add this to the graph view.

hjoliver commented 2 years ago

I just mean, if I have the tree view open (say) and I want to see the logs of a historic task, should that automatically populate the tree with whole cycles so that I can filter for the task and get at the logs that way?

That is going to cause problems for any non-paginated view, for sufficiently large workflows. There might be better ways that would even work for the graph view, and which should be used first unless the user really wants whole cycles.

E.g. off the top of my head:

expand the n-window to capture tasks of interest (fine sometimes)
recenter the n-window (or add another center) around a task of interest, then view logs from there
click on "view logs" for any task, and change the task ID inside the task log view
(last resort: if the user asks for n=cycle and the browser grinds to a halt, argue that it's their own fault :-)

And finally, there are times when having the entire run history at your fingertips is useful, e.g. I want to filter for every failed instance of a task ever. So it still seems to me there is a good case for a "history view" that essentially replicates cylc review in addition to (not instead of!) whatever kind of history integration we end up implementing in the tree and graph etc.

At least it's not obvious to me how that sort of thing would be effective in (say) the tree view - but maybe you'll set me straight on that!

hjoliver commented 2 years ago

So requesting one cycle of historical data wouldn't mean that every view would show this "by default". The ability to display historical data would need to be enabled on a view-by-view basis, we probably wouldn't add this to the graph view.

That would help (clearly the graph view is a no-go for n=cycle) but I imagine we wouldn't want to disable it for the tree view.

oliver-sanders commented 2 years ago

expand the n-window to capture tasks of interest (fine sometimes)
recenter the n-window (or add another center) around a task of interest, then view logs from there

I think for most use cases this is insufficient.

click on "view logs" for any task, and change the task ID inside the task log view

Fine and we should support this, however, there are use cases besides viewing job logs.

(last resort: if the user asks for n=cycle and the browser grinds to a halt, argue that it's their own fault :-)

Or implement a node limit (e.g. "can't load: too many nodes, try filtering").
Or provide better tree-level pagination (e.g. in the extreme we could create families called A-C, D-F, ...).
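As a rough sketch of both ideas (a node limit and alphabetical "families"), something like the following could work. It is written in Python purely for illustration (the real UI is not Python), and every name here (`TooManyNodes`, `enforce_node_limit`, `alphabetical_families`, the default limit of 1000) is hypothetical.

```python
import string


class TooManyNodes(Exception):
    """Raised when a request would load more nodes than the limit."""


def enforce_node_limit(nodes, limit=1000):
    """Return nodes unchanged, or refuse if there are too many."""
    if len(nodes) > limit:
        raise TooManyNodes(
            f"can't load: too many nodes ({len(nodes)} > {limit}), "
            "try filtering"
        )
    return nodes


def alphabetical_families(names, width=3):
    """Group task names into pseudo-families such as A-C, D-F, ..."""
    letters = string.ascii_uppercase
    groups = {}
    for name in sorted(names):
        idx = letters.find(name[:1].upper())
        if idx == -1:
            # Names that do not start with a letter go in one bucket.
            label = 'other'
        else:
            start = (idx // width) * width
            end = min(start + width, len(letters)) - 1
            label = f'{letters[start]}-{letters[end]}'
        groups.setdefault(label, []).append(name)
    return groups
```

With the default width, `alphabetical_families(['foo', 'bar', 'apple'])` puts `apple` and `bar` under `A-C` and `foo` under `D-F`.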

oliver-sanders commented 2 years ago

We have a need to get task & job information into the schema for one-off queries ASAP. We need:

Submitted time.
Started time.
Finished time.
Job status.

Ideally from the same query that we would use to obtain [runtime] fields. This should be a small job, basically a light-weight wrapper around the public-DB to pull out the required fields.

Suggesting that we add a simple GraphQL field to provide this for now. We can then retire this in favour of a more advanced solution integrating the n-window and the like at a later date when we have more time to think through the interfaces.
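A minimal sketch of what such a light-weight wrapper might look like is below. The table name `task_jobs` and the columns `time_submit`, `time_run`, `time_run_exit` and `run_status` are assumptions made for this example and should be checked against the real public database schema. The function names and the rule used to derive a job status are hypothetical too.

```python
import sqlite3

# ASSUMPTION: table and column names are illustrative and may not match
# the real public database schema.
COLUMNS = ('cycle', 'name', 'submit_num',
           'time_submit', 'time_run', 'time_run_exit', 'run_status')


def job_status(time_run, run_status):
    """Derive a simple status string (hypothetical rule)."""
    if run_status is None:
        return 'running' if time_run else 'submitted'
    return 'succeeded' if run_status == 0 else 'failed'


def fetch_jobs(conn, cycle):
    """Return one dict per job in the given cycle."""
    sql = (
        f"SELECT {', '.join(COLUMNS)} FROM task_jobs "
        "WHERE cycle = ? ORDER BY name, submit_num"
    )
    jobs = []
    for row in conn.execute(sql, (cycle,)):
        rec = dict(zip(COLUMNS, row))
        jobs.append({
            'cycle': rec['cycle'],
            'name': rec['name'],
            'submit_num': rec['submit_num'],
            'submitted_time': rec['time_submit'],
            'started_time': rec['time_run'],
            'finished_time': rec['time_run_exit'],
            'job_status': job_status(rec['time_run'], rec['run_status']),
        })
    return jobs
```

A GraphQL field resolver could return these dicts more or less directly, which keeps the interim solution small and easy to retire later.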

cylc / cylc-uiserver

offline data: tasks & jobs in the n-x & n-cycle windows #378