cylc / cylc-uiserver

A Jupyter Server extension that serves the cylc-ui web application for monitoring and controlling Cylc workflows.
https://cylc.org
GNU General Public License v3.0

schema: add abstracted resource requested #617

Open oliver-sanders opened 4 months ago

oliver-sanders commented 4 months ago

The analysis view in the GUI would like access to the memory, CPUs and time that a job requested when it was submitted.

This information is currently available via the schema, but only in the form of batch-system-specific directives. The analysis view requires simple abstract values for these fields, which must be parsed out of the directives.

So we need a bit of code that converts batch-system-specific directives into abstract memory, CPU and time values that the analysis view can use. We could do this in the UI, but it might be cleaner to do it in the UIS, especially as some batch systems may be configured with default values for some directives. We might need to query or configure those defaults, which is much easier to do server-side.
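To illustrate why the directives need parsing rather than a plain numeric cast, here is a minimal sketch that converts a Slurm `--time` value into seconds. It assumes the formats documented for `sbatch` ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"). The function name `slurm_time_to_seconds` is hypothetical and not part of Cylc.

```python
from typing import Optional


def slurm_time_to_seconds(value: str) -> Optional[int]:
    """Convert an sbatch ``--time`` value to seconds (illustrative sketch).

    Returns None for anything this sketch does not understand, e.g.
    "UNLIMITED" or an empty string.
    """
    day_part, sep, rest = value.strip().partition('-')
    if sep:
        # days-hours[:minutes[:seconds]]
        fields = rest.split(':')
        units = (3600, 60, 1)
    else:
        # minutes | minutes:seconds | hours:minutes:seconds
        day_part, fields = '0', day_part.split(':')
        units = (3600, 60, 1) if len(fields) == 3 else (60, 1)
    if len(fields) > len(units):
        return None
    if not all(field.isdigit() for field in [day_part, *fields]):
        return None
    return int(day_part) * 86400 + sum(
        int(field) * unit for field, unit in zip(fields, units)
    )


print(slurm_time_to_seconds('01:30:00'))  # 5400
print(slurm_time_to_seconds('1-12'))      # 129600
print(slurm_time_to_seconds('UNLIMITED'))  # None
```

Returning None for unrecognised values, rather than raising, lets the caller simply omit the field.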

Here's an untested diff showing how we might go about adding this information into the UIS data:

diff --git a/cylc/uiserver/schema.py b/cylc/uiserver/schema.py
index 99933a7..5bc95f5 100644
--- a/cylc/uiserver/schema.py
+++ b/cylc/uiserver/schema.py
@@ -20,7 +20,9 @@ extra functionality specific to the UIS.
 """

 from functools import partial
-from typing import TYPE_CHECKING, Any, List, Optional
+import json
+import math
+from typing import TYPE_CHECKING, Any, List, Optional, NamedTuple

 import graphene
 from graphene.types.generic import GenericScalar
@@ -309,7 +311,54 @@ async def get_elements(root, info, **kwargs):
     kwargs['exworkflows'] = [
         Tokens(w_id) for w_id in kwargs['exworkflows']]

-    return await list_elements(kwargs)
+    ret = []
+
+    # if resource fields requested
+    ret.append(get_job_resource_request(info, kwargs))
+
+    # if database fields requested
+    ret.append(await list_elements(kwargs))
+
+    # merge results if both sets of fields were requested
+    return merge_results(ret)
+
+
+def get_job_resource_request(info, args):
+    """Return the resource request for a task as configured.
+
+    TODO:
+    * Determine the batch system by mapping the task onto the specified
+      platform.
+    * We will likely want the resource requested by a task when it was
+      submitted (which may be altered by reloads or broadcasts) rather than the
+      resource that is configured in the workflow.
+
+    """
+    ret = {}
+
+    for workflow_id in args['workflows']:
+        for namespace in args['tasks']:
+            namespace_id = (
+                Tokens(workflow_id).duplicate(namespace=namespace).id
+            )
+
+            namespace = info.context['resolvers'].data_store_mgr.data[
+                workflow_id
+            ][TASKS][namespace_id]
+
+            directives = {
+                item['key']: item['value']
+                for item in json.loads(namespace.runtime.directives)
+            }
+
+            memory = int(directives.get('--mem', 0))
+            cpus = math.ceil(int(directives.get('--tasks', 0)) / 2)
+            time = float(directives.get('--time', 0))
+
+            # TODO: put the results into the format GraphQL is expecting
+            ret[...] = {'memory': memory, 'cpus': cpus, 'time': time}
+
+    return ret

 async def list_elements(args):

Note that info.context['resolvers'].data_store_mgr.data is the Protobuf data store. All the information we require is in there.

I suggest developing the directive abstraction code along the lines of:

def get_requested_resource(batch_system: str, directives: Dict[str, str]) -> Optional[Dict[str, int]]:

>>> get_requested_resource('slurm', {'--mem': '2Gn'})  # is Gn a base 2 or metric value, dunno
{'memory': 2000000}

>>> get_requested_resource('slurm', {})
None
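A rough, untested-against-Cylc sketch of that interface for the Slurm `--mem` directive might look like the following. Assumptions to check: the value is an integer with an optional K, M, G or T suffix, it defaults to megabytes, and the suffixes step in multiples of 1024. The result is reported in kibibytes purely to keep the arithmetic integral; the unit the analysis view actually wants is still to be decided. Values that do not match this pattern (including the '2Gn' form used above) are treated as unrecognised.

```python
import re
from typing import Dict, Optional

# Multipliers to kibibytes. ASSUMPTION: suffixes are 1024-based.
_MEM_UNITS = {'K': 1, 'M': 1024, 'G': 1024 ** 2, 'T': 1024 ** 3}
_MEM_RE = re.compile(r'^(\d+)([KMGT]?)$', re.IGNORECASE)


def get_requested_resource(
    batch_system: str, directives: Dict[str, str]
) -> Optional[Dict[str, int]]:
    """Return abstract resource values, or None if nothing was recognised."""
    if batch_system != 'slurm':
        # Only Slurm is handled in this sketch.
        return None
    match = _MEM_RE.match(directives.get('--mem', '').strip())
    if match is None:
        # Directive missing or in a form this sketch does not parse.
        return None
    number, unit = match.groups()
    # ASSUMPTION: a bare number means megabytes.
    return {'memory': int(number) * _MEM_UNITS[(unit or 'M').upper()]}


print(get_requested_resource('slurm', {'--mem': '2G'}))  # {'memory': 2097152}
print(get_requested_resource('slurm', {}))               # None
```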

It may be worth investigating whether anyone else has already written directive abstraction code; you never know.

We do not need to support all of the batch systems that Cylc supports here; just make sure it doesn't crash in a heap if given something it doesn't recognise.
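One way to get that "don't crash" behaviour is a small registry of per-batch-system parsers with a guarded lookup, sketched below. Everything here is hypothetical (`PARSERS`, `safe_requested_resource`, `_slurm_cpus`), and the CPU calculation assumes CPUs = `--ntasks` × `--cpus-per-task` with each defaulting to 1, which may not match how a given site accounts for cores.

```python
import logging
from typing import Callable, Dict, Optional

LOG = logging.getLogger(__name__)

Directives = Dict[str, str]
Resources = Optional[Dict[str, int]]


def _slurm_cpus(directives: Directives) -> Resources:
    """Hypothetical Slurm parser: CPUs = tasks * CPUs per task."""
    if '--ntasks' not in directives and '--cpus-per-task' not in directives:
        return None
    # int() raises ValueError on malformed values; the caller handles it.
    ntasks = int(directives.get('--ntasks', 1))
    per_task = int(directives.get('--cpus-per-task', 1))
    return {'cpus': ntasks * per_task}


# Hypothetical registry: batch systems not listed here are simply skipped.
PARSERS: Dict[str, Callable[[Directives], Resources]] = {
    'slurm': _slurm_cpus,
}


def safe_requested_resource(
    batch_system: str, directives: Directives
) -> Resources:
    """Return abstract resources, or None for unknown or malformed input."""
    parser = PARSERS.get(batch_system)
    if parser is None:
        return None  # unsupported batch system: not an error
    try:
        return parser(directives)
    except (TypeError, ValueError) as exc:
        LOG.warning('Could not parse %s directives: %s', batch_system, exc)
        return None


print(safe_requested_resource('slurm', {'--ntasks': '4', '--cpus-per-task': '2'}))
print(safe_requested_resource('mystery-scheduler', {'-n': '4'}))  # None
```

Supporting another batch system would then only mean adding one entry to the registry, and a malformed directive produces a log message rather than a failed query.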