Proposing change in design : uses `class` instead of dictionaries

khsrali commented 1 month ago

After trying out the first syllabus of documentation (two issues #176 #179), I would like to propose a more intuitive interface:

The proposal:

wg = WorkGraph()
task_1 = wg.add_task(add, x= 1, y= 2,  label= "Note: I don't want to access variables with wg.tasks[<label>]!")
task_2 = wg.add_task(multiply, y=-1, x=task_1.future)
wg.run() # see the bottom of this post for multiple runs

As it is now:

# Create a workgraph to link the tasks.
wg = WorkGraph("test_add_multiply")
wg.add_task(add, name="add1")
wg.add_task(multiply, name="multiply1")
wg.add_link(wg.tasks["add1"].outputs["result"], wg.tasks["multiply1"].inputs["b"])
wg.run(inputs = {"add1": {"x": 1, "y": 2}, "multiply1": {"y": -1}}, wait=True)

Proof of concept: Can be achieved with the following design

import inspect

class Future_pointer:
    def __init__(self, result):
        self.result = result

class Task:

    def __init__(self, func, label=None, depends_on=None, **kwargs):
        self.func = func

        for key, value in kwargs.items():
            setattr(self, key, value)

        params = inspect.signature(func).parameters
        for name in params:
            if not hasattr(self, name):
                setattr(self, name, None)

        self.future = Future_pointer(None)
        self._result = None
        self.dependencies = []
        if depends_on is not None:
            self.add_dependency(depends_on)

    @property
    def result(self):
        return self.future.result

    def calculate(self):
        for dep in self.dependencies:
            dep.calculate()
        args = {}
        for name in inspect.signature(self.func).parameters:
            if isinstance(getattr(self, name), Future_pointer):
                value = getattr(self, name).result
            else:
                value = getattr(self, name)
            if value is None:
                self.future.result = None
                return
            args[name] = value 

        self.future.result = self.func(**args)

    def add_dependency(self, source):
        self.dependencies.append(source)

class WorkGraph:

    def __init__(self):
        self.tasks = []
        self.links = [] # one can keep track of links from input values... a bit more coding but straight forward

    def add_task(self, func, label=None, depends_on=None, **kwargs):
        if not depends_on:
            depends_on = None if not self.tasks else self.tasks[-1]

        instance = Task(func, label, depends_on=depends_on, **kwargs)
        self.tasks.append(instance)

        return instance

    def run(self, **kwargs):
        for task in self.tasks:
            task.calculate()

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

wg = WorkGraph()
task_1 = wg.add_task(add, x=1, y=2, label="My arbitrary label which could be very long, and I don't want to access with wg.tasks[<name>]!")
task_2 = wg.add_task(multiply, y=-1, x=task_1.future)

# task_2.calculate()
wg.run()
print(task_2.result)  # should print -3 
print(task_1.result)  # should print 3 

task_1.x = 2
wg.run()
print(task_2.result)  # should print -4

I understand this might be considered a pain for @superstar54 but to me this is intuitive and lubricates life of a user.

I would like to know other's opinion as well, @giovannipizzi

giovannipizzi commented 1 month ago

I also suggested to have a tab-completable way to specify links without dicts and string keys, so I'd support the idea, even if I'm not sure of the exact suggestion with the word "future", for instance.

Also, I didn't check the implementation in detail, but I'm really confused by the class containing classes, and then you access a class attribute of the inner class... Why can't you simply use a method of the outer class and store info in an instance attribute, with less python magic? (maybe easier to discuss live, I just wanted to write this comment)

giovannipizzi commented 1 month ago

Also, I'd try to make the two examples at the top try to do the same thing, now they are a bit different I think

khsrali commented 1 month ago

That is because I started coding from inside out :smile: hahah Please check the update, now it's fine. Both examples do the same thing. add_link is done automatically in code, also can be given directly as depends_on to Task. But this is not the point/problem, it can be easily re-adjusted to the former behavior.

superstar54 commented 1 month ago

Hi @khsrali , thanks for sharing your idea here. I try to explain what WorkGraph implementations, and their use cases.

About using Class

WorkGraph uses Class, and it allows user to access the task or its outputs in three ways:

dictionary-like: wg.tasks["add1"].outputs["x"]
list-like: wg.tasks["add1"].outputs[0], wg.tasks[0].outputs[0]
attribute-like: wg.tasks.["add1"].outputs.x, or even wg.tasks.add1.outputs.x. Note: Tab-completable is not supported yet, but is planned.

Therefore, the code you propose to run should already work using WorkGraph.

wg = WorkGraph()
add_task = wg.add_task(add, name="add1", x=1, y=2)
wg.add_task(multiply, y=3, x=add_task.outputs.result)
wg.run()

In most cases, attribute-like and list-like can be very useful and straightforward. However, dictionary-like is the most robust and can be used in all cases. For example, user can program their work graph in an automatic fashion. When creating many tasks inside a for loop, the user can give a meaningful full name to the tasks, e.g., f"task_{i}", so that they can access it and create complex dependencies.

"Note: I don't want to access variables with wg.tasks[]!")

Users are not required to give a name for a task. However, they are still suggested to give the task a meaningful name so that later, when loading a finished WorkGraph, they can quickly access the tasks by their names.

About the creation and execution

In your proof of concept, you also introduce the way to execute the WorkGraph, like the future concept. I would like to say that the execution of the WorkGraph is rather complex because it requires asynchronous execution, checkpoints, data provenance of the workflow logic, etc.

In aiida-workgraph, there are separate classes: WorkGraph and WorkGraphEngine handle the creation and execution. You can check the WorkGraphEngine class if you are interested.

I am open to further discussion if you'd like to delve deeper into the execution engine.

aiidateam / aiida-workgraph