aiidateam / aiida-workgraph

Efficiently design and manage flexible workflows with AiiDA, featuring an interactive GUI, checkpoints, provenance tracking, and remote execution capabilities.
https://aiida-workgraph.readthedocs.io/en/latest/
MIT License

Failed to de-serialize the local function #261

Closed superstar54 closed 2 weeks ago

superstar54 commented 2 weeks ago
import pathlib

from aiida import orm, load_profile
from aiida_workgraph import WorkGraph, task

load_profile()

@task.graph_builder(outputs=[{"name": "eigvals", "from": "cat_task.eigvals"}])
def query_and_diag(matrix_pk: orm.Int):
    from aiida_shell.data import PickledData
    wg = WorkGraph()

    # In an .ipynb at least defining it here works 
    def myparser(dirpath: pathlib.Path) -> dict[str, orm.Data]:
        x = 1
        return {'eigvals': orm.Int(x)}

    wg.add_task(
        "ShellJob",
        name="cat_task",
        command="cat",
        arguments=[],
        parser=myparser,
        outputs=[],
        parser_outputs=[{"name": "eigvals"}],
    )
    return wg


i = 5
wg = WorkGraph("processing_data")
query_and_diag_task = wg.add_task(query_and_diag, name=f"query_and_diag_pk{i}", matrix_pk=orm.Int(i))
wg.run()

Running this raises an error about "!!python/name:__main__.myparser ''\n".

In workgraph, we use aiida.orm.utils.serialize.serialize (a YAML-based serializer) to serialize the input data of a WorkGraph and save it to node.base.extras. However, the YAML serializer cannot handle local functions properly. A function defined locally has a qualified name like <function __main__.outer_function.<locals>.myparser>, but the serialized tag keeps only the module and the bare function name, as shown below:

In [20]: wg.tasks['cat_task'].inputs['parser'].value
Out[20]: <function __main__.query_and_diag.<locals>.myparser(dirpath: pathlib.Path) -> dict[str, aiida.orm.nodes.data.data.Data]>

In [21]: serialize(wg.tasks['cat_task'].inputs['parser'].value)
Out[21]: "!!python/name:__main__.myparser ''\n"

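After serialization, the name becomes __main__.myparser: the query_and_diag.<locals> part is dropped, so the function can no longer be found by that name when de-serializing.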
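The lost scope can be reproduced without AiiDA or YAML. This is a minimal sketch, assuming the tag is built from the function's __module__ and __name__ (which matches the output above); resolve is a hypothetical helper that mimics a by-name lookup and is not part of any library.

```python
import sys


def query_and_diag():
    # Local function, mirroring the parser defined inside the graph builder.
    def myparser(dirpath):
        return {"eigvals": 1}

    return myparser


parser = query_and_diag()

print(parser.__name__)      # myparser
print(parser.__qualname__)  # query_and_diag.<locals>.myparser

# Assumption: the serialized tag records only "<module>.<__name__>",
# so the enclosing scope is not part of it.
tag_name = f"{parser.__module__}.{parser.__name__}"


def resolve(dotted_name):
    """Hypothetical by-name lookup: return module.attr, or None if absent."""
    module_name, _, attr = dotted_name.rpartition(".")
    module = sys.modules.get(module_name)
    return getattr(module, attr, None)


# No module-level attribute called "myparser" exists, so the lookup finds nothing.
resolved = resolve(tag_name)
print(resolved)
```

Because the function only exists while query_and_diag is running, no dotted name can point back to it; a by-name scheme cannot work for local functions, whatever tag is written.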

Solutions

Use pickle to serialize the function first, and then pass the result through the YAML serializer. YAML is still needed because we want to save the data in node.base.extras using the AiiDA serializer.
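A rough sketch of this idea follows; it is not the fix that was merged. It assumes a by-value pickler is needed, since the standard library pickle also stores functions by reference and rejects local ones. cloudpickle is used here as one such pickler (an assumption, the issue only says "pickle"), and to_text and from_text are hypothetical helper names. The base64 step turns the bytes into a plain string, which a YAML serializer can store without special tags.

```python
import base64
import pickle

try:
    import cloudpickle  # third-party; assumed here as the by-value pickler
except ImportError:
    cloudpickle = None


def make_parser():
    # Local function, as in the graph builder above.
    def myparser(dirpath):
        return {"eigvals": 1}

    return myparser


def to_text(func):
    """Hypothetical helper: pickle by value, then base64-encode to a str."""
    return base64.b64encode(cloudpickle.dumps(func)).decode("ascii")


def from_text(text):
    """Hypothetical helper: reverse of to_text."""
    return pickle.loads(base64.b64decode(text))


parser = make_parser()

# Stdlib pickle saves functions by reference, so a local function fails here too.
try:
    pickle.dumps(parser)
    stdlib_error = None
except (AttributeError, pickle.PicklingError) as exc:
    stdlib_error = exc
print(type(stdlib_error).__name__)

restored = None
if cloudpickle is not None:
    text = to_text(parser)      # plain ASCII string, safe to store as YAML
    restored = from_text(text)  # callable again, independent of its name
    print(restored(None))
```

The trade-off is that the stored string is opaque and tied to the Python version and to the pickler that produced it, so it is only suitable for data that the same environment will load again.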