creare-com / podpac

Pipeline for Observational Data Processing Analysis and Collaboration
https://podpac.org
Apache License 2.0
45 stars 6 forks source link

Fix JSON data format #494

Closed mpu-creare closed 1 year ago

mpu-creare commented 2 years ago

Description The PODPAC JSON format for pipelines is NOT (in fact) JSON. PODPAC requires ORDERING of the key-value pairs -- this worked through Python using the OrderedDict trait. While working in JavaScript we couldn't consistently serialize ordered pairs, which led to actually reading the JSON spec. Whoops.

Expected Behavior The PODPAC JSON pipeline format should NOT rely on ordering of keys.

mpu-creare commented 2 years ago

Original JSON (relies on the output Node Arithmetic to be the LAST key defined):

{
    "SinCoords": {
        "node": "core.algorithm.utility.SinCoords"
    },
    "Arange": {
        "node": "core.algorithm.utility.Arange"
    },
    "Arithmetic": {
        "node": "core.algorithm.generic.Arithmetic",
        "attrs": {
            "eqn": "a+b",
            "params": {}
        },
        "inputs": {
            "a": "SinCoords",
            "b": "Arange"
        }
    }
}

Possible fixes

Use a list

{   "nodes": [
        {
            "node_name": "SinCoords",
            "node": "core.algorithm.utility.SinCoords"
        },
        {
            "node_name": "Arange",
            "node": "core.algorithm.utility.Arange"
        },
        {
            "node_name": "Arithmetic",
            "node": "core.algorithm.generic.Arithmetic",
            "attrs": {
                "eqn": "a+b",
                "params": {}
            },
            "inputs": {
                "a": "SinCoords",
                "b": "Arange"
            }
        }
    ]
}

Add an ordered list entry in JSON

{
    "Arithmetic": {
        "node": "core.algorithm.generic.Arithmetic",
        "attrs": {
            "eqn": "a+b",
            "params": {}
        },
        "inputs": {
            "a": "SinCoords",
            "b": "Arange"
        }
    },
    "SinCoords": {
        "node": "core.algorithm.utility.SinCoords"
    },
    "Arange": {
        "node": "core.algorithm.utility.Arange"
    },
    "podpac_order": ["SinCoords", "Arange", "Arithmetic"]
}
jmilloy commented 2 years ago

I'm not sure that either one is more or less backwards compatible.

You'll update the podpac Node.from_definition loop for name, d in definiton.items() with either something like:

for name in definition.get('podpac_order`, definition.keys()):
    d = definition[name]
    ....

or

if 'nodes' in definition:
    node_definitions = [d['node_name'], d) for d in definition['nodes']]
else:
    node_definitions = definition.items()
for name, d in node_definitions:
    ....
CFoye-Creare commented 1 year ago

Instead of editing how pipelines are saved, maybe we should change how they are loaded. When loading from JSON, we should write logic such that if there is a load/dependency error where a required node has yet to be loaded, recursively call from_json on the rest of the OrderedDict, returning once all dependencies have been satisfied. If not, continue to recursively call. I'm pretty sure this can be done via dynamic programming to avoid the overhead/weight of recursive calls.

CFoye-Creare commented 1 year ago

Sample Pipeline

Using this sample code from Matt to create a pipeline:

import podpac
spec = podpac.core.utils.get_ui_node_spec()
a = podpac.algorithm.SinCoords()
b = podpac.algorithm.Arange()
c = podpac.algorithm.Arithmetic(a=a, b=b, eqn='a+b')

Errors Arising:

When changing the order of the JSON file such that Arithmetic is before its input nodes, we get the following error output:

In [4]: c_test = podpac.core.node.Node.load("/home/cfoye/Projects/SoilMap/podpac/ex_json.JSON")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 c_test = podpac.core.node.Node.load("/home/cfoye/Projects/SoilMap/podpac/ex_json.JSON")

File ~/Projects/SoilMap/podpac/podpac/core/node.py:961, in Node.load(cls, path)
    959 with open(path) as f:
    960     d = json.load(f, object_pairs_hook=OrderedDict)
--> 961 return cls.from_definition(d)

File ~/Projects/SoilMap/podpac/podpac/core/node.py:878, in Node.from_definition(cls, definition)
    875     kwargs[k] = v
    877 for k, v in d.get("inputs", {}).items():
--> 878     kwargs[k] = _lookup_input(nodes, name, v)
    880 for k, v in d.get("lookup_attrs", {}).items():
    881     kwargs[k] = _lookup_attr(nodes, name, v)

File ~/Projects/SoilMap/podpac/podpac/core/node.py:1227, in _lookup_input(nodes, name, value)
   1221     raise ValueError(
   1222         "Invalid definition for node '%s': invalid reference '%s' of type '%s' in inputs"
   1223         % (name, value, type(value))
   1224     )
   1226 if not value in nodes:
-> 1227     raise ValueError(
   1228         "Invalid definition for node '%s': reference to nonexistent node '%s' in inputs" % (name, value)
   1229     )
   1231 node = nodes[value]
   1233 # copy in debug mode

ValueError: Invalid definition for node 'Arithmetic': reference to nonexistent node 'SinCoords' in inputs
mpu-creare commented 1 year ago

closed by #501