NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io
Other
176 stars 85 forks source link

MultiContainerInterface extensions with multiple sub-containers fails to round-trip #840

Closed angevineMiller closed 5 years ago

angevineMiller commented 5 years ago

@tjd2002

For representing our behavioral data, we would like to create MultiContainerInterface extensions with multiple sub-containers (i.e. a Graph that contains many nodes and edges). In the example below, only one of the sub-containers is round-tripping its data contents. In particular, only the container which is listed first in the __clsconf__ attribute of the MultiContainerInterface class definition.

import numpy as np
from datetime import datetime
import pynwb
from pynwb import register_class, load_namespaces
from pynwb.spec import NWBNamespaceBuilder, NWBGroupSpec, NWBDatasetSpec
from pynwb.file import MultiContainerInterface, NWBContainer, NWBDataInterface
from pynwb.form.utils import docval

# Create the spec
# ---------------------------------------
name = 'multicontainerinterface'
ns_path = name + ".namespace.yaml"
ext_source = name + ".extensions.yaml"

nodes = NWBGroupSpec(neurodata_type_def='Node',
                    neurodata_type_inc='NWBDataInterface',
                    doc='nodes in the graph',
                    quantity='*'
                    )
edges = NWBGroupSpec(neurodata_type_def='Edge',
                    neurodata_type_inc='NWBDataInterface',
                    doc='edges in the graph',
                    quantity='*')
graph = NWBGroupSpec(neurodata_type_def='Graph',
                     neurodata_type_inc='NWBDataInterface',
                     name='graph',
                     doc='a graph of nodes and edges', 
                     groups=[nodes, edges],
                    )

ns_builder = NWBNamespaceBuilder(name + ' extensions', name)
ns_builder.add_spec(ext_source, graph)
ns_builder.export(ns_path)

# Register PyNWB classes
# ---------------------------------------
load_namespaces(ns_path)

@register_class('Node', 'multicontainerinterface')
class Node(NWBDataInterface):

    __nwbfields__ = ('name',)

    @docval({'name': 'name', 'type': str, 'doc': 'the name of this node'})
    def __init__(self, **kwargs):
        super(Node, self).__init__(name=kwargs['name'])

@register_class('Edge', 'multicontainerinterface')
class Edge(NWBDataInterface):

    __nwbfields__ = ('name',)

    @docval({'name': 'name', 'type': str, 'doc': 'the name of this edge'})
    def __init__(self, **kwargs):
        super(Edge, self).__init__(name=kwargs['name'])

@register_class('Graph', 'multicontainerinterface')
class Graph(MultiContainerInterface):
    """A multicontainer of nodes and undirected edges."""

    __nwbfields__ = ('name', 'edges', 'nodes')

    __clsconf__ = [
        {
        'attr': 'nodes',
        'type': Node,
        'add': 'add_node',
        'get': 'get_node'
        },
        {
        'attr': 'edges',
        'type': Edge,
        'add': 'add_edge',
        'get': 'get_edge'
        }
    ]

# Make a graph with nodes and an edge
# ---------------------------------------
g = Graph(name='test')
n1 = Node(name='node1')
n2 = Node(name='node2')
e1 = Edge(name='edge1')
g.add_node(n1)
g.add_node(n2)
g.add_edge(e1)

# Save the NWBFile
# ---------------------------------------
nwbfile = pynwb.NWBFile("An NWB file", "graph_test", datetime(2018, 6, 1))
pmod = nwbfile.create_processing_module('behavior', 'behavioral graph')
pmod.add_container(g)
with pynwb.NWBHDF5IO('graph_test.nwb', 'w') as io:
    io.write(nwbfile)

# Read the NWBFile
# ---------------------------------------
io = pynwb.NWBHDF5IO('graph_test.nwb', 'r')
nwb = io.read()
print(nwb.get_processing_module()['behavior'])
io.close()

We only get the data for the nodes:

graph <class '__main__.Graph'>
Fields:
  edges: { }
  nodes: { node1 <class '__main__.Node'>,  node2 <class '__main__.Node'> }

If, instead, I swap the order of the __clsconf__ attribute to:

__clsconf__ = [
        {
        'attr': 'edges',
        'type': Edge,
        'add': 'add_edge',
        'get': 'get_edge'
        },
        {
        'attr': 'nodes',
        'type': Node,
        'add': 'add_node',
        'get': 'get_node'
        }
    ]

Then, after round-tripping the data, we get the edges but not the nodes:

graph <class '__main__.Graph'>
Fields:
  edges: { edge1 <class '__main__.Edge'> }
  nodes: { }

If, instead, I try to instantiate a Graph with nodes and edges directly, without using the adder methods,

n1 = Node(name='node1')
n2 = Node(name='node2')
e1 = Edge(name='edge1')
g = Graph(name='test', edges=[e1], nodes=[n1, n2])

It fails with:

TypeError: unrecognized argument: 'nodes'

Are we missing something important to be able to correctly define MultiContainerInterfaces with more than one sub-container?

Checklist

bendichter commented 5 years ago

@angevineMiller Thanks for report. I think this is the same underlying issue as #787. The fact that __clsconf__ is a list would imply that MultiContainerInterfaces can support containing more than one type, and I agree that it would be useful to do so. Also, it is problematic that this looks like it works, only to fail after round trip. We'll look into this. @ajtritt, do you have any ideas? Would this fall under HDMF or pynwb?

ajtritt commented 5 years ago

This code, which is responsible for making the constructor, sits inside a loop over __clsconf__: https://github.com/NeurodataWithoutBorders/pynwb/blob/6bb2a01d9589d35ea5caf4b0620f862ca3e6162b/src/pynwb/core.py#L864-L868 It just needs to be moved out.

This hasn't been exposed yet since all the MultiContainerInterface classes in PyNWB define their own constructors.

I suspect if you look directly at the HDF5 file, then both Nodes and Edges exist. Is that not the case?