sneakers-the-rat opened this issue 1 year ago
hey @sneakers-the-rat
Glad to hear you're trying to translate the NWB schema into LinkML. That is something we are also interested in and have been working on in our free cycles:
Our work so far has been on adding the building blocks for array data to LinkML. This is not strictly necessary, but it would be nice. How are you approaching the translation?
In short, the NWB schema namespace file contains this line: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.namespace.yaml#L20, which basically says "include all the types from hdmf-common". Data types in the NWB schema spec yamls, like NWBData (in https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.base.yaml), can then extend or reference data types defined in hdmf-common, like Data. The same logic applies to extensions to NWB: they all include the NWB schema (named "core"), so extension data types can extend or reference NWB data types (which include the hdmf-common data types).
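A minimal sketch of how that include mechanism behaves. The dictionaries and type names here are illustrative stand-ins for the parsed namespace YAML, not the real contents of hdmf-common or the NWB core schema:

```python
# Simplified sketch: each namespace lists the namespaces it includes,
# and types from included namespaces become visible to types that
# extend or reference them. All names below are illustrative.

def collect_types(namespace, registry):
    """Gather all data types visible to `namespace`, including those
    pulled in transitively through its include list."""
    types = {}
    for included in namespace.get("includes", []):
        types.update(collect_types(registry[included], registry))
    types.update(namespace.get("types", {}))
    return types

registry = {
    "hdmf-common": {"includes": [], "types": {"Data": {}, "Container": {}}},
    "core": {"includes": ["hdmf-common"],
             "types": {"NWBData": {"extends": "Data"}}},
}

# "core" sees its own types plus everything from hdmf-common,
# so NWBData can declare that it extends Data.
visible = collect_types(registry["core"], registry)
```

An extension namespace would work the same way: it lists "core" in its includes and thereby also sees hdmf-common's types transitively.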
The resolution and merging logic is hidden in HDMF and kind of complicated. Here is an okay starting point for browsing the code: https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/spec/namespace.py#L433 .
When the hdmf-common package is loaded, the hdmf-common namespace and spec yamls are parsed into a TypeMap in https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/common/__init__.py
When the pynwb package is loaded in `src/pynwb/__init__.py`, hdmf-common is imported first, which creates the hdmf-common TypeMap. Then the pynwb namespace and spec yamls are parsed into that same TypeMap.
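A rough sketch of that layering, using a toy class rather than the real HDMF `TypeMap` API (the method names and spec dictionaries below are made up for illustration):

```python
class TypeMap:
    """Toy stand-in for HDMF's TypeMap: a registry of namespace -> {type: spec}
    that later packages extend in place rather than replacing."""

    def __init__(self):
        self.namespaces = {}

    def load_namespace(self, name, specs):
        # Parsing a package's namespace/spec yamls boils down to
        # registering its types under its namespace name.
        self.namespaces[name] = dict(specs)

    def get_spec(self, namespace, data_type):
        return self.namespaces[namespace][data_type]

# Importing hdmf.common builds the base TypeMap ...
type_map = TypeMap()
type_map.load_namespace("hdmf-common", {"Data": {}, "DynamicTable": {}})

# ... then importing pynwb parses the NWB ("core") namespace into the
# SAME TypeMap, so NWB types can resolve hdmf-common types by name.
type_map.load_namespace("core", {"NWBFile": {}, "TimeSeries": {}})
```

The key point is that there is one shared registry object, mutated in import order, rather than each package keeping its own isolated map.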
Hope that helps? Feel free to ask follow-up questions!
oh great!!!!!! if you are already working on it then I will follow whatever you've been doing.
That's sort of how I thought it worked, just wanted to check in case I was missing something. I think linkml (and eventually building out integration with semweb/LD generally) will make that resolution a lot easier, since you can directly reference other vocabularies with URIs.
My approach so far is to move some of the logic from the schema language to the schema itself - e.g. declaring arrays as classes rather than as part of the schema language. So not a 1:1 translation, but a functional translation that can generate valid NWB. It seems like most concepts map to LinkML and other schema languages, even if they require a bit of reorganization to fit (e.g. the inheritance and include system).
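For illustration, here is one way "arrays as classes" could look on the generated-model side. This is a hand-written sketch with made-up slot names, not output from an actual LinkML translation, and it uses dataclasses in place of the pydantic models mentioned below:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NDArray:
    """An array declared as a class in the schema itself, rather than as a
    construct of the schema language: dtype/dims/shape become ordinary
    slots that any schema tooling can introspect and validate."""
    dtype: str
    dims: List[str]
    shape: List[int]
    data: list = field(default_factory=list)

# e.g. a 2-D voltage trace: the array's structure is plain metadata
# on an instance, not a special annotation in the schema language.
trace = NDArray(dtype="float32", dims=["time", "channel"], shape=[1000, 32])
```

The upside of this move is that generic schema languages (which usually lack a first-class array construct) can express the same constraints with ordinary classes and slots.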
First goal is that I want to get to a point where you can just use pydantic models to make and interact with NWB files, next goal is I want to abstract the format so HDF5 is just one possible backend.
For context I am using NWB as a test case in making a p2p linked data protocol - it's a complex format with a mix of raw data and metadata, extensions, etc. so good test for making translation and codec layers to bridge other formats (that, and I am ostensibly a neuroscientist and would like to help out with Neuro tooling ;). Hoping I can be useful as an outside eye.
I think this is related enough to be part of this issue - apparently hdmf-common is written in a distinct schema language? https://hdmf-schema-language.readthedocs.io/en/latest/
Making this comment primarily to link to where it was discussed previously since it's related: https://github.com/NeurodataWithoutBorders/nwb-schema/issues/383#issuecomment-682345606
but also generally curious - is this one of those accidents of history that would require too much labor to unify, or is this desired behavior?
The only difference I can see between the two json schemas is the naming of `neurodata_type_def` vs `data_type_def` and `inc`, so it seems like they are equivalent, just renamed. But I also see some information leakage -> special casing back to hdmf here:
which makes me think there might be some intentional semantic difference between them?
Again, mostly a curiosity question
> the only difference I can see between the two json schemas is the naming of `neurodata_type_def` vs `data_type_def` and `inc`

Yes, that is the only difference. The `neurodata_type_*` terminology stems from NWB and the desire from users for a more intuitive term. HDMF was extracted later from NWB as a more general library, and the `neurodata_type` term is just not appropriate for other domains, so it was changed in HDMF to just `data_type`.
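Since the two dialects differ only in those key names, a translation layer could simply normalize one spelling to the other. A minimal hypothetical sketch (the function and mapping names are made up, and only the two renamed keys mentioned above are handled):

```python
# Map the NWB spelling of the schema-language keys onto the HDMF spelling.
# Per the discussion above, these renames are the only difference between
# the two dialects.
NWB_TO_HDMF = {
    "neurodata_type_def": "data_type_def",
    "neurodata_type_inc": "data_type_inc",
}

def normalize_spec(spec):
    """Recursively rename NWB-dialect keys to the HDMF dialect,
    descending through nested dicts and lists of a parsed spec."""
    if isinstance(spec, dict):
        return {NWB_TO_HDMF.get(k, k): normalize_spec(v)
                for k, v in spec.items()}
    if isinstance(spec, list):
        return [normalize_spec(item) for item in spec]
    return spec

spec = {"neurodata_type_def": "TimeSeries",
        "neurodata_type_inc": "NWBDataInterface"}
normalized = normalize_spec(spec)  # now uses the HDMF key names
```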
hey what up i am back to be obnoxious again.
I am trying to translate the NWB schema to LinkML, making a prototype p2p system, and want to make NWB data interoperable with other schemas. I can figure out most of the mappings, but there is a reference in the core namespace to hdmf-common and I can't figure out how that link resolution works. I see the submodule and found https://github.com/hdmf-dev/hdmf-common-schema/blob/80efce315fcd6c198c512ba526e763f81b535d36/common/namespace.yaml#L3
The docs say
but it doesn't describe what it means to be a "known namespace" or how the resolution is handled during the build.
I see that hdmf-common makes its way into the test file generated here: https://github.com/NeurodataWithoutBorders/pynwb/blob/e97dec4cd90eec4cfb785dc3d8d9c85bc3ae3250/src/pynwb/testing/make_test_files.py#L201 and that seems to happen via some hidden logic in pynwb here: https://github.com/NeurodataWithoutBorders/pynwb/blob/e97dec4cd90eec4cfb785dc3d8d9c85bc3ae3250/src/pynwb/__init__.py#L52
So I'm wondering if that is correct. If so, I can special-case it, but I just wanted to make sure.