sneakers-the-rat opened this issue 1 year ago
hey @sneakers-the-rat
Glad to hear you're trying to translate the NWB schema into LinkML. That is something we are also interested in and have been working on in our free cycles:
Our work so far has been on adding the building blocks for array data to LinkML. This is not strictly necessary, but it would be nice. How are you approaching the translation?
In short, the NWB schema namespace file contains this line: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.namespace.yaml#L20, which basically says "include all the types from hdmf-common". Data types in the NWB schema spec yamls, like NWBData (in https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.base.yaml), can then extend or reference data types defined in hdmf-common, like Data. The same logic applies to extensions to NWB: they all include the NWB schema (named "core"), so extension data types can extend or reference NWB data types (which include the hdmf-common data types).
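A minimal sketch of how that include mechanism behaves. The dictionaries and type names here are illustrative stand-ins for the parsed namespace YAML, not the real contents of hdmf-common or the NWB core schema:

```python
# Simplified sketch: each namespace lists the namespaces it includes,
# and types from included namespaces become visible to types that
# extend or reference them. All names below are illustrative.

def collect_types(namespace, registry):
    """Gather all data types visible to `namespace`, including those
    pulled in transitively through its include list."""
    types = {}
    for included in namespace.get("includes", []):
        types.update(collect_types(registry[included], registry))
    types.update(namespace.get("types", {}))
    return types

registry = {
    "hdmf-common": {"includes": [], "types": {"Data": {}, "Container": {}}},
    "core": {"includes": ["hdmf-common"],
             "types": {"NWBData": {"extends": "Data"}}},
}

# "core" sees its own types plus everything from hdmf-common,
# so NWBData can declare that it extends Data.
visible = collect_types(registry["core"], registry)
```

An extension namespace would work the same way: it lists "core" in its includes and thereby also sees hdmf-common's types transitively.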
The resolution and merging logic is hidden in HDMF and kind of complicated. Here is an okay starting point for browsing the code: https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/spec/namespace.py#L433 .
When the hdmf-common package is loaded, the hdmf-common namespace and spec yamls are parsed into a TypeMap in https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/common/__init__.py
When the pynwb package is loaded in `src/pynwb/__init__.py`, hdmf-common is imported first, which creates the hdmf-common TypeMap. Then the pynwb namespace and spec yamls are parsed into that same TypeMap.
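A rough sketch of that layering, using a toy class rather than the real HDMF `TypeMap` API (the method names and spec dictionaries below are made up for illustration):

```python
class TypeMap:
    """Toy stand-in for HDMF's TypeMap: a registry of namespace -> {type: spec}
    that later packages extend in place rather than replacing."""

    def __init__(self):
        self.namespaces = {}

    def load_namespace(self, name, specs):
        # Parsing a package's namespace/spec yamls boils down to
        # registering its types under its namespace name.
        self.namespaces[name] = dict(specs)

    def get_spec(self, namespace, data_type):
        return self.namespaces[namespace][data_type]

# Importing hdmf.common builds the base TypeMap ...
type_map = TypeMap()
type_map.load_namespace("hdmf-common", {"Data": {}, "DynamicTable": {}})

# ... then importing pynwb parses the NWB ("core") namespace into the
# SAME TypeMap, so NWB types can resolve hdmf-common types by name.
type_map.load_namespace("core", {"NWBFile": {}, "TimeSeries": {}})
```

The key point is that there is one shared registry object, mutated in import order, rather than each package keeping its own isolated map.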
Hope that helps? Feel free to ask follow-up questions!
oh great!!!!!! if you are already working on it then I will follow whatever you've been doing.
That's sort of how I thought it worked, just wanted to check in case I was missing something. I think linkml (and eventually building out integration with semweb/LD generally) will make that resolution a lot easier, since you can directly reference other vocabularies with URIs.
My approach so far is to move some of the logic from the schema language to the schema itself - e.g. declaring arrays as classes rather than as part of the schema language. So not a 1:1 translation, but a functional translation that can generate valid NWB. It seems like most concepts map to LinkML and other schema languages, even if they require a bit of reorganization to fit (e.g. the inheritance and include system).
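For illustration, here is one way "arrays as classes" could look on the generated-model side. This is a hand-written sketch with made-up slot names, not output from an actual LinkML translation, and it uses dataclasses in place of the pydantic models mentioned below:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NDArray:
    """An array declared as a class in the schema itself, rather than as a
    construct of the schema language: dtype/dims/shape become ordinary
    slots that any schema tooling can introspect and validate."""
    dtype: str
    dims: List[str]
    shape: List[int]
    data: list = field(default_factory=list)

# e.g. a 2-D voltage trace: the array's structure is plain metadata
# on an instance, not a special annotation in the schema language.
trace = NDArray(dtype="float32", dims=["time", "channel"], shape=[1000, 32])
```

The upside of this move is that generic schema languages (which usually lack a first-class array construct) can express the same constraints with ordinary classes and slots.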
First goal is that I want to get to a point where you can just use pydantic models to make and interact with NWB files, next goal is I want to abstract the format so HDF5 is just one possible backend.
For context I am using NWB as a test case in making a p2p linked data protocol - it's a complex format with a mix of raw data and metadata, extensions, etc. so good test for making translation and codec layers to bridge other formats (that, and I am ostensibly a neuroscientist and would like to help out with Neuro tooling ;). Hoping I can be useful as an outside eye.
I think this is related enough to be part of this issue - apparently hdmf-common is written in a distinct schema language? https://hdmf-schema-language.readthedocs.io/en/latest/
Making this comment primarily to link to where it was discussed previously since it's related: https://github.com/NeurodataWithoutBorders/nwb-schema/issues/383#issuecomment-682345606
but also generally curious - is this one of those accidents of history that would require too much labor to unify, or is this desired behavior?
The only difference I can see between the two json schemas is the naming of `neurodata_type_def` vs `data_type_def` and `inc`, so it seems like they are equivalent, just renamed. But I also see some information leakage -> special casing back to hdmf here:
which makes me think there might be some intentional semantic difference between them?
Again, mostly a curiosity question
> the only difference I can see between the two json schemas is the naming of `neurodata_type_def` vs `data_type_def` and `inc`

Yes, that is the only difference. The `neurodata_type_*` terminology stems from NWB and the desire from users for a more intuitive term. HDMF was extracted later from NWB as a more general library, and the `neurodata_type` term is just not appropriate for other domains, so it was changed in HDMF to just `data_type`.
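Since the two dialects differ only in those key names, a translation layer could simply normalize one spelling to the other. A minimal hypothetical sketch (the function and mapping names are made up, and only the two renamed keys mentioned above are handled):

```python
# Map the NWB spelling of the schema-language keys onto the HDMF spelling.
# Per the discussion above, these renames are the only difference between
# the two dialects.
NWB_TO_HDMF = {
    "neurodata_type_def": "data_type_def",
    "neurodata_type_inc": "data_type_inc",
}

def normalize_spec(spec):
    """Recursively rename NWB-dialect keys to the HDMF dialect,
    descending through nested dicts and lists of a parsed spec."""
    if isinstance(spec, dict):
        return {NWB_TO_HDMF.get(k, k): normalize_spec(v)
                for k, v in spec.items()}
    if isinstance(spec, list):
        return [normalize_spec(item) for item in spec]
    return spec

spec = {"neurodata_type_def": "TimeSeries",
        "neurodata_type_inc": "NWBDataInterface"}
normalized = normalize_spec(spec)  # now uses the HDMF key names
```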
hey what up i am back to be obnoxious again.
I am trying to translate the NWB schema to LinkML, making a prototype p2p system, and want to make NWB data interoperable with other schemas. I can figure out most of the mappings, but there is a reference in the core namespace to hdmf-common and I can't figure out how that link resolution works. I see the submodule and found https://github.com/hdmf-dev/hdmf-common-schema/blob/80efce315fcd6c198c512ba526e763f81b535d36/common/namespace.yaml#L3
The docs say
but it doesn't describe what it means to be a "known namespace" or how the resolution is handled during the build.
I see that hdmf-common makes its way into the test file generated here: https://github.com/NeurodataWithoutBorders/pynwb/blob/e97dec4cd90eec4cfb785dc3d8d9c85bc3ae3250/src/pynwb/testing/make_test_files.py#L201 and that seems to happen via some hidden logic in pynwb here: https://github.com/NeurodataWithoutBorders/pynwb/blob/e97dec4cd90eec4cfb785dc3d8d9c85bc3ae3250/src/pynwb/__init__.py#L52
So I'm wondering if that is correct. If so, I can special-case it, but I just wanted to make sure.