Open rly opened 2 weeks ago
@rly Thanks for the suggestion. We have also received similar requests a few times, and we are also not quite sure where to start! I am outlining our questions and concerns below, and I am happy to have a call to discuss.
1) All writers in Bonsai are optimized for streaming. This is crucial for enabling the reactive plug-and-play nature of logging in Bonsai, where you may want to record anything at any time, aligned on anything. So we tend to prefer writers which are general and lightweight, for example:
Generic writers are informative in Bonsai because the type system can be leveraged for compile-time code generation, as in the case of CSV, so that the type metadata can inform the structure of the output file before streaming starts! For example, in the case of NWB we could easily infer the attributes of the table and use them to initialize the required metadata.
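To make the idea concrete, here is a minimal sketch, written in Python rather than Bonsai's C# and not a description of how the Bonsai CSV writer is actually implemented. It assumes a hypothetical record type `Sample` and a hypothetical helper `write_stream`; the point is only that the column layout is derived from type metadata before the first record arrives, and that rows are then written one at a time without knowing how many will follow.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Iterable, TextIO


@dataclass
class Sample:
    # Hypothetical record type, standing in for the element type of a stream.
    timestamp: float
    channel: int
    value: float


def write_stream(record_type: type, records: Iterable, out: TextIO) -> int:
    """Write a header derived from type metadata, then stream rows one by one."""
    names = [f.name for f in fields(record_type)]
    writer = csv.writer(out, lineterminator="\n")
    # The file structure is fixed here, before any data has been seen.
    writer.writerow(names)
    count = 0
    for record in records:  # may be an unbounded generator
        writer.writerow([getattr(record, name) for name in names])
        count += 1
    return count


def demo() -> str:
    """Stream three generated samples into an in-memory buffer."""
    buffer = io.StringIO()
    samples = (Sample(t * 0.5, 0, float(t)) for t in range(3))
    write_stream(Sample, samples, buffer)
    return buffer.getvalue()
```

Calling `demo()` returns the CSV text with a `timestamp,channel,value` header followed by one line per sample. Passing an empty iterable still produces the header, which is the property that matters when the recording length is unknown in advance.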
2) Despite the above, we never really understood HDF5 and NWB to be formats amenable to streaming. We hesitate to cache or queue a conversion in writers after the fact, since we use Bonsai for both very long and very short experiments, and for both high-throughput and low-throughput data, so there is little way to know beforehand how many records we will produce, or when we need to start and stop the recording. Any hints on whether this is possible (or could be made possible) would be really helpful.
3) In terms of interfaces to NWB libraries directly, I can pick up from the suggestions in the original issue:
1) use Python.NET to call functions in PyNWB
While I can see the convenience of this solution, I think this would be very unappealing to the general Bonsai community. While the Python scripting package is a very powerful tool to unblock certain advanced applications, it is definitely not a dependency that regular users expect to have and would greatly complicate the deployment process. Furthermore, because the Python package still depends heavily on the GIL (even with 3.13 the transition will be slow), this would kill the implicit parallelism which is one of the core performance features of Bonsai.
2) use SWIG to wrap the AqNWB C++ API for C#, or
I am not sure what AqNWB is, but as long as it allows for threaded parallelism, this sounds reasonable. I probably wouldn't use SWIG but we have other solutions we could recommend to make this possible.
3) use HDF5 directly.
This seems the most pragmatic approach, since you immediately gain the C# wrapper (there are a number of them on nuget.org) and it would give us a chance to build up the NWB standard in a targeted way with optimizations specific to Bonsai.
4) Alternatively, we can start by exploring ways to stream NWB files into Bonsai, since there seems to be a streaming IO interface for reading in NWB already? Maybe we could use that to better understand the options while we analyze the case for writing.
5) One final note: from reading through different applications for NWB it seems like it might be used primarily for processed data, rather than raw data, i.e. I guess you probably wouldn't store raw video or ephys data directly in NWB format? Would it be useful to prioritize applications for exporting experiment metadata into NWB?
An NWB user asked for NWB export in Bonsai: https://github.com/hdmf-dev/hdmf/issues/1196
The NWB team is willing to help with such a feature, but we do not know where to start. Could you please provide guidance and support? Could we meet with you over a Zoom call? cc @justidy1