Open ehennestad opened 2 days ago
When I saw that there was a new field `was_generated_by`, I initially thought it was meant for storing information about which package was used to create an NWB file, e.g. pynwb, matnwb or NWB Guide, which I thought was great.
Only after reading the field description and issue #258 (https://github.com/NeurodataWithoutBorders/nwb-schema/issues/258) did I realise that it is meant for storing information about the software used to generate the actual datasets / datatypes.
In my opinion, it would be great to have a field in the file dedicated to storing information about which software was used to generate the file (as I first interpreted it).
I think the current iteration of `was_generated_by` is intended to be a catch-all for both types of information that you listed: software used to generate the NWBFile and software used to acquire/generate data (at least until we determine how we want to attach the latter to the actual data).
I think we could clarify the description in the schema and maybe add an example to make this clearer?
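As a rough illustration of the kind of example that could accompany the schema description, here is a minimal sketch in plain Python. It assumes the field holds one name/version pair per software package; the package names and version strings are placeholders, and `describe` is a hypothetical helper, not part of pynwb or the schema.

```python
# Sketch of a file-level was_generated_by value used as a catch-all.
# Assumption: each entry is a [name, version] pair. All names and
# versions below are placeholders chosen for illustration only.
was_generated_by = [
    ["file-writer-package", "0.0.0"],   # software that wrote the NWB file
    ["acquisition-software", "0.0.0"],  # software that acquired/generated data
]


def describe(entries):
    """Hypothetical helper: format each [name, version] pair as 'name version'."""
    return [f"{name} {version}" for name, version in entries]


print(describe(was_generated_by))
```

Note that nothing in such a flat list says which entry wrote the file and which produced the data, which is the ambiguity a clearer description would need to address.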
I also think it would make more sense to attach information about which software generated a dataset to the dataset itself (similar to how you can add more detailed metadata to a device). Having a list on the file object itself is a slight improvement, but it requires the user of the file to work out which software applies to which dataset/datatype, which is not ideal.
I agree that adding the information about which software generated a particular dataset to the dataset itself is a better way to help users understand which software was used to generate what data.
One potential approach is to add `was_generated_by` as an optional dataset to the `Container` data type in hdmf-common-schema, so that it is possible to add this optional dataset to all the NWB data types that inherit from `Container`. Any thoughts on that?
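To make the inheritance idea concrete, here is a minimal sketch using plain Python dataclasses. The classes below are simplified stand-ins that only borrow the names `Container`, `TimeSeries` and `NWBFile`; they are not the hdmf or pynwb classes. The fallback to the file-level list in `provenance_for` is my own assumption about how a reader might resolve missing entries, not something proposed in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: one (name, version) pair per software package.
SoftwareEntry = Tuple[str, str]


@dataclass
class Container:
    """Stand-in base type carrying the optional provenance field."""
    name: str
    was_generated_by: Optional[List[SoftwareEntry]] = None


@dataclass
class TimeSeries(Container):
    """Stand-in data type; inherits was_generated_by from Container."""
    data: List[float] = field(default_factory=list)


@dataclass
class NWBFile(Container):
    """Stand-in file object holding child containers."""
    children: List[Container] = field(default_factory=list)


def provenance_for(nwbfile: NWBFile, child_name: str) -> Optional[List[SoftwareEntry]]:
    """Return the child's own entries, else fall back to the file-level list."""
    for child in nwbfile.children:
        if child.name == child_name:
            if child.was_generated_by is not None:
                return child.was_generated_by
            return nwbfile.was_generated_by
    raise KeyError(child_name)


# Placeholder names and versions, for illustration only.
nwbfile = NWBFile(
    name="root",
    was_generated_by=[("file-writer-package", "0.0.0")],
    children=[
        TimeSeries(name="raw", was_generated_by=[("acquisition-software", "0.0.0")]),
        TimeSeries(name="filtered"),
    ],
)
print(provenance_for(nwbfile, "raw"))
print(provenance_for(nwbfile, "filtered"))
```

With the field defined once on the base type, every inheriting type can carry its own provenance, so a reader no longer has to guess which file-level entry applies to which dataset.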
This comment also has a more thorough summary of the provenance information and support we might want to add based on other discussions.