Closed dboas closed 3 years ago
Alternatively: /nirs(i)/stim(j)/metadata/data /nirs(i)/stim(j)/metadata/names
hi @dboas, I agree it could be useful to add metadata to stim. what type of metadata do we expect? specifically, do you think this metadata field should be used to describe each stim channel? or just general channel-independent metadata?
basically, channel independent metadata can be handled similarly to /nirs/metaDataTags
; channel-dependent metadata can be handled similarly to /nirs/data/measurementList
, as long as these metadata can be categorized into a set of standardized fields. We should probably ask for some examples to assess which path to go.
@fangq sorry for not being clear. We presently have /nirs(i)/stim(j)/data From the spec this is described as: This is a three-column array specifying the stimulus time course for the jth condition. Each row corresponds with a specific stimulus trial. The three columns indicate [starttime duration value]. The starttime, in seconds, is the time relative to the time origin when the stimulus takes on a value; the duration is the time in seconds that the stimulus value continues, and value is the stimulus amplitude. The number of rows is not constrained. (see examples in the appendix).
Basically, there are use cases where users want additional columns to provide metadata values for each stimulus trial (i.e. each row of /nirs(i)/stim(j)/data). The most likely use cases provide behavioral data associated with each stimulus trial as values in these entries. We don't want to expand the number of columns for /nirs(i)/stim(j)/data, so we thought it better to add an optional matrix of values /nirs(i)/stim(j)/metadata which would have the same number of rows as /nirs(i)/stim(j)/data and have an arbitrary number of columns. In addition, we can have an optional /nirs(i)/stim(j)/metadataNames that would provide a useful descriptor name for each column of /nirs(i)/stim(j)/metadata
Is this more clear?
It would be nice to have @huppertt thoughts as well.
@dboas, what was the concern to extend the columns of /stim/data? in my opinion, changing the language to accept more than 3 column in /stim/data is the easiest way to accommodate such data, it is also the most efficient way to read and write.
if one also intends to annotate these additional data columns, such as defining the meaning of each extra column, we can add such lightweight metadata in the form of dataset attribute
, supported in hdf5. My easyh5 parser already supports reading attribute data if exist in the file.
https://github.com/fangq/easyh5/blob/master/loadh5.m#L287-L293
My initial reason for separating it into another field is because our /stim/data format currently exactly matches what is ubiquitous in the fMRI community. I don’t really want to change that.
I do hope that Ted can provide some guidance on this.
From: Qianqian Fang notifications@github.com Reply-To: fNIRS/snirf reply@reply.github.com Date: Wednesday, August 12, 2020 at 2:17 PM To: fNIRS/snirf snirf@noreply.github.com Cc: "Boas, David" dboas@bu.edu, Mention mention@noreply.github.com Subject: Re: [fNIRS/snirf] proposal to add metadata to parallel /nirs(i)/stim(j)/data (#47)
@dboashttps://github.com/dboas, what was the concern to extend the columns of /stim/data? in my opinion, changing the language to accept more than 3 column in /stim/data is the easiest way to accommodate such data, it is also the most efficient way to read and write.
if one also intends to annotate these additional data columns, such as defining the meaning of each extra column, we can add such lightweight metadata in the form of dataset attribute, supported in hdf5. My easyh5 parser already supports reading attribute data if exist in the file.
https://github.com/fangq/easyh5/blob/master/loadh5.m#L287-L293
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/fNIRS/snirf/issues/47#issuecomment-673032969, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AHFCP5BC27T6HFFQOJJI34LSALMDRANCNFSM4P3QWKDQ.
A Homer3 user brought up a case in which filtering or otherwise organizing stims based on some scalar metadata value they each have would be useful.
@jayd1860 I'm also realizing that it would be great to move the status of the stims (enabled/disabled) in Homer to one of these metadata columns. How do you feel about Qianqian's proposal (below) we make stim/data an indefinite number of columns as opposed to a new stim/metadata array?
@dboas, what was the concern to extend the columns of /stim/data? in my opinion, changing the language to accept more than 3 column in /stim/data is the easiest way to accommodate such data, it is also the most efficient way to read and write.
if one also intends to annotate these additional data columns, such as defining the meaning of each extra column, we can add such lightweight metadata in the form of dataset
attribute
, supported in hdf5. My easyh5 parser already supports reading attribute data if exist in the file.https://github.com/fangq/easyh5/blob/master/loadh5.m#L287-L293
I am fine with extending /stim/data to have more than 3 columns as long as the first 3 columns maintain their same format that matches what is ubiquitously used in fMRI as @fangq suggested above.
We also need to annotate these extra columns, i.e. give them descriptions. @fangq suggested the follow. How would this work in practice?
if one also intends to annotate these additional data columns, such as defining the meaning of each extra column, we can add such lightweight metadata in the form of dataset attribute, supported in hdf5. My easyh5 parser already supports reading attribute data if exist in the file.
Would extending the number of columns of stim/data break existing implementations?
I have a working implementation of the stim metadata which expands the stim data array to have more columns. This implementation isn't incompatible with anything with Homer3 so far.
This approach allows a list of metadata IDs to be specified with additional strings, i.e. for an Nx4 data array, the metadata ID list would have one entry (such as 'col_str_1') and for an Nx5 data array the metadata ID list would have two entries... the columns of the data array from L to R being for [onset, duration, amplitude, col_str_1, col_str_2]
I have stored the metadata column id as an attribute as @fangq suggested:
Unfortunately MATLAB does not support the use of string arrays for HDF5 attributes, so saving a true C style array of strings would be difficult. I've formatted it as csv for now.
I guess I prefer storing the metadata column IDs as an array of strings next to data, anyway (/nirs(i)/stim(j)/metadataIds or something) as it seems more consistent with the rest of the spec.
@sstucker and @dboas, sorry for late reply, was working on a proposal deadline before Thanksgiving.
Would extending the number of columns of stim/data break existing implementations?
it should work out of box for Easyh5 because it does not particularly validate data against SNIRF (but reading/writing them as generic data records). I do expect that the validation script @huppertt wrote need to be updated to allow this.
Unfortunately MATLAB does not support the use of string arrays for HDF5 attributes, so saving a true C style array of strings would be difficult. I've formatted it as csv for now.
can you try the low-level API (H5A.*
) to see if you can save such data? see example code here
https://github.com/fangq/easyh5/blob/master/saveh5.m#L342-L345
In my experience the MATLAB H5A API allows you to save a cell array of strings as data but not as an attribute. This was confirmed here. I am sure it could be hacked but it is perhaps not worth it when we can just add it as a new field rather than an attribute.
@fangq can we just add it as a new field?
Please see my above commit
https://github.com/fNIRS/snirf/commit/2ab9e99d2876d0dcaca353a5ea842957f659c99a
because the attribute is directly associated with the data, so I don't see the needs to use long attribute names, so I picked names
for simplicity. Let me know if this makes sense.
@fangq , this is the only use of attributes in the SNIRF spec. Seems inconsistent to use it here when we never used it before.We could have used it for stim.name. We could use it for probe.sourceLabels, provbe.detectorLabels, and aux.name.
Further, in HDF5, as far as we understand, an attribute is not intended as data for the user. But in this case, stim/data/name is defined by the user and thus very much intended for the users benefit. Another way of saying this is that the value of 'name' in no way impacts how we load the data.
We thus argue that we NOT use attribute for stim/data/name but instead make it a separate field very much like stim/name We suggest calling it stim/dataLabels since it is a string array and this makes it consistent with probe.sourceLabels and probe.detectorLabels
See the proposed implementation in the BUNPC fork of this repository
@dboas, I agree there is a bit inconsistency - we could have used attributes for those fields in the first place. To make things simpler, I update the language and added stim.dataLabels
, similar to probe.sourceLabels
. Let me know if this looks better.
@sstucker, I just noticed, after committing my edits, that you have already committed something similar in BUNPC fork - although in the commit cited above, I saw you had only the sync of my previous commit.
I see our changes are similar, and your dataLabels are labels for all columns after 3. @dboas, let me know which version you prefer. In either case, I would like you to create a pull request so we can manually merge.
Also, @sstucker, please use pull-request feature to sync your version with upstream (fNIRS/snirf). Do not manually create identical commits in your fork, it can make merge difficult.
@dboas and I agree that dataLabels
should have the same number of elements as data
has columns so that indexing remains the same. The first three elements of dataLabels
should really be constrained to some version of Onset, Duration, Amplitude but it doesn't really matter as it is just for bookkeeping purposes.
This looks good!
@sstucker, my bad, did not see your PR #49. I patched the spec with the minor edits you have. if you see anything else still missing, feel free to create a new PR
Some users would like to have metadata attached to each stimulus trial that they can then utilize to include or exclude trials from averaging. This is quite useful, for instance, with infant looking time during each trial.
To permit this, we would need to add an optional matrix of values something like /nirs(i)/stim(j)/metadata This would have to have the same number of rows as /nirs(i)/stim(j)/data But the number of columns is unconstrained
/nirs(i)/stim(j)/metadataNames This would be optional and provide a useful descriptor name for each column of the metadata.
@fangq @sstucker @jayd1860 and others Please comment on any issues with adding these optional fields to SNIRF.