NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io

Specify format for experimenter name #528

Open rly opened 1 year ago

rly commented 1 year ago

The current schema for experimenter name: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/b22fdb7e3f15eed2acce1b33b77dfbe7e31f86e6/core/nwb.file.yaml#L171-L179

1) does not specify a format for the name(s)
2) says that the role(s) can be specified, but does not specify how

This makes the experimenter name impossible to parse by machines.

DANDI requires that the experimenter name match the regex pattern at https://github.com/dandi/dandi-schema/blob/master/dandischema/models.py#L60, i.e., the name has to be of the form LastName, FirstName. More specifically, it has to be of the form [characters from set A], [characters from set A], where A = {a, b, ..., z, A, B, ..., Z, 0, 1, ..., 9, _, -, ., space}.
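As a rough illustration of the check described above, here is a minimal sketch in Python. The pattern is reconstructed from the prose description of set A, not copied from dandischema, so treat it as an assumption; the function name `is_valid_experimenter_name` is hypothetical.

```python
import re

# Assumption: this character class mirrors "set A" as described in the text
# (letters, digits, underscore, hyphen, period, space). It is NOT copied
# from dandischema and may differ from the pattern DANDI actually uses.
NAME_CHARS = r"[A-Za-z0-9_\-. ]"

# Two runs of set-A characters separated by a comma and a single space.
NAME_PATTERN = re.compile(rf"{NAME_CHARS}+, {NAME_CHARS}+")


def is_valid_experimenter_name(name: str) -> bool:
    """Return True if the whole string has the form 'LastName, FirstName'."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Ly, Ryan", "Ryan Ly", "Ly,Ryan"]:
        print(candidate, is_valid_experimenter_name(candidate))
```

Because the comma is not in set A, a string with more than one comma cannot match under this reading of the pattern.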

The NWB Inspector performs the same check and documents this as a best practice:
https://github.com/NeurodataWithoutBorders/nwbinspector/pull/227/
https://github.com/NeurodataWithoutBorders/nwbinspector/issues/33
See also https://github.com/NeurodataWithoutBorders/nwbinspector/issues/253

I think we should:
1) document and recommend this best practice in the schema docs
2) decide on how to specify roles for the experimenters

Some options for 2:
a) change experimenter from a dataset of shape (N, ) to shape (N, 2), with the first column being the experimenter name and the second column being the role (a breaking change)
b) add an optional separate dataset, experimenter_role, that is aligned with experimenter
c) add an optional attribute on the experimenter dataset, called experimenter_role, that is aligned with experimenter
d) remove the suggestion that the role can be specified
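To make "aligned with experimenter" in options (b) and (c) concrete, here is a minimal sketch in plain Python. It does not use the NWB schema language or any NWB API; the function name and the example role strings are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pair_experimenters_with_roles(
    experimenter: Sequence[str],
    experimenter_role: Optional[Sequence[str]] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Pair each name with the role stored at the same index.

    The role array is optional, as in options (b) and (c). When it is
    present, it must have the same length as the name array.
    """
    if experimenter_role is None:
        return [(name, None) for name in experimenter]
    if len(experimenter_role) != len(experimenter):
        raise ValueError(
            "experimenter_role must have one entry per experimenter"
        )
    return list(zip(experimenter, experimenter_role))


if __name__ == "__main__":
    names = ["Ly, Ryan", "Baker, Cody"]
    roles = ["data collection", "data curation"]  # hypothetical role labels
    print(pair_experimenters_with_roles(names, roles))
    print(pair_experimenters_with_roles(names))
```

Under option (a), the same pairs would instead be stored directly as the rows of an (N, 2) dataset, which is why existing readers that expect shape (N, ) would break.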

This comes up because https://github.com/LorenFrankLab/spyglass needs to import experimenter names from NWB files and store them in its database.

@bendichter @CodyCBakerPhD @oruebel what do you think?

rly commented 1 year ago

An experimenter could have multiple roles in a project as well. See https://www.nature.com/nature-index/news-blog/researchers-embracing-visual-tools-contribution-matrix-give-fair-credit-authors-scientific-papers Do we want to support that as well, and how?

rly commented 1 year ago

This raises a separate issue of how much we want to include best practice suggestions in the NWB schema docs itself. I think once a best practice suggestion is relatively stable, it should be mentioned in the schema docs and the API docs. That may be tedious though.

CodyCBakerPhD commented 1 year ago

> says that the role(s) can be specified, but does not specify how

I noticed this part as well. It came up back in the Best Practice discussion here, but nothing was resolved, mainly because the DANDI metadata has a much richer schema for specifying all experimenter-related information: for example, a contributor (which associates a role as an attribute) and a person (which includes associating people with an institution).

Given everything else you point out about how strict the regex is for the name, I do think we should at the very least go with option (d) and remove the text 'Can also specify roles of different people involved.' from the NWB schema, instead of contriving an additional way of including that in the free-text string.

Aside from that, I think the best approach all-around would be similar to (c). (a) and (b) both feel like band-aids that wouldn't generalize to additional fields. The best solution would be to define a separate schema type for an actual Experimenter, have it mimic the DANDI fields as attributes as closely as possible, and then have the nwbfile.experimenters field just be a list containing instances of that type. If desired, you can avoid breaking backward compatibility by relaxing that and accepting a list containing both plain strings and the additional Experimenter object type.
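A minimal sketch of that idea in plain Python (not the NWB schema language) might look like the following. The `Experimenter` class, its field names, and `normalize_experimenters` are all hypothetical; they are not taken from DANDI or from any NWB API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class Experimenter:
    """Hypothetical structured experimenter record."""

    name: str                          # e.g. "LastName, FirstName"
    roles: Optional[List[str]] = None  # zero or more roles per person
    affiliation: Optional[str] = None  # e.g. an institution name


def normalize_experimenters(
    entries: Sequence[Union[str, Experimenter]],
) -> List[Experimenter]:
    """Accept legacy plain strings alongside Experimenter objects.

    Plain strings are wrapped in an Experimenter with no roles, so files
    that only store names keep working.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, Experimenter):
            normalized.append(entry)
        elif isinstance(entry, str):
            normalized.append(Experimenter(name=entry))
        else:
            raise TypeError(f"unsupported experimenter entry: {entry!r}")
    return normalized


if __name__ == "__main__":
    mixed = [
        "Ly, Ryan",
        Experimenter(name="Baker, Cody", roles=["data curation"]),
    ]
    for person in normalize_experimenters(mixed):
        print(person)
```

Keeping `roles` as an optional list leaves room for one person to hold several roles without changing the shape of the name field.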

Now, I know how to type something like that up in JSON but not so much in the NWB schema language...

> An experimenter could have multiple roles in a project as well.

To which I'd again mimic the DANDI structure (Optional[List[RoleType]]) if that's what we're going for. See their RoleTypeDict for all the different possibilities.

> This raises a separate issue of how much we want to include best practice suggestions in the NWB schema docs itself. I think once a best practice suggestion is relatively stable, it should be mentioned in the schema docs and the API docs. That may be tedious though.

This is really the big question at the heart of this. We can patch the Experimenter object, or minimally adjust the schema descriptions, but this sets a precedent for adapting all of the NWB metadata schema around the DANDI schema.

Currently, I'd say the DANDI schema is much more strict, both in its structure and in the form it imposes on content: they have regexes for just about everything. By contrast, it's my impression that the goal of the NWB schema was always to not be 'too strict', in order to let people use it in whatever way(s) they wish rather than for the explicit purpose of one day ending up on the archive.

I know @bendichter's biggest concern about these things, and the motivation for making the NWB Inspector a separate tool from the core NWB stuff, is that we don't want to build too high a 'wall'/'barrier' for people just trying to make a minimally working NWB file. Not every NWB file is created with the explicit purpose of passing DANDI validation (though any file automatically generated via NeuroConv should pass, in principle). If we want to change that philosophy, people are going to have to start doing a lot more work to insert metadata and conform it to tighter standards just to create their first NWB file.