NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io
Other
52 stars 16 forks source link

Remove 'help' attribute broadly #270

Closed rly closed 5 years ago

rly commented 5 years ago

What is the difference between the 'doc' field and the 'help' field? They appear to mean the same thing.

e.g.

groups:
- neurodata_type_def: Transcription
  neurodata_type_inc: NWBDataInterface
  name: transcription
  doc: holds tiers for different levels of transcription (e.g. sentence, word, phoneme)
  attributes:
  - name: help
    dtype: text
    doc: doc
    value: holds tiers for different levels of transcription

and

groups:
- neurodata_type_def: NWBFile
  neurodata_type_inc: NWBContainer
  name: root
  doc: Top level of NWB file.
  attributes:
  - name: help
    dtype: text
    doc: Value is 'an NWB:N file for storing cellular-based neurophysiology data'
    value: an NWB:N file for storing cellular-based neurophysiology data

The help attribute also has a doc field. What is that intended for?

oruebel commented 5 years ago

The fundamental difference is that doc is part of schema-language and help specifies actual data in a schema. More specifically:

bendichter commented 5 years ago

A few more points about related fields in NWB and how they relate to pynwb:

oruebel commented 5 years ago
* Since `help` is an attribute, it  requires a `doc` field (which feels a bit odd)

Any data field we specify in the schema should have a doc that describes its purpose.

* There is also `description` ...

The help vs. comment vs description issue seems to be a question of harmonization across types and documentation. We'd have to look at the different cases to see whether changing, e.g,. comment to description may make sense. I'm not sure how critical this issue is right now, and it also seems that this would likely be a non-backward-compatible change.

* There is also a required `doc` field in `dovcal` ...

Here we need to distinguish between API and schema features. docval is feature of the python API. While doc in the schema and the corresponding doc fields in docval should ideally be similar wherever possible, the purpose of the doc in the schema and in docval are different, in that one refers to how a parameter is stored and the other describes how it is used in memory. In many cases the two can often be the same, but the types of data representations that the API may expect can often differ compared to what the schema specifies for storage. I agree that we should strive for consistency between the two, but it seems to be a separate issue that is specific to PyNWB and that could be easily fixed by updating doc fields in docval to be consistent with the schema where they are not, without having to changing anything in the schema.

rly commented 5 years ago

help on the other hand is a custom attribute that is being specified as part of the schema and is being saved as an attribute to file.

What is the value of having help as an attribute that is saved to file? I think it is redundant and confusing.

help.value, where help exists, is almost always a shortened version of the higher-level doc field.

Also, help is intended never to be changed, help.dtype should always be type text, and help.doc should always effectively be Value is 'xx' where 'xx' is just the value. Within pynwb, you can neither access nor change the help field. It appears to exist just to be included as a string attribute for a data type, for in-file documentation. In these ways, it is a special attribute that can easily be mis-used.

It does not help that we currently use help inconsistently in the core YAML files. In some places, help.doc is a help string; in others, help.doc is Value is 'xx' where 'xx' is help.value (which is itself often a copy of the higher-level doc field); and in other places, there is no help.value and all the data is in help.doc... I would argue the intended usage of help is confusing and would be especially so for extension writers. I think it also clutters the schema.

In the interest of reducing redundancy in the schema and unintended usage of help, I propose removing the help field entirely. Although it could be useful as a documentation attribute written to file, there is more comprehensive documentation of the data type in the schema docs (help is limited to 50 chars), so I think removing it would not result in loss of information or much accessibility.

This is low-priority, but I figured I would bring it up while I noticed it.

bendichter commented 5 years ago

One thing to consider is backwards compatibility. If we remove help entirely we may end up breaking read on previously written files. We could make it optional and remove it in most cases without breaking backwards compatibility.

tjd2002 commented 5 years ago

Until this thread I never understood the function of 'help', and I don't think it impacted my use of NWB or pyNWB at all.

@rly

I would argue the intended usage of help is confusing and would be especially so for extension writers. I think it also clutters the schema.

Strong agree!

In writing an extension, filling out the 'help' attribute was for us just an exercise in boilerplate. (https://github.com/LorenFrankLab/franklab-nwb-hack/blob/master/hackathon-6/create_franklab_spec.ipynb)

In the generated .yaml file, 33 out of 124 lines are related to the help attribute: https://github.com/LorenFrankLab/franklab-nwb-hack/blob/master/hackathon-6/franklab.extensions.yaml

If the core and extension specs are going to be cached in files by default (as seems to be the plan, based on discussions at Hack06 and https://github.com/NeurodataWithoutBorders/pynwb/issues/497), then it seems that practically speaking the help attribute can be removed without loss of self-documentation (since the 'doc' field will be available in the .nwb files)

rly commented 5 years ago

One thing to consider is backwards compatibility. If we remove help entirely we may end up breaking read on previously written files. We could make it optional and remove it in most cases without breaking backwards compatibility.

Making help optional and discouraging its use would be a good first step. But, removing it seems possible. I tried removing help from parts of the core YAML (NWBContainer, NWBDataInterface, TimeSeries, ElectricalSeries) and was able to both read and validate an NWB file with an ElectricalSeries that has the help attribute. So, maybe I am missing something, but it seems like removing it altogether would not break read on previously written files.

Now, maybe the validator is supposed to fail there. I don't know if we decided on whether it should fail for having extra attributes not in the schema.

bendichter commented 5 years ago

As long as we are careful not to break read, I'm all for removing help. It has always been a nuisance for me

oruebel commented 5 years ago

Aside from makeing sure we can maintain backward compatibility with 2.0 I don't have any particular objections to removing help. I believe the intend was originally to enable users to "manually" read the file by looking at it in HDF5 tools. An alternative to removing it, may be (similar to neurodata_type) have help be a standard attribute that is automatically created from the doc field (as such it would not be part of the schema but become a reserved attributed for the HDF5 backend). Not sure if this really necessary (and I'm ok without it), but I figured I mention it as an option.

rly commented 5 years ago

I believe the intend was originally to enable users to "manually" read the file by looking at it in HDF5 tools. An alternative to removing it, may be (similar to neurodata_type) have help be a standard attribute that is automatically created from the doc field (as such it would not be part of the schema but become a reserved attributed for the HDF5 backend)

I like that alternative, but am not sure there is a need for it, especially now that the schema is cached by default. I'm slightly more in favor of outright removal, just because the alternative requires implementing a special case, which I think won't really be used. What do others think?

oruebel commented 5 years ago

I like that alternative, but am not sure there.... I'm slightly more in favor of outright removal,

That's fine with me, as long as we don't cause any big issues with backward compatibility.