Yes, in general having a shape checker is a good thing to have.
I think in your case here you would even need to enforce a specific 2D array size, right? If I read the spec correctly, it says that you need num_clusters rows and num_samples columns.
From the specification language perspective, the dims field is currently just naming/labeling dimensions, i.e., we currently don't yet have a way to express constraints that apply across datasets, e.g., to define that waveform_sd and waveform_mean must have the same shape. We have had discussions about how to express advanced constraints like this, but so far this is still in the discussion phase.
I suggest we add an optional dataset shape checker to docval.
In addition to shape I think you also want to have a way to restrict the data type for values in the array. Generally speaking the idea of allowing docval to validate general properties of arrays sounds reasonable to me.
The docval would take either a list of lists or an np.ndarray ...
The shorthand for valid array types for docval is here: https://github.com/NeurodataWithoutBorders/pynwb/blob/91676cc29bf22c284db63379a676136d64827cab/src/pynwb/form/utils.py#L10
There is also a ShapeValidator class in FORM that might be helpful, or at least would be a good place to put the functionality needed to inspect and validate shapes of arrays.
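For reference, the core check such a shape validator needs is a comparison of an observed array shape against a required shape with wildcard dimensions. The snippet below is a minimal sketch of that idea only; the function name shape_matches and the use of None as a wildcard are illustrative, not the actual FORM/ShapeValidator API.

import numpy as np

def shape_matches(data, required_shape):
    # required_shape is a tuple where None means "any length along this dimension"
    observed = np.asarray(data).shape
    if len(observed) != len(required_shape):
        return False
    return all(req is None or obs == req
               for obs, req in zip(observed, required_shape))

# A (num_clusters x num_samples) waveform_mean must be 2-D:
shape_matches([[0.1, 0.2], [0.3, 0.4]], (None, None))  # True
shape_matches([0.1, 0.2, 0.3], (None, None))           # False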
@oruebel cool, thanks for pointing this out to me. However, importing ShapeValidator in form/utils causes a circular import. Any suggestions on how we should resolve that?
I assume the circular import is due to docval. I think the cleanest solution here is to: 1) not use docval in ShapeValidator and to simply move the information to the docstring, 2) transplant ShapeValidator, ShapeValidatorResult, __get_shape_helper, and get_shape into a new .py file. pynwb/form/validate/shapevalidator.py seems like it might be a good place. @ajtritt does that sound reasonable?
OK. While I'm at it I'm de-classing ShapeValidator. It has no __init__ and only static methods, so I see no reason why it should be a class.
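For context, "de-classing" here just means turning the static methods into plain module-level functions. A schematic sketch of the before/after shape of the code, not the actual FORM implementation:

# before: a class used only as a namespace for static methods
class ShapeValidator:
    @staticmethod
    def get_data_shape(data):
        ...

# after: module-level functions, e.g. in shapevalidator.py
def get_data_shape(data):
    ...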
I separated ShapeValidator but I'm running into more import errors:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-88365b6446cb> in <module>()
----> 1 from pynwb import NWBHDF5IO
~/dev/pynwb/src/pynwb/__init__.py in <module>()
8 CORE_NAMESPACE = 'core'
9
---> 10 from .form.spec import NamespaceCatalog # noqa: E402
11 from .form.utils import docval, getargs, popargs, call_docval_func # noqa: E402
12 from .form.backends.io import FORMIO # noqa: E402
~/dev/pynwb/src/pynwb/form/__init__.py in <module>()
1 # flake8: noqa: F401
----> 2 from .container import Container, Data, DataRegion
3 from .utils import docval, getargs
4 from .data_utils import ListSlicer
5 from .backends.hdf5.h5_utils import H5RegionSlicer, H5Dataset
~/dev/pynwb/src/pynwb/form/container.py in <module>()
1 import abc
2 from six import with_metaclass
----> 3 from .utils import docval, getargs
4
5
~/dev/pynwb/src/pynwb/form/utils.py in <module>()
8 from six import raise_from
9
---> 10 from .validate.shapevalidator import get_data_shape
11
12 __macros = {
~/dev/pynwb/src/pynwb/form/validate/__init__.py in <module>()
1 # flake8: noqa: F401
----> 2 from . import errors
3
4 from .validator import ValidatorMap, Validator, AttributeValidator, DatasetValidator, GroupValidator
5 from .errors import *
~/dev/pynwb/src/pynwb/form/validate/errors.py in <module>()
1
----> 2 from ..spec.spec import DtypeHelper
3 from numpy import dtype
4
5
~/dev/pynwb/src/pynwb/form/spec/__init__.py in <module>()
2 import os.path
3
----> 4 from .spec import NAME_WILDCARD
5 from .spec import Spec
6 from .spec import AttributeSpec
~/dev/pynwb/src/pynwb/form/spec/spec.py in <module>()
5 from warnings import warn
6
----> 7 from ..utils import docval, getargs, popargs, get_docval, fmt_docval_args
8
9 NAME_WILDCARD = None
ImportError: cannot import name 'docval'
I'm not sure of the most efficient way to re-organize in order to fix this cyclic import. docval is used quite extensively in spec.spec.
Did you remove docval from the shape validator? Are you making changes in spec.spec, or is this a result of the refactor?
I did remove docval from the shape validator. I have not made changes to spec.spec yet. I could if that's the best approach, but this is turning into a lot of docval -> docstring conversion, and I didn't want the PR to be a shock.
https://github.com/NeurodataWithoutBorders/pynwb/tree/shape_checker
Thanks for the pointer to the branch. It just wasn't clear to me what changes you made, i.e., I was not suggesting to change spec.spec. My schedule is crazy today so I don't have the chance to investigate in detail. From a quick look, I'd try to move the from ..data_utils import DataChunkIterator in src/pynwb/form/validate/shapevalidator.py and make it a function-level import in get_data_shape, which seems to be the only place where it is being used. At least my guess is that you're picking up the circular import from there.
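For illustration, a function-level import defers loading data_utils until get_data_shape is actually called, which breaks the cycle at module-import time. A sketch of the idea, not the exact file contents:

# src/pynwb/form/validate/shapevalidator.py (sketch)

def get_data_shape(data):
    # Imported inside the function so that importing this module does not
    # pull in data_utils (and, through it, utils/docval) at load time.
    from ..data_utils import DataChunkIterator
    if isinstance(data, DataChunkIterator):
        return data.maxshape
    if hasattr(data, 'shape'):
        return data.shape
    ...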
Actually, putting shapevalidator.py in src/pynwb/form/validate might be causing an issue. It runs src/pynwb/form/validate/__init__.py, which attempts to import form.utils. I might be able to resolve this by moving shapevalidator.py somewhere else.
@ajtritt do you think pynwb/form/utils.py would be the right place for the shape validator?
If ShapeValidator is going to be used with docval, it should go in form.utils
@ajtritt get_data_shape depends on DataChunkIterator and DataChunkIterator depends on get_data_shape, so moving just get_data_shape will cause an import error. Should I move DataChunkIterator as well?
I don't think DataChunkIterator should move. get_data_shape really just uses DataChunkIterator to check for type. In get_data_shape I think we can simply change the following code:
if isinstance(data, DataChunkIterator):
return data.maxshape
if hasattr(data, 'shape'):
return data.shape
to
if hasattr(data, 'shape'):
return data.shape
if hasattr(data, 'maxshape'):
return data.maxshape
That should remove the dependence of the function on DataChunkIterator.
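Putting that together, a minimal sketch of a get_data_shape with no DataChunkIterator dependency at all; the nested-sequence handling at the end is illustrative, and the real helper covers more cases than this.

def get_data_shape(data):
    # Duck typing: anything exposing shape (np.ndarray, h5py.Dataset) or
    # maxshape (e.g. a DataChunkIterator) reports its own shape.
    if hasattr(data, 'shape'):
        return data.shape
    if hasattr(data, 'maxshape'):
        return data.maxshape
    # Fall back to walking nested sequences, e.g. a list of lists.
    if hasattr(data, '__len__') and not isinstance(data, (str, bytes)):
        shape = [len(data)]
        if len(data) and hasattr(data[0], '__len__') and not isinstance(data[0], (str, bytes)):
            shape.extend(get_data_shape(data[0]))
        return tuple(shape)
    return None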
ok, good idea.
@ajtritt mentioned that getting the validator to run on write would solve this problem as well. That's a simpler solution, since it would not involve manually adding shape args to every dataset type. I still think this is valuable, but I agree with Andrew that the validator would solve this problem more quickly.
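For comparison, the validator-based route means running schema validation against the file rather than checking shapes argument by argument in docval. Whether that happens automatically on write is exactly what would need to be wired up; the sketch below simply calls the validator by hand after writing and assumes the pynwb.validate entry point and a minimal NWBFile.

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, NWBHDF5IO, validate

nwbfile = NWBFile(session_description='demo', identifier='demo1',
                  session_start_time=datetime.now(tzlocal()))

with NWBHDF5IO('test.nwb', 'w') as io:
    io.write(nwbfile)

# Validate after writing; schema violations (including dataset shapes, once the
# validator enforces them) would surface here instead of at matnwb read time.
with NWBHDF5IO('test.nwb', 'r') as io:
    errors = validate(io)
    print(errors)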
pynwb does not enforce that the shape of datasets matches the shape specification in the schema. matnwb expects datasets to follow the size rules specified by the schema, so this is causing a compatibility issue for files that are written in Python and read in MATLAB. For example, it is possible to write a 1-D dataset to the variable ecephys.WaveformClusters.waveform_mean, even though the schema requires that the dataset be 2-D. This dataset will then be written with no problem in pynwb, but will cause a read error in matnwb. I suggest we add an optional dataset shape checker to docval. The syntax would be:
The docval would take either a list of lists or an np.ndarray, and would enforce that the dataset is 2-D. @ajtritt, what do you think about this?
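The concrete syntax block from the original proposal is not reproduced above. For illustration only, a shape-aware docval entry could plausibly look like the sketch below, where a 'shape' key uses None as a per-dimension wildcard so that (None, None) requires a 2-D array; the key name and spelling are the subject of the proposal, not an established API at the time of this thread, and the WaveformHolder class is purely hypothetical.

import numpy as np
from pynwb.form.utils import docval, getargs  # FORM location as of this thread

class WaveformHolder:
    # Hypothetical container used only to illustrate the proposed syntax.

    @docval({'name': 'waveform_mean', 'type': (list, np.ndarray),
             'shape': (None, None),  # proposed: any num_clusters x num_samples
             'doc': 'the mean waveform for each cluster'})
    def __init__(self, **kwargs):
        # Under the proposal, docval would reject a 1-D array here, before
        # the data ever reaches the HDF5 file.
        self.waveform_mean = getargs('waveform_mean', kwargs)

# ok: 2-D list of lists; a plain 1-D list would raise a (proposed) shape error
wh = WaveformHolder(waveform_mean=[[0.1, 0.2], [0.3, 0.4]])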