NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

Always cache spec? #967

Closed rly closed 5 years ago

rly commented 5 years ago

Is there a reason why we would not want to cache the spec when writing a file?

Advantages of always caching spec: fewer complex use cases and interactions to handle and tests to make.

Parallel issue on hdmf: https://github.com/hdmf-dev/hdmf/issues/76

oruebel commented 5 years ago

For NWB:N files I don't see a reason why you would not always want to cache the spec. The amount of data is minimal and it helps ensure that the file remains accessible.

The only use-case I could think of where you may not want to do this is if you want to write partial files, e.g., a single TimeSeries container, rather than a full NWB:N file. The question of partial files has come up in the context of integration with data management. However, even in this case, I think you are probably fine with caching the spec.

bendichter commented 5 years ago

The tests are considerably slower when caching the spec.


t-b commented 5 years ago

> The tests are considerably slower when caching the spec.

Does that just mean that writing the spec to the file takes a lot of time?

bendichter commented 5 years ago

I'm not sure if it's writing or reading, just something I noticed. I think it would be worthwhile to profile this to see what the time penalty is.
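One way to measure the time penalty would be a small `timeit` harness comparing writes with and without spec caching. The sketch below is self-contained with a stand-in workload; the real benchmark body would build an `NWBFile` and call `io.write(nwbfile, cache_spec=...)` (the comments mark where):

```python
import timeit

def time_write(cache_spec: bool, number: int = 10) -> float:
    """Time repeated writes with or without spec caching.

    Hypothetical benchmark: a real run would create an NWBFile and call
        with NWBHDF5IO(path, 'w') as io:
            io.write(nwbfile, cache_spec=cache_spec)
    A stand-in workload keeps this sketch runnable without pynwb.
    """
    def write_once():
        data = bytes(100_000)   # stand-in for a dataset payload
        _ = data.count(0)       # stand-in for serialization work
    return timeit.timeit(write_once, number=number)

for flag in (False, True):
    print(f"cache_spec={flag}: {time_write(flag):.4f}s")
```

Comparing the two timings (and, if needed, running each under `cProfile`) would show whether the slowdown comes from writing the cached spec or from reading it back.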

t-b commented 5 years ago

Related: https://github.com/NeurodataWithoutBorders/pynwb/issues/497

rly commented 5 years ago

Would there be any issues in loading namespaces by default on read? If specs are cached by default, they should be read by default, right?

i.e., make `load_namespaces=True` the default when calling `NWBHDF5IO(path, 'r', load_namespaces=True)`

https://github.com/NeurodataWithoutBorders/pynwb/blob/0eea97e520e3a9a728c017ee734e756799defcf3/src/pynwb/__init__.py#L188-L230

bendichter commented 5 years ago

I think that would replace any custom classes that have already been imported and/or registered
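The concern can be illustrated with a toy registry — a schematic stand-in for pynwb's type map, not its actual API:

```python
# Toy stand-in for pynwb's type map: neurodata_type name -> Python class.
registry = {}

def register_class(name, cls):
    """Map a neurodata_type name to the class used to construct it on read."""
    registry[name] = cls

class TimeSeries:
    pass

class PatchedTimeSeries(TimeSeries):
    """A user's custom subclass, registered before reading a file."""

register_class('TimeSeries', PatchedTimeSeries)

# If load_namespaces=True regenerates and re-registers classes from the
# cached spec, the generated class could silently replace the custom one:
class GeneratedTimeSeries(TimeSeries):
    """Class auto-generated from the cached namespace."""

register_class('TimeSeries', GeneratedTimeSeries)

# The user's PatchedTimeSeries mapping is now gone.
```

In this sketch, after the cached namespace is loaded, `registry['TimeSeries']` points at the generated class rather than the user's custom one, which is the replacement behavior the comment warns about.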