AllenInstitute / ipfx

computes intrinsic cell features from intracellular electrophysiology data
https://ipfx.readthedocs.io/en/latest/

Add support for reading NWBv2 and execute the pipeline on them #107

Closed: t-b closed this issue 5 years ago

t-b commented 5 years ago

Stick with Python 2.7 for now.

The full pipeline can be run with run_pipeline.py.

dyf commented 5 years ago

cc @sgratiy. @t-b is going to take a stab at this, questions will probably be heading our way.

t-b commented 5 years ago

Where can I get some NWBv1 sample files for testing with run_pipeline.py?

t-b commented 5 years ago

Progress can be tracked at https://github.com/t-b/ipfx/tree/add-AibsForeignDataset.

sgratiy commented 5 years ago

I have sample files. What would be the preferred way for sharing them with you?

t-b commented 5 years ago

Either dropbox (brain grant account) or slack or I can give you FTP credentials.

sgratiy commented 5 years ago

Could you please provide details for how to use "brain grant account" drop box?

t-b commented 5 years ago

Done. Pinged you on slack.

sgratiy commented 5 years ago

Perhaps a better idea is to use public files, e.g. use the AllenSDK to download a cell: https://ipfx.readthedocs.io/en/latest/auto_examples/qc_examples/sweep_qc.html#sphx-glr-auto-examples-qc-examples-sweep-qc-py

t-b commented 5 years ago

Thanks I'll give that a shot.

t-b commented 5 years ago

@sgratiy @dyf

In https://github.com/AllenInstitute/ipfx/blob/659378e0d5ae28ee0b7f8d6fe2ba124e2d603b7b/ipfx/aibs_data_set.py#L85-L90 the entry `stimulus_code_ext` is added to the returned dict; it holds the stimulus code plus the "Set Sweep Count" (i.e. the sweep number within a stimulus set that has multiple sweeps).

Looking into the repo with

```
$ git grep stimulus_code_ext
ipfx/aibs_data_set.py:            sweep_record["stimulus_code_ext"] = stim_code_ext
tests/test_ephys_data_set.py:    'stimulus_code_ext': ['C1LSFINEST150112[0]','EXTPBREAKN141203[0]','EXTPBREAKN141203[0]'],
```

it looks like this is not used.

Is that info needed? Or was it just easy to query and therefore added? I can add that as well to my child class of EphysDataSet, as this should be in the ABF/DAT files.
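For reference, the extended codes in the test fixture above (e.g. `C1LSFINEST150112[0]`) appear to append the set sweep count in brackets. A minimal sketch of splitting such a code, assuming that format holds (the helper name is hypothetical, not part of ipfx):

```python
import re

def split_stimulus_code_ext(stim_code_ext):
    """Split an extended stimulus code such as 'C1LSFINEST150112[0]' into
    (stimulus_code, set_sweep_count). The bracketed-suffix format is
    inferred from the test fixture quoted above; purely illustrative."""
    match = re.match(r"^(.*)\[(\d+)\]$", stim_code_ext)
    if match:
        return match.group(1), int(match.group(2))
    # No bracketed suffix: plain stimulus code without a set sweep count
    return stim_code_ext, None
```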

t-b commented 5 years ago

@sgratiy @dyf

If I run `python ipfx/bin/run_pipeline.py --input_nwb_file 595570553.nwb --output_nwb output.nwb` on the master branch I get

```
WARNING:root:many=True not supported from argparse
Traceback (most recent call last):
  File "ipfx/bin/run_pipeline.py", line 101, in <module>
    if __name__ == "__main__": main()
  File "ipfx/bin/run_pipeline.py", line 86, in main
    module = ags.ArgSchemaParser(schema_type=PipelineParameters)
  File "C:\Anaconda2\lib\site-packages\argschema\argschema_parser.py", line 175, in __init__
    result = self.load_schema_with_defaults(self.schema, args)
  File "C:\Anaconda2\lib\site-packages\argschema\argschema_parser.py", line 274, in load_schema_with_defaults
    result = utils.load(schema, args)
  File "C:\Anaconda2\lib\site-packages\argschema\utils.py", line 422, in load
    raise mm.ValidationError(errors)
marshmallow.exceptions.ValidationError: {'qc_criteria': [u'Missing data for required field.']}
```

Any idea what I'm doing wrong?

dyf commented 5 years ago

re: stimulus_code_ext -- it's not used anywhere. It was originally a requirement in a several-years-ago version of the pipeline, so we kept it there for backwards compatibility. It's not necessary for your files.

dyf commented 5 years ago

re: qc_criteria -- there is a json file containing the qc criteria:

https://github.com/AllenInstitute/ipfx/blob/master/ipfx/qc_criteria.json

This constant knows the path:

https://github.com/AllenInstitute/ipfx/blob/master/ipfx/qc_protocol.py#L6

Would you mind updating run_pipeline.py to load those QC criteria by default if they are not supplied? You can make it optional here:

https://github.com/AllenInstitute/ipfx/blob/master/ipfx/bin/run_pipeline.py#L93

And then load the DEFAULT_QC_CRITERIA_FILE if None here:

https://github.com/AllenInstitute/ipfx/blob/master/ipfx/bin/run_qc.py#L33
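A minimal sketch of that fallback, assuming `DEFAULT_QC_CRITERIA_FILE` points at the packaged `qc_criteria.json` (the real constant lives in `ipfx/qc_protocol.py`; the path literal and helper name here are hypothetical):

```python
import json

# Path assumed for illustration; the real value comes from ipfx/qc_protocol.py
DEFAULT_QC_CRITERIA_FILE = "ipfx/qc_criteria.json"

def resolve_qc_criteria(qc_criteria=None):
    """Return the supplied QC criteria dict, falling back to the packaged
    defaults when the caller (e.g. run_pipeline.py) passes None."""
    if qc_criteria is not None:
        return qc_criteria
    with open(DEFAULT_QC_CRITERIA_FILE) as f:
        return json.load(f)
```

With this in place, `qc_criteria` can be marked optional in the argschema and the pipeline still gets a valid criteria dict.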

sgratiy commented 5 years ago

You get the default qc_criteria when running the scripts that generate the pipeline input. To generate the pipeline input JSON, use https://github.com/AllenInstitute/ipfx/blob/659378e0d5ae28ee0b7f8d6fe2ba124e2d603b7b/ipfx/bin/generate_pipeline_input.py#L120 and then run

```
python run_pipeline_extraction.py --input_json INPUT_JSON --output_json OUTPUT_JSON
```

Or there is a script that generates the pipeline input and runs the pipeline directly from the NWB file, https://github.com/AllenInstitute/ipfx/blob/master/ipfx/bin/pipeline_from_nwb_file.py:

```
python pipeline_from_nwb_file.py INPUT_NWB_FILE
```

Make sure to change the output cell directory in the script: https://github.com/AllenInstitute/ipfx/blob/659378e0d5ae28ee0b7f8d6fe2ba124e2d603b7b/ipfx/bin/pipeline_from_nwb_file.py#L8

Finally, there is a shell script doing the same thing: pipeline_from_nwb.sh

t-b commented 5 years ago

@sgratiy Regarding #124. When I call the pipeline_from_nwb_file.py script I get

```
$ python pipeline_from_nwb_file.py ../../tests/595570553.nwb
Traceback (most recent call last):
  File "pipeline_from_nwb_file.py", line 5, in <module>
    import generate_pipeline_input as gpi
  File "E:\projekte\allensdk.ipfx\ipfx\bin\generate_pipeline_input.py", line 6, in <module>
    from generate_se_input import generate_se_input
  File "E:\projekte\allensdk.ipfx\ipfx\bin\generate_se_input.py", line 5, in <module>
    import make_stimulus_ontology as mso
  File "E:\projekte\allensdk.ipfx\ipfx\bin\make_stimulus_ontology.py", line 4, in <module>
    import lims_queries as lq
  File "E:\projekte\allensdk.ipfx\ipfx\bin\lims_queries.py", line 2, in <module>
    import allensdk.internal.core.lims_utilities as lu
ImportError: No module named internal.core.lims_utilities
```

but I have all packages from requirements.txt installed. Any ideas what I'm doing wrong?

sgratiy commented 5 years ago

This happens because we call make_stimulus_ontology_from_lims() when generating the pipeline input. That function uses an internal package, which was only supposed to be used internally and is therefore not listed in the requirements. Could you install allensdk_internal for now? I agree we need a better solution for this going forward.
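In the meantime, one way to make the failure friendlier is to guard the internal-only import so external users see an actionable message instead of an ImportError at module load. This is purely an illustrative sketch; the guard and `require_lims` helper are not part of ipfx:

```python
# Guard the internal-only dependency: allensdk.internal ships separately
# from the public allensdk package, so this import can legitimately fail.
try:
    import allensdk.internal.core.lims_utilities as lu
except ImportError:
    lu = None

def require_lims():
    """Raise a clear error when a LIMS-backed code path is used without
    the internal allensdk package installed (hypothetical helper)."""
    if lu is None:
        raise RuntimeError(
            "allensdk internal package not found; install allensdk_internal "
            "or avoid the LIMS-backed code paths"
        )
    return lu
```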