AllenInstitute / ipfx

computes intrinsic cell features from intracellular electrophysiology data
https://ipfx.readthedocs.io/en/latest/

Fix duration rounding to address #521 #522

Closed gouwens closed 3 years ago

gouwens commented 3 years ago

Thank you for contributing to IPFX; your work and time will help advance open science!

Overview:


The feature vector extraction code sometimes produces vectors of different lengths, which cannot be stacked and saved as a single array in an H5 file. This happens because the floating-point approximations of the start and end times differ slightly between cells, which can shift the computed duration across a bin boundary and change the number of bins.
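To make the failure mode concrete, here is a minimal sketch (using a hypothetical 0.5 ms bin width, not the actual ipfx code path) of how a one-ulp difference in the computed duration changes the number of bins:

import numpy as np

bin_width = 0.0005  # hypothetical 0.5 ms bin width
duration_a = 1.0          # nominal 1 s window
duration_b = 1.0 + 2e-16  # the same window after a tiny float error in end - start

bins_a = np.arange(0, duration_a, bin_width)
bins_b = np.arange(0, duration_b, bin_width)
print(len(bins_a), len(bins_b))  # 2000 2001 -> ragged feature vectors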

Addresses:


Addresses issue #521

Type of Fix:

Bug fix

Solution:


Round the start and end times to the nearest millisecond before computing the duration used in spike-related feature vector calculations.
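A minimal sketch of the idea (spike_window_duration is a hypothetical helper for illustration, not the actual ipfx function):

import numpy as np

def spike_window_duration(start, end):
    # snap both times to the nearest millisecond before subtracting, so cells
    # whose times differ only by float noise get exactly the same duration
    return np.round(end, decimals=3) - np.round(start, decimals=3)

print(spike_window_duration(1.02, 2.02))          # 1.0
print(spike_window_duration(1.02, 2.02 + 5e-16))  # also exactly 1.0 after rounding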

Changes:

- Round start and end times to the nearest millisecond before computing durations in spike-related feature vector calculations
- Add unit tests covering the fix

Validation:


The example script reproducing the issue now runs to completion, and the new unit tests pass.
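The tests can be run with pytest (e.g., pytest tests/test_run_feature_vector.py, the module that appears in the test output further down).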

Script to reproduce the error and demonstrate the fix:

# imports assumed for this snippet (ju is allensdk.core.json_utilities, as used in ipfx)
import allensdk.core.json_utilities as ju
from ipfx.stimulus import StimulusOntology
from ipfx.bin.run_feature_vector_extraction import run_feature_vector_extraction

specimen_id_list = [898949412, 950300190, 645380099]
file_list = {
    898949412: "/path/to/file/nwb2_Rbp4-Cre_KL100;Ai14-471971.03.02.01.nwb",
    950300190: "/path/to/file/nwb2_Rbp4-Cre_KL100;Ai14-487123.05.02.03.nwb",
    645380099: "/path/to/file/nwb2_Sim1-Cre_KJ18;Ai14-354911.06.01.01.nwb",
}
# note: run_feature_vector_extraction builds its own ontology internally,
# so this object is not actually passed to the call below
ontology = StimulusOntology(ju.read(StimulusOntology.DEFAULT_STIMULUS_ONTOLOGY_FILE))
data_source = "filesystem"

run_feature_vector_extraction(
    output_dir=".",
    data_source="filesystem",
    output_code="test",
    project="test",
    output_file_type="h5",
    sweep_qc_option="none",
    run_parallel=False,
    ap_window_length=0.003,
    include_failed_cells=False,
    ids=specimen_id_list,
    file_list=file_list
)

Currently produces:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-e7008fd82f41> in <module>
     10     include_failed_cells=False,
     11     ids=specimen_id_list,
---> 12     file_list=file_list
     13 )

/local1/repos/ipfx/ipfx/bin/run_feature_vector_extraction.py in run_feature_vector_extraction(output_dir, data_source, output_code, project, output_file_type, sweep_qc_option, include_failed_cells, run_parallel, ap_window_length, ids, file_list, **kwargs)
    316 
    317     if output_file_type == "h5":
--> 318         su.save_results_to_h5(used_ids, results_dict, output_dir, output_code)
    319     elif output_file_type == "npy":
    320         su.save_results_to_npy(used_ids, results_dict, output_dir, output_code)

/local1/repos/ipfx/ipfx/script_utils.py in save_results_to_h5(specimen_ids, results_dict, output_dir, output_code)
    314         data = results_dict[k]
    315         dset = h5_file.create_dataset(k, data.shape, dtype=data.dtype,
--> 316             compression="gzip")
    317         dset[...] = data
    318     dset = h5_file.create_dataset("ids", ids_arr.shape,

/local1/anaconda3/envs/py3/lib/python3.7/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    134 
    135         with phil:
--> 136             dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
    137             dset = dataset.Dataset(dsid)
    138             if name is not None:

/local1/anaconda3/envs/py3/lib/python3.7/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl)
    116         else:
    117             dtype = numpy.dtype(dtype)
--> 118         tid = h5t.py_create(dtype, logical=1)
    119 
    120     # Legacy

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: Object dtype dtype('O') has no native HDF5 equivalent
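For context on that final TypeError: rows of unequal length can only be stacked into a dtype=object array, and h5py has no HDF5 type to map object arrays onto. A minimal sketch (hypothetical file and dataset names) that reproduces just this error:

import numpy as np
import h5py

ragged = np.array([np.zeros(2000), np.zeros(2001)], dtype=object)
print(ragged.dtype)  # object

try:
    with h5py.File("demo.h5", "w") as f:
        f.create_dataset("fv", data=ragged, compression="gzip")
except TypeError as err:
    print(err)  # Object dtype dtype('O') has no native HDF5 equivalent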

After the fix: no error message (the script runs to completion).


gouwens commented 3 years ago

Not sure what's going on with the failed test: test_feature_vector_extraction also fails on my machine on the current master branch, with a different error (below), so I can't run it to tell whether anything has changed.

________________________ test_feature_vector_extraction ________________________

tmpdir_factory = TempdirFactory(_tmppath_factory=TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x7f41e9d1d0d0>, _basetemp=PosixPath('/tmp/pytest-of-nathang/pytest-1')))

    def test_feature_vector_extraction(tmpdir_factory):

        temp_output_dir = str(tmpdir_factory.mktemp("feature_vector"))
        test_output_dir = TEST_OUTPUT_DIR

        features = [
            "first_ap_v",
            "first_ap_dv",
            "isi_shape",
            "psth",
            "inst_freq",
            "spiking_width",
            "spiking_peak_v",
            "spiking_fast_trough_v",
            "spiking_threshold_v",
            "spiking_upstroke_downstroke_ratio",
            "step_subthresh",
            "subthresh_norm",
            "subthresh_depol_norm",
            ]

        run_feature_vector_extraction(ids=[500844783, 509604672],
                                      output_dir=temp_output_dir,
                                      data_source="filesystem",
                                      output_code="TEMP",
                                      project=None,
                                      output_file_type="npy",
                                      sweep_qc_option="none",
                                      include_failed_cells=True,
                                      run_parallel=False,
                                      ap_window_length=0.003,
>                                     file_list=test_nwb2_files
                                      )

tests/test_run_feature_vector.py:50: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

output_dir = '/tmp/pytest-of-nathang/pytest-1/feature_vector0'
data_source = 'filesystem', output_code = 'TEMP', project = None
output_file_type = 'npy', sweep_qc_option = 'none', include_failed_cells = True
run_parallel = False, ap_window_length = 0.003, ids = [500844783, 509604672]
file_list = {500844783: '/local1/repos/ipfx/tests/data/Vip-IRES-Cre;Ai14(IVSCC)-226110.03.01.nwb', 509604672: '/local1/repos/ipfx/tests/data/Vip-IRES-Cre;Ai14(IVSCC)-236654.04.02.nwb'}
kwargs = {}, specimen_ids = [500844783, 509604672]
ontology = <ipfx.stimulus.StimulusOntology object at 0x7f41e6e158d0>
get_data_partial = functools.partial(<function data_for_specimen_id at 0x7f41e9cc8710>, sweep_qc_option='none', data_source='filesystem',...e;Ai14(IVSCC)-226110.03.01.nwb', 509604672: '/local1/repos/ipfx/tests/data/Vip-IRES-Cre;Ai14(IVSCC)-236654.04.02.nwb'})
results = <map object at 0x7f41e6e1c250>

    def run_feature_vector_extraction(
        output_dir,
        data_source,
        output_code,
        project,
        output_file_type,
        sweep_qc_option,
        include_failed_cells,
        run_parallel,
        ap_window_length,
        ids=None,
        file_list=None,
        **kwargs
    ):
        """
        Extract feature vector from a list of cells and save result to the output file(s)

        Parameters
        ----------
        output_dir : str
            see CollectFeatureVectorParameters input schema for details
        data_source : str
            see CollectFeatureVectorParameters input schema for details
        output_code: str
            see CollectFeatureVectorParameters input schema for details
        project : str
            see CollectFeatureVectorParameters input schema for details
        output_file_type : str
            see CollectFeatureVectorParameters input schema for details
        sweep_qc_option: str
            see CollectFeatureVectorParameters input schema for details
        include_failed_cells: bool
            see CollectFeatureVectorParameters input schema for details
        run_parallel: bool
            see CollectFeatureVectorParameters input schema for details
        ap_window_length: float
            see CollectFeatureVectorParameters input schema for details
        ids: int
            ids associated to each cell.
        file_list: list of str
            nwbfile names
        kwargs

        Returns
        -------

        """
        if ids is not None:
            specimen_ids = ids
        elif data_source == "lims":
            specimen_ids = lq.project_specimen_ids(project, passed_only=not include_failed_cells)
        else:
            logging.error("Must specify input file if data source is not LIMS")

        if output_file_type == "h5":
            # Check that we can access the specified file before processing everything
            h5_file = h5py.File(os.path.join(output_dir, "fv_{}.h5".format(output_code)))
            h5_file.close()

        ontology = StimulusOntology(ju.read(StimulusOntology.DEFAULT_STIMULUS_ONTOLOGY_FILE))

        logging.info("Number of specimens to process: {:d}".format(len(specimen_ids)))
        get_data_partial = partial(data_for_specimen_id,
                                   sweep_qc_option=sweep_qc_option,
                                   data_source=data_source,
                                   ontology=ontology,
                                   ap_window_length=ap_window_length,
                                   file_list=file_list)

        if run_parallel:
            pool = Pool()
            results = pool.map(get_data_partial, specimen_ids)
        else:
            results = map(get_data_partial, specimen_ids)

>       used_ids, results, error_set = su.filter_results(specimen_ids, results)
E       TypeError: cannot unpack non-iterable NoneType object

ipfx/bin/run_feature_vector_extraction.py:308: TypeError
gouwens commented 3 years ago

Figured out that the issue on my machine was that I didn't actually have the test files, because I didn't have git-lfs installed and set up. The earlier failed test is because the reference test files exhibit the bug being fixed here (the feature vectors had fewer bins than they should have).
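(For reference: the test data files are stored with Git LFS, so fetching the real files takes a one-time git lfs install followed by git lfs pull in the repository.)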