AllenInstitute / segmentation-labeling-app

Data pipeline and UI for human labeling of putative ROIs from 2p cell segmentations
Other
0 stars 0 forks source link

Fix failing 'average' strategy on video downsampling #90

Open pickles-bread-and-butter opened 4 years ago

pickles-bread-and-butter commented 4 years ago

Overview

The average downsampling is currently failing on 2p videos due to improper indexing into h5 file. H5py notes in their repo they do not support indexing by numpy array so we in turn should not have this implemented.

ToDo

djkapner commented 4 years ago

You've probably seen it, but the downsampling function is tested for each strategy here: https://github.com/AllenInstitute/segmentation-labeling-app/blob/8fd2ab36d66d9c42d9591e3d918bfe5803b5e2ad/tests/transforms/test_array_utils.py#L296

But certainly, we'll need to add or modify the tests for the failure mode you've discovered.

djkapner commented 4 years ago

From this reference: http://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing says NumPy boolean “mask” arrays can also be used to specify a selection.

So maybe the problem is not necessarily one of lists vs. numpy arrays, but the bins should be full-length boolean arrays instead of indices.

kschelonka commented 4 years ago

Thanks Dan. The presence of a bug typically indicates missing test coverage, so we should definitely update the tests you cited there to cover this case. @isaak-willett Where are the examples?

kschelonka commented 4 years ago

Also, please attach the error traceback and provide more info for this bug report. I can't provide feedback on whether removing numpy array indexing is the appropriate solution without it.

pickles-bread-and-butter commented 4 years ago

Stack Trace

image.png

Script to Produce Error

import slapp.transforms.transform_pipeline as transform_pipeline
import slapp.utils.query_utils as query_utils

transform_args = {
        "segmentation_run_id": 954,
        "artifact_basedir": '/allen/aibs/informatics/isaak/webm_transform'
        }

db_credentials = query_utils.get_labeling_db_credentials()
db_connection = query_utils.DbConnection(**db_credentials)

transform_pipe = transform_pipeline.TransformPipeline(transform_args)
transform_pipe.run(db_connection)

Notes:

As of 5/12/2020 I cannot get this to reproduce on master I'm going to icebox this for now.

djkapner commented 4 years ago

Here's an argschema tip that can throw you off sometimes. In your script you have:

transform_pipe = transform_pipeline.TransformPipeline(transform_args)

When not calling from the command line, one should probably do:

transform_pipe = transform_pipeline.TransformPipeline(transform_args, args=[])

Basically, it might pick up some command line args you don't want. code reference docs reference

pickles-bread-and-butter commented 4 years ago

Did not know this, thanks for posting that. If this ever gets taken in the future I think the next step is to run from command line and run through the script with the suggested changes. My thought is it comes from dtype of the indexing numpy array being passed, maybe we are not providing the same dtype in the tests as from the 31Hz video.

kschelonka commented 4 years ago

Note: This was iceboxed due to inability to reproduce, but could possibly be an issue.