AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Address issues with mFish, suite2p segmented data in the SDK #2464

Closed morriscb closed 2 years ago

morriscb commented 2 years ago

Got an email from Marina regarding an issue with loading files.

Hey Chris, I can see that new outputs have been generated for the experiments you re-ran, but I am unable to load the data via the SDK. I am getting a KeyError (full error message in the screenshots below) saying that the key ‘filepath’ is incorrect.

I noticed that there is a new roi_traces.h5 file in the ‘processed’ directory in lims for this specific experiment, and that there is also an roi_traces.h5 in the main experiment directory. This makes me wonder whether the output files are being saved to a different location such that the lims well known file is pointing to the wrong thing? Or some other such thing that could cause lims to not be able to find some file. Just a wild guess.

On the bright side, when I load one of the new roi_traces.h5 or experiment_dff.h5 files that was generated yesterday, I can see that there are >60 traces, which means there were plenty of ROIs segmented, which is a big step forward! When I plot the traces from the manually loaded file, some of them look normal-ish, but many of them look like noise, which is probably because we haven’t applied a classifier yet (which is fine at this point).

Here’s an example of a trace from the manually loaded file: image.png
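For reference, manually inspecting one of these outputs can be done along these lines. This is a minimal sketch: it writes a small stand-in HDF5 file so the example is self-contained, and the dataset name `data` is an assumption that should be verified against the actual `roi_traces.h5` layout.

```python
import h5py
import numpy as np

# Stand-in file so the sketch runs anywhere; in practice you would
# open the roi_traces.h5 from the LIMS experiment directory directly.
with h5py.File("roi_traces_example.h5", "w") as f:
    f.create_dataset("data", data=np.random.rand(64, 1000))  # 64 ROIs

with h5py.File("roi_traces_example.h5", "r") as f:
    traces = f["data"][:]  # shape: (n_rois, n_timepoints)

print(traces.shape[0], "traces")  # >60 ROIs, matching what Marina saw
```

The same pattern applies to `experiment_dff.h5`; only the dataset names would differ.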

morriscb commented 2 years ago

Scott's comment is that this is due to the SDK expecting data products from legacy segmentation to be present. As we aren't running these legacy pipelines for these data, the failures are not unexpected. A meeting with Marina to get requirements for running and what is "done" will help. #2458

The line of code that is actually failing for Marina executes this query:

                SELECT
                    wkf.storage_directory || wkf.filename AS filepath,
                    wkft.name as wkfn
                FROM ophys_experiments oe
                JOIN ophys_cell_segmentation_runs ocsr
                ON ocsr.ophys_experiment_id = oe.id
                JOIN well_known_files wkf ON wkf.attachable_id = ocsr.id
                JOIN well_known_file_types wkft
                ON wkft.id = wkf.well_known_file_type_id
                WHERE ocsr.current = 't'
                AND wkf.attachable_type = 'OphysCellSegmentationRun'
                AND wkft.name IN ('OphysMaxIntImage',
                    'OphysAverageIntensityProjectionImage')
                AND oe.id = {};
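To illustrate why an empty result from that query can surface as a KeyError downstream, here is a minimal sketch; the helper name and row shapes are hypothetical, not the actual SDK internals.

```python
def index_well_known_files(rows):
    """Map well-known-file-type name ('wkfn') -> 'filepath'.

    Each row mirrors the SELECT aliases in the LIMS query above.
    """
    return {row["wkfn"]: row["filepath"] for row in rows}

# With legacy segmentation outputs present, the lookup succeeds:
rows = [{"filepath": "/allen/example/maxInt.png", "wkfn": "OphysMaxIntImage"}]
files = index_well_known_files(rows)
print(files["OphysMaxIntImage"])

# For a suite2p-only experiment the query matches no well_known_files
# rows, so any downstream key access raises KeyError, much like the
# error Marina reported:
try:
    index_well_known_files([])["OphysMaxIntImage"]
except KeyError as err:
    print("missing:", err)
```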

The OphysMaxIntImage and OphysAverageIntensityProjectionImage file types reek of legacy motion correction/segmentation to me.

So, I think the actual problem is that trying to load a BehaviorSession with the SDK involves trying to load the entire legacy data model as it existed circa March 2021, whereas what we have been trying to support is “just get me some modern traces and worry about the rest later.” Unless that is not what we are trying to support right now and we really do need to get every data structure for learning-mFISH in a state such that the SDK can load it.

This is probably something we need to talk about (defining “doneness” for this scope of work). I seem to recall Chris suggesting such a conversation somewhere in this thread, but I may be hallucinating that.

Cheers,

morriscb commented 2 years ago

After chatting to Marina, she said she would send me the notebook she used to access the SDK so I can test loading the data and any modifications we make to it.

morriscb commented 2 years ago

In terms of fixing this issue, the current plan is to modify the LIMS query to attempt to load the suite2p motion correction projection images first and, failing that, default to loading the legacy segmentation projection images. mFish will work on this as a branch and, if we get the okay from stakeholders, merge it into a future version of the SDK.
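The fallback described above could look something like this sketch. The suite2p well-known-file-type names and the query helper are assumptions for illustration only, not the actual LIMS names or SDK code.

```python
# Hypothetical suite2p type names; the real LIMS well-known-file-type
# names would need to be confirmed.
SUITE2P_IMAGE_TYPES = ("Suite2pMaxIntImage", "Suite2pAverageIntensityProjectionImage")
LEGACY_IMAGE_TYPES = ("OphysMaxIntImage", "OphysAverageIntensityProjectionImage")

def get_projection_image_files(run_query, experiment_id):
    """Return projection-image rows, preferring suite2p over legacy.

    `run_query` stands in for the LIMS query, parameterized by the
    well-known-file-type names to match on.
    """
    for type_names in (SUITE2P_IMAGE_TYPES, LEGACY_IMAGE_TYPES):
        rows = run_query(experiment_id, type_names)
        if rows:
            return rows
    raise LookupError(f"No projection images for experiment {experiment_id}")

# Demo with a fake query: only legacy files exist, so we fall back.
def fake_query(experiment_id, type_names):
    legacy_only = {"OphysMaxIntImage": "/allen/example/maxInt.png"}
    return [(name, legacy_only[name]) for name in type_names if name in legacy_only]

rows = get_projection_image_files(fake_query, 1234)
print(rows)
```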

morriscb commented 2 years ago

Issues fixed by updating LIMS to write out both the average and max images in the suite2p motion correction queue and by adding the query for those images to LIMS.