AllenInstitute / ophys_etl_pipelines

Pipelines and modules for processing optical physiology data
Other
9 stars 5 forks source link

Investigate problems with Mesoscope splitting #257

Closed njmei closed 3 years ago

njmei commented 3 years ago

Based on initial investigations in #214, it appears that some ophys_sessions for Mesoscope are not properly processed by the mesoscope splitting module.

Based on v2 debug *.pngs found in /allen/scratch/aibstemp/nicholasm/investigate_mesoscope_file_splitting it's clear that prior to Suite2P motion correction, it's possible to get valid looking avg and max projection images even when mesoscope splitting has severely messed up.

Furthermore as outlined in #256, there are cases where mesoscope splitting appears to have worked properly but the max projection image after Suite2P motion correction looks abberant.

Thus it's unclear whether abberant max projection images after Suite2P motion correction is always indicative of a mesoscope file splitting issue. It's possible that there may be split ophys_experiments with normal looking max projection images after Suite2P motion correction where splitting was done incorrectly.

Module in question: https://github.com/AllenInstitute/ophys_etl_pipelines/tree/main/src/ophys_etl/modules/mesoscope_splitting

Tasks:

Validation:

wbwakeman commented 3 years ago

Locations of expected good files:

Session id: 1108352403
Rig: MESO.1
Run date: 6/9/2021
Ephemeral files path: /allen/aibs/informatics/waynew/mesoscope/1108352403*
Production directory: /allen/programs/braintv/production/visualbehavior/prod5/specimen_1076746662/ophys_session_1108352403/
Project: VisualBehaviorMultiscope4areasx2d
Stim: OPHYS_6_images_H
Operator: Sam
Mouse: 563231
Imaging scheme: 4 areas x 2 depths

Session id: 1053297451
Rig: MESO.1
Run date: 9/28/2020
Ephemeral files path: /allen/aibs/informatics/waynew/mesoscope/1053297451*
Production directory: /allen/programs/braintv/production/neuralcoding/prod64/specimen_1029485863/ophys_session_1053297451/
Project: MultiscopeSignalNoise
Stim:  SNR_8
Operator: sara.kivikas
Mouse: 530683
Imaging scheme: 4 areas x 2 depths

Session id: 1081677132
Rig: MESO.1
Run date: 2/5/2021
Ephemeral files path: /allen/aibs/informatics/waynew/mesoscope/1081677132*
Production directory: /allen/programs/braintv/production/visualbehavior/prod5/specimen_1053724035/ophys_session_1081677132/
Project: VisualBehaviorMultiscope
Stim: OPHYS_3_images_A
Operator: Sam
Mouse: 547266
Imaging scheme: 2 areas x 4 depths

Sessions in this issue description:

Session id: 1099322836 
Rig: MESO.1
Run date: 4/27/2021
Ephemeral files directory: /allen/aibs/informatics/waynew/mesoscope//1099322836*
Production directory: /allen/programs/braintv/production/visualbehavior/prod5/specimen_1064430144/ophys_session_1099322836/
Project: VisualBehaviorMultiscope4areasx2d
Stim: OPHYS_7_receptive_field_mapping
Operator: Sam
Mouse: 554115
Imaging scheme: 4 areas x 2 depths

Session id: 1107373623
Rig: MESO.1
Run date: 6/4/2021
Ephemeral files path: /allen/aibs/informatics/waynew/mesoscope/1107373623*
Production directory: /allen/programs/braintv/production/visualbehavior/prod5/specimen_1076746662/ophys_session_1107373623/
Project: VisualBehaviorMultiscope4areasx2d
Stim: OPHYS_6_images_H
Operator: Sam
Mouse: 563231
Imaging scheme: 4 areas x 2 depths

A test session that may be useful because it is much smaller:

Session id: 1105802389
Rig: MESO.2
Run date: 5/27/2021
Ephemeral files path: /allen/aibs/informatics/waynew/mesoscope/1105802389*
Production directory: /allen/programs/braintv/production/neuralcoding/prod0/specimen_703198163/ophys_session_1105802389/
Project: MesoscopeDevelopment
Stim: n/a
Operator: ??
Mouse: no mouse for this WSE test session
Imaging scheme: 4 areas x 2 depths

Another session

Session id: 1082487233
Rig: MESO.1
Run date: 2/9/2021
Ephemeral files path: /allen/aibs/informatics/waynew/mesoscope/1082487233*
Production directory: /allen/programs/braintv/production/visualbehavior/prod5/specimen_1052103219/ophys_session_1082487233/
Project: VisualBehaviorMultiscope
Stim: OPHYS_6_images_B
Operator: ??
Mouse: 546605
Imaging scheme: 2 areas x 4 depths
danielsf commented 3 years ago

Ophys experiment 1109580767, whose data products can be found here

/allen/programs/braintv/production/visualbehavior/prod6/specimen_1091244055/ophys_session_1109456332/ophys_experiment_1109580767

failed CELL_ROI_CREATION because every frame was motion corrected more than 30 pixels in x. This may be a related problem.

UPDATE: I ran this session through Nick's script to plot the first 11 frames of each experiment in the session. There was no obvious splitting issue. This may have been a red herring.

samiamseid commented 3 years ago

I also commented in #256 but wanted to say this here as well: I don't believe there is any splitting issue related to experiment 1109580767. Rather, what i see is that the motion correction output seems to have centered around 100 pixels away from where it should be. motcor

samiamseid commented 3 years ago

For context, this plots x axis should be much closer to zero. For some reason, the motion correction algorithm decides to shift almost 100 pixels in x and y for every frame of the video

djkapner commented 3 years ago

A clue: these "bad" sessions have repeated z plane values.

import numpy as np
from pathlib import Path
from ophys_etl.modules.mesoscope_splitting.tiff import MesoscopeTiff

basedir = Path("/allen/aibs/informatics/waynew/mesoscope/")

# good files
good_session_ids = [1108352403, 1053297451, 1081677132]

# bad files
bad_session_ids = [1099322836, 1107373623]

for session_id in good_session_ids + bad_session_ids:
    fname = basedir / f"{session_id}_timeseries.tiff"
    m = MesoscopeTiff(fname)
    print(m.plane_scans.size, np.unique(m.plane_scans).size, m.plane_stride)
8 8 8
8 8 8
8 8 8
8 7 7
8 7 7

@danielsf reminded me of this discussion with @nataliaorlova :

This is a human error, operators are instructed to never have two planes at exactly the same depth.

I am confirming with her that this is still true. In which case, we need to put in a check to fail the splitting job with an informative error to catch when this happens.

djkapner commented 3 years ago

@nataliaorlova No, these 2 problem sessions are not from the same specimen.

djkapner commented 3 years ago

@wbwakeman I think this is now resolved with #271 merged. But, there are a lot of threads to other ongoing issues. I'll leave it to you to mark as done or not.

djkapner commented 3 years ago

One can't run this check when the timeseries.tiff has been archived. But, the timeseries tiff metadata is included in the output json of the jobs. The code that follows (mostly copy/paste out of the MesoscopeTiff object) parses those output jsons and performs a similar check. These ophys_session_ids are seen to have this problem:

[1096419894. 1099322836, 1107373623, 1011593885, 962045676]

bold IDs are newly discovered by this scan.

import re
import json
import numpy as np
from datetime import datetime
from pathlib import Path
from typing import List, Tuple
import lims.query_utils as qu

from ophys_etl.modules.mesoscope_splitting.tiff import flatten_list

def splitting_output_json_path(ophys_session_id: int) -> Path:
    qstring = f"""SELECT storage_directory
                  FROM ophys_sessions WHERE id = {ophys_session_id}"""
    sdir = Path(
            conn.query(
                qstring.format(ophys_session_id))[0]["storage_directory"])
    basename_template  = \
            f"*MESOSCOPE_FILE_SPLITTING_QUEUE_{ophys_session_id}_output.json"
    files = list(sdir.rglob(basename_template))
    if len(files) == 0:
        return None
    if len(files) == 1:
        output_path = files[0]
    else:
        dates = []
        for filename in files:
            dstr = re.findall("\d+_MESOSCOPE", str(filename))[0].split("_")[0]
            dates.append(datetime.strptime(dstr, "%Y%m%d%H%M%S"))
        index = dates.index(max(dates))
        output_path = files[index]
    return output_path

def zs_channels(output_json_path: Path):
    with open(output_json_path, "r") as f:
        j = json.load(f)
    frame_metadata = j['file_metadata'][-1]['scanimage_metadata']
    version = frame_metadata["SI"]["TIFF_FORMAT_VERSION"]
    if version == 3:
        zkey = "zs"
    else:
        zkey = "zs_v3_v4"
    zs = frame_metadata["SI"]["hStackManager"][zkey]
    channels = frame_metadata["SI"]["hChannels"]["channelsActive"]
    if isinstance(channels, list):
        res = []
        flatten_list(res, channels)
        channels = res
    else:
        channels = [channels]
    return zs, channels

def volume_scans(stack_zs, active_channels):
    vol_scns = np.array(stack_zs).T[np.array(active_channels) - 1]
    for id1, scn1 in enumerate(vol_scns):
        for id2, scn2 in enumerate(vol_scns):
            if id1 != id2 and all(scn1 == scn2):
                raise AttributeError(f"Volume scans {id1} and {id2} were "
                                     f"taken at the same z values. This "
                                     f"is a human error - operators are "
                                     f"instructed not to do this.")
    return vol_scns

creds = qu.get_db_credentials()
conn = qu.DbConnection(**creds)

qstring = """SELECT os.id FROM ophys_sessions AS os 
             JOIN equipment AS eq on eq.id=os.equipment_id
             WHERE eq.name='MESO.1'"""

ophys_session_ids = [int(i["id"]) for i in conn.query(qstring)]

#ophys_session_ids = [1108352403, 1053297451, 1081677132,
#                     1099322836, 1107373623,
#                     1111197126, 1111018150, 1110835610,
#                     1110609895, 1110070263, 1109910469, 1109681854,
#                     1090519668]

for ophys_session_id in ophys_session_ids:
    output_path = splitting_output_json_path(ophys_session_id)
    if output_path is None:
        continue
    zs, channels = zs_channels(output_path)
    vs = volume_scans(zs, channels)
    plane_scans = vs.T.flatten()
    repeat = np.unique(plane_scans).size != plane_scans.size
    if repeat:
        print(f"ophys session:{ophys_session_id} repeat: {repeat}")