codeneuro / spikefinder

benchmarking challenge for spike detection in imaging data

Preprocessing new TIFFs / ROIs into Spikefinder format #10

Open alexklibisz opened 7 years ago

alexklibisz commented 7 years ago

I posted a similar question on the gitter channel and @PTRRupprecht offered some tips (thanks!). Perhaps this is a better place to record the discussion in case others have the same question in the future. I'll preface by saying I'm fairly new to calcium imaging and completely new to spike inference. I am familiar with the Neurofinder datasets and ROI identification/segmentation techniques.

My group would like to apply some of the Spikefinder models that were submitted and documented (here) to our own calcium imaging datasets. Our data format is identical to Neurofinder - we have a directory of 16-bit TIFFs with binary ROI masks. We're unsure of the exact steps necessary to preprocess them to work with Spikefinder models. Is there any documentation or code showing how to preprocess new TIFFs/ROIs to match the format used in the competition?

I believe the general steps we need to take are:

  1. Compute the mean of the ROI pixels at each time step.
  2. Transform this to the ΔF/F format.
  3. Use the preprocess function from c2s to convert to a 100 Hz sampling rate.

I'm comfortable with steps 1 and 3, and I could likely implement step 2 as it's described here, but I'm not sure whether that's even the same procedure that was followed for Spikefinder.
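For concreteness, here is my understanding of step 3 as a minimal sketch, assuming the data format described in the c2s README (a list of dicts with at least 'calcium' and 'fps' entries; mean_trace here is the hypothetical output of step 1):

```python
import c2s

# mean_trace: hypothetical 1-D array of mean ROI values (step 1), recorded at 7 Hz.
# Per the README, 'calcium' holds the trace; the exact array shape expected
# (1-D vs. 1xT) is an assumption to verify.
data = [{'calcium': mean_trace, 'fps': 7.0}]
data = c2s.preprocess(data)  # resamples to 100 Hz by default, per the c2s docs
```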

Thanks in advance for any tips, documentation, or code examples.

philippberens commented 7 years ago

This is indeed the general outline.

No, the two challenges are currently not linked in this way.

alexklibisz commented 7 years ago

It is possible, in theory, to convert the TIFFs/ROIs into a format where you can run the spikefinder algorithms, correct?

Is there a recommended existing implementation for the second step (computing ΔF/F)?

PTRRupprecht commented 7 years ago

@alexklibisz The c2s function also normalizes your calcium trace, so you can basically leave out step 2 ...

... unless you have a baseline drift due to instability of the recording, especially in a behaving animal. There isn't a single best practice for computing ΔF/F0; it depends on your experiment and the recording artifacts that you have. The recipe that you mentioned earlier is maybe not what most people do; in particular, the convolution with an exponential filter throws away important information in the traces. I think the best practice would be to subtract a time-dependent baseline (this could be a moving 5%-quantile filter window, with the window size determined heuristically, depending mainly on the timescale of the baseline drift).
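As a rough sketch of that idea (assuming scipy; the 30 s window and the 5th percentile are heuristics to tune, not fixed recommendations):

```python
from scipy.ndimage import percentile_filter

def subtract_moving_baseline(trace, fps, window_sec=30.0, q=5):
    """Estimate a time-dependent baseline F0 as a moving low
    percentile of the trace and return (F - F0) / F0."""
    size = max(1, int(window_sec * fps))           # window length in samples
    f0 = percentile_filter(trace, percentile=q, size=size)
    return (trace - f0) / f0                       # assumes F0 stays well above zero
```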

alexklibisz commented 7 years ago

@PTRRupprecht thank you. Just to clarify, by baseline drift, do you just mean a general downward trend in the ROI values over time?

In case anyone in the future might need it, below is my very amateur implementation of the steps for computing ΔF/F(t) from this figure for a Neurofinder TIFF/ROI. It is certainly not guaranteed to be correct.

```python
# Attempted calcium trace extraction from a neurofinder dataset.
# Assuming the dataset was downloaded from neurofinder.codeneuro.org and unzipped.
import json
from glob import glob

import numpy as np
import tifffile as tiff


def get_mean_trace(dsdir, neuron_idx, img_preprocess=lambda x: x):
    """Compute the mean pixel value within an ROI at each time step for a Neurofinder dataset.

    # Arguments
        dsdir: directory containing the unzipped Neurofinder dataset.
        neuron_idx: the index of the neuron (ROI) from which to extract the trace.
        img_preprocess: an optional function called on each image.
    # Returns
        mean: an array of mean ROI pixel values, one value per image in the dataset.
        hz: the sampling rate for this dataset (e.g. 7 frames/second).
    """
    with open('%s/info.json' % dsdir) as fp:
        info = json.load(fp)
    with open('%s/regions/regions.json' % dsdir) as fp:
        regs = json.load(fp)
    n = regs[neuron_idx]
    # Coordinates are stored as (y, x) = (row, column) pairs.
    yy, xx = map(np.array, zip(*n['coordinates']))
    img_paths = sorted(glob('%s/images/*.tiff' % dsdir))
    mean = np.zeros(len(img_paths))
    for idx, ip in enumerate(img_paths):
        img = img_preprocess(tiff.imread(ip))
        mean[idx] = np.mean(img[yy, xx])
    return mean, info['rate-hz']

def delta_F_over_F(F, src_hz, tgt_hz=30, tau0=0.2, tau1=0.75, tau2=0.3):
    """Calcium trace computation according to https://goo.gl/sguz6a.
    Integrals (area under curve) are calculated as summations because the true
    fluorescence function is unknown.

    TODO: it's possible I'm computing the integrals incorrectly, because I'm assuming
        a step size (width of the "rectangle") of 1 between each pair of points.

    # Arguments
        F: array of mean values for a single ROI, one per time step.
        src_hz: fps of the given F array.
        tgt_hz: fps of the target calcium trace, prescribed by the figure from Nature.
        tau0: seconds, according to the figure from Nature. Used for noise filtering.
        tau1: seconds, according to the figure from Nature. Used for the
            time-dependent baseline F_0(t).
        tau2: seconds, according to the figure from Nature. Used for the
            time-dependent baseline F_0(t).

    # Returns
        F: the re-sampled fluorescence trace with padding removed.
        F_0: time-dependent baseline.
        R: relative change of fluorescence from F(t) to F_0(t).
        dFF: noise-filtered fluorescence trace.

    """
    from scipy.signal import resample

    # Convert tau values from seconds to units of samples based on the target hz.
    tau0 = int(tgt_hz * tau0)
    tau1 = int(tgt_hz * tau1)
    tau2 = int(tgt_hz * tau2)

    # Re-sample to the target hz (e.g. convert a 7hz source to a 30hz target).
    F = np.array(F)
    tgt_len = int(len(F) / src_hz * tgt_hz)
    F = resample(F, tgt_len)

    # Pad F with reflection on both ends for (tau1/2) elements.
    d = int(tau1 / 2)
    F = np.pad(F, (d, d), mode='reflect')

    # Calculate the time-dependent baseline F_0: the minimum, over the preceding
    # tau2 samples, of a moving average with window length tau1.
    smooth = lambda F, x: (1. / tau1) * np.sum(F[max(0, x - d):min(len(F), x + d)])
    F_0 = np.array([min([smooth(F, x) for x in range(t - tau2, t)]) for t in range(d, len(F) - d)])

    # Calculate the relative change of fluorescence signal R(t) from F(t) and F_0(t).
    # F[d:-d] aligns the padded trace with F_0, which starts at padded index d.
    R = (F[d:-d] - F_0) / F_0

    # Apply noise filtering (an exponentially weighted moving average) to get the final delta F / F(t).
    w = [np.exp(-tau / float(tau0)) for tau in range(len(F))]
    nmr = lambda t: np.sum([R[t - tau] * w[tau] for tau in range(0, t)])
    dnm = lambda t: np.sum([w[tau] for tau in range(0, t)])
    dFF = np.array([nmr(t) / (dnm(t) + 1e-7) for t in range(len(F_0))])

    return F[d:-d], F_0, R, dFF

# DOWNLOADS_DIR: the directory where the Neurofinder datasets were unzipped.
mean, hz = get_mean_trace(dsdir='%s/neurofinder.00.01/' % DOWNLOADS_DIR, neuron_idx=10)
%time F, F_0, R, dff = delta_F_over_F(mean, src_hz=hz)
```

PTRRupprecht commented 7 years ago

@alexklibisz "by baseline drift, do you just mean a general downward trend in the ROI values over time?" - Yes, although it could also be an upwards trend (e.g. neurons slowly dying and Calcium being released from the intracellular buffers).

The implementation looks alright to me. However, you should rather set tau2 = 3 sec instead of 0.3 sec, and even 3 sec might not be long enough in some cases. For example, if you have a cell that fires continually over 10 sec, that sustained firing would be removed by the baseline subtraction. This depends on the neurons you are looking at - some fire like crazy at all times, others only very sparsely in time.
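With the function posted above, that would be, e.g.:

```python
# tau2 = 3 s instead of 0.3 s, so that sustained firing is not absorbed
# into the time-dependent baseline.
F, F_0, R, dff = delta_F_over_F(mean, src_hz=hz, tau2=3.0)
```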

And two remarks:

1) As mentioned before, I would not use the exponential filtering in practice. It is nice for visualizing noisy data, but for data analysis (like spike-finding) you are discarding information.

2) Calculating F0 as done above is ok, but it is probably not the best thing to do. Both averaging (= the integral) and taking the minimum are susceptible to outliers. I would rather use a moving window and take np.percentile(window, percentage) or something similar, with 'percentage' somewhere between 5 and 30, depending on the window size. But maybe the implementation you wrote already gives you the best results for your data.

Hope that helps!

alexklibisz commented 7 years ago

Ok, thank you for the clarification and tips. It sounds like I should make the filtering step optional (I also noticed it flattens the peaks in an unhelpful way) and experiment with different values of tau2 depending on the data. I am trying a different approach at the moment but will try to come back to this.

I do have a tangential question: why is it important to know the specific number of spikes within a time window of the calcium trace rather than just knowing that there was or wasn't a spike at that time? It seems like the binary problem would be simpler, so I'm curious about the motivation.

Rimsk commented 6 years ago

Hi,

I am really confused about how to preprocess the calcium events. I have one file with 533 frames and 69 ROIs, with stimulus onsets at frame 162 (ST1), frame 301 (ST2), and frame 402 (ST3). I am looking for calcium spikes and want to identify the number of neurons which respond to each stimulus. As far as I know:

Frame = 3.890 s; cell value = mean gray value. To calculate dF/F, set the baseline to the mean of the first 5 rows.

There is nothing for background correction, so: 1) I calculate dF/F by taking the mean of the first 5 rows as F0 and computing (x - F0) / F0, considering the baseline before the stimuli, that is, frames 1:161. How do I correct the baseline? How do I calculate the SNR? What should I do next?
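In code, what I am doing is roughly the following sketch (traces here is a hypothetical frames x ROIs array of mean gray values; this is just my understanding, not a vetted implementation):

```python
import numpy as np

def dff_prestim(traces, n_baseline=5):
    """Compute dF/F per ROI, using the mean of the first
    n_baseline frames as the baseline F0."""
    f0 = traces[:n_baseline].mean(axis=0)  # baseline F0, one value per ROI
    return (traces - f0) / f0

# traces: hypothetical (533, 69) array loaded from the file.
# dff = dff_prestim(traces)
```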