Closed davidgwyrick closed 2 months ago

Hello @DishengTang, I'm a collaborator of Hannah's. I'm wondering if you know why the calculation of the CCG takes such a long time for a relatively modest number of neurons (~200). The function I'm calling is `calculate_mean_ccg_corrected`. The data is binned in 1 ms bins across 120 trials, of which I'm only looking at the first 100 ms, and it takes over an hour. Contrast that with Xiaoxuan's previous CCG code, which takes ~3 minutes. Happy to troubleshoot with you to optimize the code, because I think it's very useful!
Thanks, David Wyrick, Scientist I, Allen Institute
Hi @davidgwyrick, thank you for your interest in the CCG code and for highlighting this important issue. I recently tested the code (with `use_parallel=True`) on a spike train with 200 neurons, 120 trials, and 100 bins, and I also noticed that it took significantly longer than expected. It seems there might be some issues with the parallel processing, as the non-parallel version only took about 6 minutes on my machine. I suspect you were using the parallel version; could you try the non-parallel version for now while I work on debugging the parallel computing?
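In case it helps, calling the non-parallel path would look roughly like this; treat it as a sketch, since the exact signature of `calculate_mean_ccg_corrected` and the import path may differ from the repo:

```python
import numpy as np
from ccg import calculate_mean_ccg_corrected  # import path is an assumption

# Synthetic binary spike trains with shape (n_neurons, n_trials, n_bins),
# e.g. 200 neurons x 120 trials x 100 one-millisecond bins.
rng = np.random.default_rng(0)
spike_trains = rng.binomial(1, 0.02, size=(200, 120, 100)).astype(np.uint8)

# use_parallel=False selects the single-process path discussed above;
# the argument order here is a guess, so check the function's docstring.
ccgs = calculate_mean_ccg_corrected(spike_trains, use_parallel=False)
```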
Regarding Xiaoxuan's code, there are a few main reasons why it's faster. First, it only detects one possible connection for each neuron pair, so it computes half as many CCGs as we do here: n(n-1)/2 rather than n(n-1), where n is the number of neurons. Second, it uses the shuffle jitter method in a deterministic way to compute the jittered CCG, avoiding the need to generate multiple samples. In contrast, our approach uses the spike jitter method stochastically, which requires multiple samples to reduce bias. If you're interested, here's a short reference explaining the slight differences between the shuffle jitter and spike jitter methods: Harrison, Matthew, et al. "Jitter Methods for Investigating Spike Train Dependencies." Computational and Systems Neuroscience Abstracts III-17.
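To make that distinction concrete, here is a minimal sketch of stochastic spike (interval) jitter in the spirit of Harrison et al.; the function and parameter names are illustrative, not the ones used in the repo:

```python
import numpy as np

def spike_jitter(spike_times, jitter_window=25, n_surrogates=100, rng=None):
    """Stochastic spike jitter: each spike is re-placed uniformly within its
    own window of width jitter_window (in bins), and the procedure is
    repeated n_surrogates times so the jitter-corrected CCG can be
    averaged over samples to reduce bias."""
    rng = np.random.default_rng() if rng is None else rng
    spike_times = np.asarray(spike_times)
    surrogates = []
    for _ in range(n_surrogates):
        window_index = spike_times // jitter_window  # which window each spike is in
        jittered = window_index * jitter_window + rng.integers(
            0, jitter_window, size=spike_times.shape)
        surrogates.append(np.sort(jittered))
    return surrogates
```

Roughly speaking, shuffle jitter replaces this sampling loop with a single deterministic average over each window, which is why it only needs one pass.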
I'll keep you posted once the parallel acceleration is optimized.
Thanks again for reaching out! Disheng
Thanks for the reply, Disheng. Even the non-parallel version is taking a long time. I have more neurons (~400), but the number of trials and bins are similar. I wonder if it has something to do with the data type, or whether the data is in contiguous memory or not.
Do you use the `memory` parameter for the jittered spike trains, then?
Hi David, I’ve just improved the efficiency of the CCG calculation. The multiprocessing now operates on the trial dimension, which has significantly reduced the computation time. In my tests on a synthetic spike train with 400 neurons, 120 trials, and 100 time bins, the parallel CCG took about 2 minutes using 14 cores and around 1 minute with 30 cores. The computational time should now be more acceptable. Even the non-parallel version is a bit faster, as it now only loops over neurons with at least one spike in a given trial instead of all neurons.
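Roughly, the new scheme looks like this; a simplified sketch with illustrative names (`per_trial_ccg`, `parallel_ccg`), not the actual repo code, and without the jitter correction (on Windows, wrap the call in an `if __name__ == "__main__":` block):

```python
import numpy as np
from multiprocessing import Pool

WINDOW = 40  # CCG half-window in bins (see the note on window size below)

def per_trial_ccg(trial):
    """CCG contribution of one trial.

    trial: binary array of shape (n_neurons, n_bins). Only neurons with at
    least one spike in this trial are cross-correlated; silent rows stay zero.
    """
    n_neurons, n_bins = trial.shape
    active = np.flatnonzero(trial.sum(axis=1) > 0)
    ccg = np.zeros((n_neurons, n_neurons, 2 * WINDOW + 1))
    for k, lag in enumerate(range(-WINDOW, WINDOW + 1)):
        # Truncated (non-circular) overlap: it shrinks as |lag| grows,
        # which is why the window must stay well below n_bins.
        if lag >= 0:
            a, b = trial[active, lag:], trial[active, : n_bins - lag]
        else:
            a, b = trial[active, : n_bins + lag], trial[active, -lag:]
        ccg[np.ix_(active, active, [k])] = (a @ b.T)[:, :, None]
    return ccg

def parallel_ccg(spike_trains, n_workers=14):
    """spike_trains: array of shape (n_trials, n_neurons, n_bins)."""
    with Pool(n_workers) as pool:
        per_trial = pool.map(per_trial_ccg, list(spike_trains))
    return np.mean(per_trial, axis=0)  # average raw CCG across trials
```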
I’m planning to test GPU acceleration soon, and if it proves to be more efficient than CPU parallel processing, I’ll add it and keep you updated.
One thing to note is the importance of carefully defining the CCG window size. Since your data only has 100 time bins, the window size should not be too close to 100, as the overlap between shifted spike trains would be too small and could introduce bias. In my tests, I used a window size of 40, but you may need to experiment to ensure that the significant CCGs look appropriate.
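To put numbers on that:

```python
n_bins = 100
for window in (40, 90):
    # Overlap between the shifted spike trains at the largest lag:
    print(window, n_bins - window)  # window 40 -> 60 bins left, 90 -> only 10
```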
I set `memory=False` in all my tests, as we used the spike jitter method. I included the parameter in the code mainly for those interested in using the pattern jitter method.
Hope this helps!
Wow! Much better. In effect, you are also parallelizing over jitter trials, which I think provides the big speedup. If you could parallelize over pairs of neurons on a GPU, then it would be instantaneous!
Yes, the window size is something I plan to play around with. The data I'm working with is a bit more complicated than the Visual Coding Neuropixels dataset you applied the method on, where interactions are mostly feedforward. We're looking at electrical stimulation where cortex fires first, then thalamus, followed by a period of quiescence, THEN thalamus fires before cortex again. So I'm looking at two time windows because of the directionality shift, the first being 100 ms long and the second 250 ms. Do you have any thoughts on this?
Also, can you help me modify the code to return the maximal CCG value (sharp peak or interval) for each pair regardless of significance? It would be useful to know what the value is even if it is not significant. If reporting the max value doesn't make sense, it could be the first peak or interval in the CCG?
That sounds interesting. Computing CCGs separately in two temporal windows could be effective if there’s a significant gap between the two processes. However, if the processes are closer or even interleaved in time, you might want to run some tests to validate the selection of these specific windows. Also, be mindful of how differences in window sizes might impact your findings.
Including non-significant CCG values in the analysis makes sense, and I’ll add that feature soon.
The code has been updated to include full CCGs with maximal confidence levels as a new feature.
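With the full CCGs returned, pulling out the maximal value and its lag for every pair takes only a couple of lines; the output shape here is an assumption, so adjust it to the actual return format:

```python
import numpy as np

# full_ccgs: (n_neurons, n_neurons, n_lags), e.g. 400 x 400 x 81 for a
# half-window of 40 bins (shape is an assumption about the updated output).
full_ccgs = np.random.rand(400, 400, 81)

peak_values = full_ccgs.max(axis=-1)   # maximal CCG value per pair
peak_lags = full_ccgs.argmax(axis=-1)  # lag bin where that maximum occurs
# peak_values[i, j] is defined even when the (i, j) pair is not significant.
```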