HelmchenLabSoftware / Cascade

Calibrated inference of spiking from calcium ΔF/F data using deep networks
GNU General Public License v3.0

Experimental settings of "Comparison with existing methods"? #32

Closed · tpzou closed this issue 2 years ago

tpzou commented 2 years ago

Hi, I am new to this field, so my questions may be a bit silly.

I can't seem to find some of the experimental settings used in "Comparison with existing methods", such as the resampling rate of each dataset. Would you mind providing more specific experimental settings so that I can reproduce the results?

I am also confused about some of the figure annotations, such as the '50%', '6 Hz' and '10 s' in Fig. 4a. What do they mean?

Thanks in advance.

PTRRupprecht commented 2 years ago

Hi tpzou,

Thank you for your interest in our method!

I assume that your question is about the paper directly, not about the implementation of the method on this Github repository.

To your questions:

Let me know if you have any further questions! Also, I can give you better advice if I know more about your intended use of the dataset.

Best, Peter

tpzou commented 2 years ago

Thanks for your quick reply! But I still have some questions.

Is Fig. 4a a subset of "09 GCaMP6f"? Since 6 Hz = 6 spikes/s for the spike rate estimates, why is the spike rate of "09 GCaMP6f" in Table 1 given as "0.6 ± 0.2 Hz"?

PTRRupprecht commented 2 years ago

I do not remember whether Fig. 4a is a subset of "09 GCaMP6f", although the calcium trace looks similar. If this is important for your project, I can look it up.

However, I think there is a misunderstanding here. "6 Hz = 6 spikes/s" refers to the full scale bar. The true spike rate (in orange, partially hidden behind the estimated spike rate) is zero most of the time and only occasionally becomes higher. That's why the average spike rates for the datasets in Table 1 are rather low (e.g., 0.6 Hz): the reported value is the spike rate averaged across all time points. The scale bar in Fig. 4a matches some of the largest spike rate deflections, not the average spike rate.
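To make this concrete with a toy example (made-up numbers, not data from the paper): a trace that is zero almost everywhere but occasionally jumps to 6 spikes/s has a peak comparable to the scale bar, yet a very low mean:

import numpy as np

rate = np.zeros(10000)   # estimated spike rate per time point, in spikes/s
rate[::500] = 6.0        # rare transients reaching the scale-bar value

print(rate.max())        # 6.0   -> comparable to the 6 Hz scale bar
print(rate.mean())       # 0.012 -> low average, as in Table 1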

Let me know if this is not clear yet.

tpzou commented 2 years ago

Thank you for your patience; I get it now.

I am trying to remove the noise-addition and resampling operations to make the network more robust. My background is more in deep learning than in biology, so I mostly care about reproducing the results to facilitate my comparison. I noticed that the contents of H_Cascade and P_Cascade differ, for example the Ground_truth folders. What is the difference between the two?

I also tried to reproduce the "Global EXC" results in Fig. 4 using H_Cascade. I adjusted the code in "Demo_benchmark_model.py", setting cfg['sampling_rate'] = 7.5, and used the Ground_truth from P_Cascade:

all_training_datasets = ['DS01-OGB1-m-V1',
 'DS02-OGB1-2-m-V1',
 'DS03-Cal520-m-S1',
 'DS04-OGB1-zf-pDp',
 'DS05-Cal520-zf-pDp',
 'DS06-GCaMP6f-zf-aDp',
 'DS07-GCaMP6f-zf-dD',
 'DS08-GCaMP6f-zf-OB',
 'DS09-GCaMP6f-m-V1',
 'DS10-GCaMP6f-m-V1-neuropil-corrected',
 'DS11-GCaMP6f-m-V1-neuropil-corrected',
 'DS12-GCaMP6s-m-V1-neuropil-corrected',
 'DS13-GCaMP6s-m-V1-neuropil-corrected',
 'DS14-GCaMP6s-m-V1',
 'DS15-GCaMP6s-m-V1',
 'DS16-GCaMP6s-m-V1',
 'DS17-GCaMP5k-m-V1',
 'DS18-R-CaMP-m-CA3',
 'DS19-R-CaMP-m-S1',
 'DS20-jRCaMP1a-m-V1',
 'DS21-jGECO1a-m-V1',
 'DS22-OGB1-m-SST-V1',
 'DS23-OGB1-m-PV-V1',
 'DS24-GCaMP6f-m-PV-V1',
 'DS25-GCaMP6f-m-SST-V1',
 'DS26-GCaMP6f-m-VIP-V1',
 'DS27-GCaMP6f-m-PV-vivo-V1',
 'DS28-XCaMPgf-m-V1',
 'DS29-GCaMP7f-m-V1',
 'DS30-GCaMP8f-m-V1',
 'DS31-GCaMP8m-m-V1',
 'DS32-GCaMP8s-m-V1']

But the results were poor. Is there something I might have done wrong?

Demo_benchmark_model.txt

PTRRupprecht commented 2 years ago

Ok, now I understand better what you want to achieve, although I'm not entirely sure how removing the resampling and the noise addition would make the algorithm more robust.

But to your question: H_Cascade is the main repository; P_Cascade is a kind of development branch where I test a couple of things. For example, I have added new ground truth (for a new sensor, GCaMP8), which only became available after the publication of our study. There are also a couple of additional analysis scripts.

Demo_benchmark_model.py does not fully reproduce the results in Fig. 4a, since each model is tested not on a single other dataset but on all datasets, and the results are then averaged. In particular, this average includes the new ground truth datasets DS28-32, which do not work optimally yet, and the interneuron datasets DS22-27, which were not included in the quantification in Fig. 4b. Maybe this explains the discrepancy between your results and what you expected.

To better reproduce the results from Fig. 4a, you might use the following script: https://github.com/PTRRupprecht/Cascade/blob/master/Demo%20scripts/Benchmark_GC8.py In this script, a model is trained for each dataset and then applied separately to each of the other datasets.

It is again not a 1:1 reproduction of the results in Fig. 4a (I have readjusted parts of the training procedure, the Keras version and the ground truth itself since then, and every training run can be more or less successful), but it should come closer. The results are also saved to disk, so you can browse them.

Let me know if this addresses your questions or if anything remains unclear!

tpzou commented 2 years ago

Sorry, I misstated this. By "more robust" I mean making the network more general, i.e., no need to retrain the model at different sampling rates or to specify a noise level.

The script looks like scheme 1, "Train model on one dataset, test on all other datasets". But I want scheme 3, "Train model on all datasets except one, test on the remaining one", for which Demo_benchmark_model.py seems more suitable than that script.

Is my understanding correct?

PTRRupprecht commented 2 years ago

I see. You were right about Demo_benchmark_model.py: it is indeed the script you want for your purpose.

However, in the script that you attached, you were training a global model on all datasets (including the new GCaMP8 datasets and the interneuron datasets). These are substantially different from the other datasets, and including them in the training or test set will substantially deteriorate your results. I would suggest including only the following list of datasets instead:

all_training_datasets = ['DS02-OGB1-2-m-V1', 'DS04-OGB1-zf-pDp', 'DS05-Cal520-zf-pDp', 'DS06-GCaMP6f-zf-aDp', 'DS07-GCaMP6f-zf-dD', 'DS08-GCaMP6f-zf-OB', 'DS09-GCaMP6f-m-V1', 'DS10-GCaMP6f-m-V1-neuropil-corrected', 'DS11-GCaMP6f-m-V1-neuropil-corrected', 'DS12-GCaMP6s-m-V1-neuropil-corrected', 'DS13-GCaMP6s-m-V1-neuropil-corrected', 'DS14-GCaMP6s-m-V1', 'DS15-GCaMP6s-m-V1', 'DS16-GCaMP6s-m-V1', 'DS17-GCaMP5k-m-V1', 'DS18-R-CaMP-m-CA3', 'DS19-R-CaMP-m-S1', 'DS20-jRCaMP1a-m-V1', 'DS21-jGECO1a-m-V1']

DS01 is excluded because it does not yield good results (cf. Fig. 4a), and DS03 is excluded because the data fragments are too short and cannot be processed with this standard model.
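For reference, here is a minimal sketch of that leave-one-out procedure (schematic only; write_config_for stands in for the config-writing step of Demo_benchmark_model.py and is not a cascade2p function):

from cascade2p import cascade

def benchmark_leave_one_out(all_datasets, base_cfg, write_config_for):
    # Scheme 3: train on all datasets except one, test on the held-out one.
    for test_dataset in all_datasets:
        cfg = dict(base_cfg)
        cfg['training_datasets'] = [d for d in all_datasets if d != test_dataset]
        cfg['model_name'] = 'LOO_without_' + test_dataset   # placeholder name
        write_config_for(cfg)                   # save this model's config to disk
        cascade.train_model(cfg['model_name'])
        # ...then run cascade.predict() on test_dataset and compute
        # correlation/error/bias, as Demo_benchmark_model.py does.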

Does this help you?

tpzou commented 2 years ago

Thanks for your help! I will try it soon.

tpzou commented 2 years ago

There still seem to be some problems in my code. When I use cascade.train_model(), the network doesn't seem to train, and I have no idea why.

Fitting model 2 with noise level 2 (total 2 out of 35).
Epoch 1/10
4883/4883 [==============================] - 13s 2ms/step - loss: 0.0114
Epoch 2/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0115
Epoch 3/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0115
Epoch 4/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0115
Epoch 5/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0115
Epoch 6/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0115
Epoch 7/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Epoch 8/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0114
Epoch 9/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Epoch 10/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Saved model: Model_NoiseLevel_2_Ensemble_1.h5

Fitting model 3 with noise level 2 (total 3 out of 35).
Epoch 1/10
4883/4883 [==============================] - 13s 3ms/step - loss: 0.0114
Epoch 2/10
4883/4883 [==============================] - 13s 3ms/step - loss: 0.0114
Epoch 3/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0114
Epoch 4/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Epoch 5/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Epoch 6/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0114
Epoch 7/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Epoch 8/10
4883/4883 [==============================] - 12s 2ms/step - loss: 0.0114
Epoch 9/10
4883/4883 [==============================] - 13s 3ms/step - loss: 0.0114
Epoch 10/10
4883/4883 [==============================] - 12s 3ms/step - loss: 0.0114
Saved model: Model_NoiseLevel_2_Ensemble_2.h5

This is the output of Demo_train.ipynb; the model looks untrained. The evaluation results reflect this as well:

2022-05-12 17:04:42,110 [INFO] - test name: Ground_truth/DS16-GCaMP6s-m-V1
2022-05-12 17:04:42,111 [INFO] - m_correlation:-0.074
2022-05-12 17:04:42,112 [INFO] - m_error:0.972
2022-05-12 17:04:42,112 [INFO] - m_bias:-0.956

What could be the cause of this?

PTRRupprecht commented 2 years ago

The first thing I notice is that your training is really fast! Either you are using a high-end GPU, or something is wrong (e.g., no usable gradient in the data).

For Demo_train.ipynb, I tested the code in the notebook in a Colab environment, and it seems to work: the loss decreases (to something like 0.003 after one training epoch). Here's a copy of the Colab notebook; maybe you can compare it to your local Jupyter notebook (I trained for a few epochs before aborting): https://github.com/PTRRupprecht/Cascade/blob/master/Demo%20scripts/Train_a_test_model_with_Cascade_on_Colab_test.ipynb

If you don't find any issues, it might be worth checking the Python package versions for differences. Did you use the recommended installation via a conda environment?

Possible sources of the error are (1) Keras or something associated with the GPU training itself, or (2) the generation of the resampled ground truth. If you suspect the latter, I can tell you how to change the code so that this can be checked during training.
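As a quick, generic sanity check for point (1), assuming a TensorFlow 2 environment (this snippet is not part of the Cascade scripts), you can verify that TensorFlow sees the GPU and that a trivial model still learns on it:

import numpy as np
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))   # should list your GPU

# Tiny regression problem; the loss must drop clearly within a few epochs.
x = np.random.randn(1024, 10).astype('float32')
y = x @ np.ones((10, 1), dtype='float32')

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=5, verbose=2)   # a flat loss points to the GPU/driver
                                       # setup rather than to Cascade itself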

tpzou commented 2 years ago

Thank you for the reminder about the GPU! Before, I was always using GPU 0 (a 3080); now I am using a 2080 Ti, and everything looks great, although I don't know why this happened.

Thank you again for your patient answer!

PTRRupprecht commented 2 years ago

Thanks for figuring out this issue! It is also useful to know that training did not work with the 3080 GPU. I don't have a 3080 at my disposal, but I will keep this in mind for future testing.

tpzou commented 2 years ago

Hi, I still have some questions that came up during training.

I use Demo_benchmark_model.py with the following settings:

cfg['sampling_rate'] = 7.5
noise_level = 2
all_training_datasets = ['DS02-OGB1-2-m-V1', 'DS04-OGB1-zf-pDp', 'DS05-Cal520-zf-pDp', 'DS06-GCaMP6f-zf-aDp', 'DS07-GCaMP6f-zf-dD', 'DS08-GCaMP6f-zf-OB', 'DS09-GCaMP6f-m-V1', 'DS10-GCaMP6f-m-V1-neuropil-corrected', 'DS11-GCaMP6f-m-V1-neuropil-corrected', 'DS12-GCaMP6s-m-V1-neuropil-corrected', 'DS13-GCaMP6s-m-V1-neuropil-corrected', 'DS14-GCaMP6s-m-V1', 'DS15-GCaMP6s-m-V1', 'DS16-GCaMP6s-m-V1', 'DS17-GCaMP5k-m-V1', 'DS18-R-CaMP-m-CA3', 'DS19-R-CaMP-m-S1', 'DS20-jRCaMP1a-m-V1', 'DS21-jGECO1a-m-V1']

When predicting DS15-GCaMP6s-m-V1, cascade.predict() computes a noise level of [2.52555014], which exceeds the cutoff of 2.5 in neuron_idx = np.where(trace_noise_levels < model_noise + 0.5)[0]. This leads to the following prediction:

2022-05-13 20:47:10,443 [INFO] - test name: Ground_truth/DS15-GCaMP6s-m-V1
2022-05-13 20:47:10,443 [INFO] - m_correlation:nan
2022-05-13 20:47:10,443 [INFO] - m_error:1.000
2022-05-13 20:47:10,443 [INFO] - m_bias:-1.000

The same thing also happens for DS12-GCaMP6s-m-V1-neuropil-corrected, etc., but not for DS16-GCaMP6s-m-V1. I checked Table 1 in the paper and found that the noise of DS16 (0.9 ± 0.2) is actually higher than that of DS15 (0.7 ± 0.2). When the sampling rate is set to 30 Hz, the problem disappears (DS15-GCaMP6s-m-V1, etc. work again).

What could be the cause of this?

PTRRupprecht commented 2 years ago

Thanks for noticing this problem! I had not caught it earlier since it does not affect people who only use pre-trained models.

I just fixed the code in cascade2p.cascade.predict(). Now, for each neuron, the code looks for the model with the closest noise level, no matter whether it is more than 0.5 units away. I ran a few tests to check that it works; it shows the same behavior as before but also covers the case that caused your problem.
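For illustration, the change amounts to something like this (variable names are simplified relative to the actual code):

import numpy as np

model_noise_levels = np.array([2.0])   # here only a noise-level-2 model exists
trace_noise = 2.52555014               # measured noise of the DS15 traces at 7.5 Hz

# Old behavior: a neuron was only assigned to a model if its noise fell
# below model_noise + 0.5, so 2.53 matched nothing and produced NaNs.
selected = np.where(trace_noise < model_noise_levels + 0.5)[0]   # empty

# New behavior: always use the model with the closest noise level.
closest = np.argmin(np.abs(model_noise_levels - trace_noise))    # index 0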

Thanks a lot for reporting the bug, and for already pointing out the exact location of the problem in the code!