OHBA-analysis / osl-dynamics

Methods for studying dynamic functional brain activity in neuroimaging data.
https://osl-dynamics.readthedocs.io/
MIT License

Is it normal for the loss to decrease so slowly? #279

Closed · dgai91 closed this 14 hours ago

dgai91 commented 1 month ago

Hi, Dr Cgohil: I'm sorry to bother you again. These days I have been trying to train a DyNeMo model on HCP data. I have 2000 fMRI time series reconstructed from ICA components (30 ICs, with n_embeddings=15 and n_pca_components=60 for TDE), following the methods described in other papers, and I set loss_calc="mean" in the config:

config = Config(
    n_modes=8,
    n_channels=60,
    sequence_length=100,
    inference_n_units=32,
    inference_normalization='batch',
    inference_n_layers=2,
    model_n_units=64,
    model_normalization='batch',
    model_n_layers=3,
    learn_alpha_temperature=True,
    initial_alpha_temperature=1.0,
    learn_means=False,
    learn_covariances=True,
    do_kl_annealing=True,
    kl_annealing_curve="tanh",
    kl_annealing_sharpness=5,
    n_kl_annealing_epochs=10,
    batch_size=32,
    learning_rate=0.01,
    n_epochs=120,
    loss_calc='mean'
)

and this is my training loss:

189/189 [==============================] - 79s 388ms/step - loss: 107.4109 - ll_loss: 106.7164 - kl_loss: 0.6945 - lr: 0.0050
Epoch 18/120
189/189 [==============================] - 82s 408ms/step - loss: 107.1780 - ll_loss: 106.4712 - kl_loss: 0.7068 - lr: 0.0045
Epoch 19/120
189/189 [==============================] - 77s 381ms/step - loss: 107.0375 - ll_loss: 106.3399 - kl_loss: 0.6976 - lr: 0.0041
Epoch 20/120
189/189 [==============================] - 82s 405ms/step - loss: 106.9314 - ll_loss: 106.2403 - kl_loss: 0.6911 - lr: 0.0037
Epoch 21/120
189/189 [==============================] - 81s 403ms/step - loss: 106.8579 - ll_loss: 106.1676 - kl_loss: 0.6903 - lr: 0.0033
Epoch 22/120
189/189 [==============================] - 79s 390ms/step - loss: 106.8114 - ll_loss: 106.1268 - kl_loss: 0.6844 - lr: 0.0030
Epoch 23/120
189/189 [==============================] - 83s 413ms/step - loss: 106.7334 - ll_loss: 106.0547 - kl_loss: 0.6787 - lr: 0.0027
Epoch 24/120
189/189 [==============================] - 81s 401ms/step - loss: 106.6758 - ll_loss: 106.0051 - kl_loss: 0.6708 - lr: 0.0025
Epoch 25/120
189/189 [==============================] - 81s 401ms/step - loss: 106.6227 - ll_loss: 105.9515 - kl_loss: 0.6712 - lr: 0.0022
Epoch 26/120
189/189 [==============================] - 83s 418ms/step - loss: 106.5905 - ll_loss: 105.9190 - kl_loss: 0.6714 - lr: 0.0020
Epoch 27/120
189/189 [==============================] - 81s 400ms/step - loss: 106.5761 - ll_loss: 105.9058 - kl_loss: 0.6704 - lr: 0.0018
Epoch 28/120
189/189 [==============================] - 78s 386ms/step - loss: 106.5239 - ll_loss: 105.8619 - kl_loss: 0.6620 - lr: 0.0017
Epoch 29/120
189/189 [==============================] - 82s 408ms/step - loss: 106.4844 - ll_loss: 105.8213 - kl_loss: 0.6632 - lr: 0.0015
Epoch 30/120
189/189 [==============================] - 79s 399ms/step - loss: 106.4764 - ll_loss: 105.8134 - kl_loss: 0.6630 - lr: 0.0014
Epoch 31/120
189/189 [==============================] - 80s 397ms/step - loss: 106.4219 - ll_loss: 105.7583 - kl_loss: 0.6636 - lr: 0.0012
Epoch 32/120
189/189 [==============================] - 82s 408ms/step - loss: 106.3966 - ll_loss: 105.7341 - kl_loss: 0.6625 - lr: 0.0011

By the final epoch (120) the loss only comes down to about 104. Since I have no experience training VAEs, I am not sure whether this loss is normal, so I would appreciate some guidance from you.

cgohil8 commented 1 month ago

Chances are you've just converged very quickly due to the low number of KL annealing epochs. Normally, we advise n_kl_annealing_epochs = n_epochs // 2.

Note, you have a good number of batches, so probably don't need to train for that many epochs to converge. I would try something like n_epochs=50, n_kl_annealing_epochs=25 and check the loss curve. You should see it increase for the first half (due to the KL annealing) then start decreasing in the second. It should hopefully converge to a value before the 50th epoch.
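For example, keeping everything else in your config exactly as above and changing only the epoch settings would look like this (just a sketch of the suggestion, not a tuned configuration):

from osl_dynamics.models.dynemo import Config

config = Config(
    n_modes=8,
    n_channels=60,
    sequence_length=100,
    inference_n_units=32,
    inference_normalization="batch",
    inference_n_layers=2,
    model_n_units=64,
    model_normalization="batch",
    model_n_layers=3,
    learn_alpha_temperature=True,
    initial_alpha_temperature=1.0,
    learn_means=False,
    learn_covariances=True,
    do_kl_annealing=True,
    kl_annealing_curve="tanh",
    kl_annealing_sharpness=5,
    n_kl_annealing_epochs=25,  # n_epochs // 2
    batch_size=32,
    learning_rate=0.01,
    n_epochs=50,
    loss_calc="mean",
)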

dgai91 commented 1 month ago

Thank you, Dr Cgohil! I have been trying the new parameters you suggested.

In addition, I have a question about using the results. When plotting the FCN for each state, the absolute strength of each edge can be large, rather than lying in the -1 to 1 range you get in a static analysis. How do you think this should be handled so that the networks are more acceptable to a neuroscience audience?

Finally, regarding normalisation of the mixing coefficients: should statistical analysis of the mixing coefficients use the renormalised mixing coefficients rather than the raw ones? If so, a follow-up question arises: when calculating the FC at a given time point, should the FC first be normalised using the method from the previous question and then weighted by the renormalised alphas, or should the raw FC * alpha be used directly?

cgohil8 commented 1 month ago

In addition, I have a question about using the results. When plotting the FCN for each state, the absolute strength of each edge can be large, rather than lying in the -1 to 1 range you get in a static analysis. How do you think this should be handled so that the networks are more acceptable to a neuroscience audience?

This is likely because the network is a covariance matrix rather than a correlation matrix. You could use the following function to normalise it into a correlation matrix:

https://github.com/OHBA-analysis/osl-dynamics/blob/main/osl_dynamics/array_ops.py#L64
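For example (a sketch, assuming the linked function is array_ops.cov2corr and the covariances come from model.get_means_covariances()):

from osl_dynamics import array_ops

# Mode covariances from the trained model,
# shape (n_modes, n_channels, n_channels)
_, covariances = model.get_means_covariances()

# Normalise each covariance matrix into a correlation matrix,
# so the off-diagonal entries lie in [-1, 1]
correlations = array_ops.cov2corr(covariances)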

Finally, regarding normalisation of the mixing coefficients: should statistical analysis of the mixing coefficients use the renormalised mixing coefficients rather than the raw ones? If so, a follow-up question arises: when calculating the FC at a given time point, should the FC first be normalised using the method from the previous question and then weighted by the renormalised alphas, or should the raw FC * alpha be used directly?

Whether to use the renormalised or raw alphas depends on what you're trying to do. If you want to reconstruct a network I think the raw alphas would be more appropriate.
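As a sketch of what reconstructing a network with the raw alphas could look like (assuming alpha has shape (n_samples, n_modes) from model.get_alpha() and covariances has shape (n_modes, n_channels, n_channels) from model.get_means_covariances()):

import numpy as np

# Raw mixing coefficients and mode covariances from a trained model
alpha = model.get_alpha(prediction_dataset, concatenate=True)
_, covariances = model.get_means_covariances()

# Instantaneous network at time t: C_t = sum_j alpha_tj * C_j
tv_cov = np.einsum("tj,jkl->tkl", alpha, covariances)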

dgai91 commented 1 month ago

Thank you, Dr Cgohil! Your explanation is very clear. Recently I encountered another problem when using DyNeMo: I cannot reproduce my previous results. Is this because of my random seed setting? After researching related materials, I found that for TensorFlow, setting the random seeds as follows should fix the results:

def set_seeds(seed):
    tf.random.set_seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
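For reference, on newer TensorFlow versions (I believe 2.9+) there is also a one-call way to seed everything and force deterministic ops, at some cost in training speed:

import tensorflow as tf

# Seeds Python, NumPy and TensorFlow in one call (TF >= 2.7)
tf.keras.utils.set_random_seed(800)

# Force deterministic kernels (TF >= 2.9); usually slows training
tf.config.experimental.enable_op_determinism()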

Although I fix the random seed with the function above, when I train the model using the following code the fluctuation in max(mean(alpha)) across runs is extremely large, with the largest mean alpha ranging from 0.16 to 0.9. When the alpha of one state reaches 0.9, the model is effectively static.

set_seeds(800)
config = Config(
...
)

print("Reading resting-fMRI data")

dataset_dir = "../data/hcp_fmri/ica30/"

training_data = data.Data(dataset_dir, load_memmaps=False)
methods = {"standardize": {}, "tde_pca": {"n_embeddings": 5, "n_pca_components": 30}}
training_data.prepare(methods)

training_dataset = training_data.dataset(
    config.sequence_length,
    config.batch_size,
    shuffle=True,
    concatenate=True,
)

prediction_files = dataset_dir
prediction_data = data.Data(
    prediction_files,
    load_memmaps=False
)
prediction_data.prepare(methods)
prediction_dataset = prediction_data.dataset(
    config.sequence_length, config.batch_size, shuffle=False, concatenate=False
)

model = Model(config)
# init_history = model.random_subset_initialization(training_dataset, n_epochs=1, n_init=5, take=0.5)
history = model.fit(training_dataset)
model.save(f"{output_dir}/model")

model = load(f"{output_dir}/model")
a = model.get_alpha(prediction_dataset, concatenate=False, remove_edge_effects=True)
mean_a = np.mean(np.concatenate(a), axis=0)
print("mean_a:", mean_a)

I understand that your research focuses on EEG data, but I think that at similar loss values the fluctuation in the estimated alphas should not be this large. I would appreciate it if you could give me some advice. Thank you very much, and once again I apologize for disturbing you. I will post my complete code below.

dgai91 commented 1 month ago

import os
import numpy as np

from osl_dynamics import data, inference
from osl_dynamics.inference import tf_ops
from osl_dynamics.models import load
from osl_dynamics.utils import plotting
from osl_dynamics.models.dynemo import Config, Model
import pickle
from osl_dynamics.analysis import modes, power, workbench, connectivity
import tensorflow as tf
import random

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

output_dir = "../dynemo_results"
os.makedirs(output_dir, exist_ok=True)

def set_seeds(seed):
    tf.random.set_seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)

set_seeds(800)
config = Config(
    n_modes=12,
    n_channels=30,
    sequence_length=20,
    inference_n_units=32,
    inference_normalization='batch',
    inference_n_layers=2,
    model_n_units=8,
    model_normalization='batch',
    model_n_layers=1,
    learn_alpha_temperature=True,
    initial_alpha_temperature=1.0,
    learn_means=False,
    learn_covariances=True,
    do_kl_annealing=True,
    kl_annealing_curve="tanh",
    kl_annealing_sharpness=5,
    n_kl_annealing_epochs=25,
    batch_size=64,
    learning_rate=0.01,
    n_epochs=50,
    loss_calc='mean'
)

print("Reading resting-fMRI data")

dataset_dir = "../data/hcp_fmri/ica30/"

training_data = data.Data(dataset_dir, load_memmaps=False)
methods = {"standardize": {}, "tde_pca": {"n_embeddings": 5, "n_pca_components": 30}}
# methods = {"standardize": {}, "pca": {"n_pca_components": 20}}
training_data.prepare(methods)

training_dataset = training_data.dataset(
    config.sequence_length,
    config.batch_size,
    shuffle=True,
    concatenate=True,
)

prediction_files = dataset_dir
prediction_data = data.Data(
    prediction_files,
    load_memmaps=False
)
prediction_data.prepare(methods)
prediction_dataset = prediction_data.dataset(
    config.sequence_length, config.batch_size, shuffle=False, concatenate=False
)

model = Model(config)
# init_history = model.random_subset_initialization(training_dataset, n_epochs=1, n_init=5, take=0.5)
history = model.fit(training_dataset)
model.save(f"{output_dir}/model")

model = load(f"{output_dir}/model")
a = model.get_alpha(prediction_dataset, concatenate=False, remove_edge_effects=True)
pickle.dump(a, open(f"{output_dir}/alp.pkl", "wb"))
means, covariances = model.get_means_covariances()
print(means.shape, covariances.shape)
np.save(f"{output_dir}/means.npy", means)
np.save(f"{output_dir}/covs.npy", covariances)
# means = np.load(f"{output_dir}/means.npy")
# covariances = np.load(f"{output_dir}/covs.npy")
# a = pickle.load(open(f"{output_dir}/alp.pkl", "rb"))

mean_a = np.mean(np.concatenate(a), axis=0)
print("mean_a:", mean_a)

plotting.plot_alpha(a[0], filename=f"{output_dir}/alpha.png")

# Renormalise the mixing coefficients: weight each mode's alpha by the
# trace of its covariance matrix to account for differences in overall
# variance between modes, then re-normalise so they sum to one
tr_covariances = np.trace(covariances, axis1=1, axis2=2)

a_NW = [tr_covariances[np.newaxis, ...] * alp for alp in a]
a_NW = [alp_NW / np.sum(alp_NW, axis=1)[..., np.newaxis] for alp_NW in a_NW]
plotting.plot_alpha(a_NW[0], filename=f"{output_dir}/alpha_nw.png")

# Reverse the TDE-PCA preparation to get the mode covariances in the
# original channel (IC) space
power_map = modes.raw_covariances(
    covariances,
    pca_components=training_data.pca_components,
    n_embeddings=training_data.n_embeddings,
)

# Save the mode networks as glass brain connectivity plots
connectivity.save(
    power_map,
    parcellation_file="../../roi_labels/ica_components/canica_components_30ic.nii.gz",
    plot_kwargs={
        "edge_cmap": "Reds",
        "display_mode": "xz",
        "annotate": False
    },
    filename=f"{output_dir}/connectivity.png",
    threshold=0.95
)
print(max(mean_a), 800)  # largest mean mixing coefficient and the seed used