GeomScale / dingo

A Python library for metabolic network sampling and analysis
GNU Lesser General Public License v3.0

Functions for correlation matrix, copula indicator and plotting #103

Closed: SotirisTouliopoulos closed this PR 2 months ago

SotirisTouliopoulos commented 3 months ago

The aims of this PR are:

  1. to provide functions that calculate a correlation matrix from steady states, filter it with a copula indicator, and plot it;
  2. to improve the efficiency of the PreProcess class.

The correlated_reactions function, which calculates a Pearson correlation matrix from the reactions' steady states, is added to dingo/utils.py. Correlations that do not pass the Pearson cutoff are set to 0. For reaction pairs whose Pearson coefficient exceeds the cutoff, a copula indicator is computed to filter out false-positive correlations. Function parameters adjust the width of the copula's diagonal used to compute the indicator and can make the function return only the lower triangle of the symmetric matrix. The user can also adjust the cutoffs for both Pearson and indicator filtering.
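The filtering idea can be sketched in plain NumPy. This is not dingo's code: the function names, the `n_bins` and `band` parameters, the use of absolute Pearson values, and the exact indicator formula (ratio of empirical-copula mass in a band around the dominant diagonal to the mass around the opposite diagonal) are all assumptions made for illustration. Steady states are assumed to be laid out as one row per reaction and one column per sample.

```python
import numpy as np


def empirical_copula(x, y, n_bins=6):
    """Bin rank-transformed (x, y) into an n_bins x n_bins grid of probabilities."""
    # Ranks scaled to pseudo-observations in (0, 1)
    u = (np.argsort(np.argsort(x)) + 0.5) / len(x)
    v = (np.argsort(np.argsort(y)) + 0.5) / len(y)
    hist, _, _ = np.histogram2d(u, v, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()


def copula_indicator(x, y, n_bins=6, band=1):
    """Hypothetical indicator: mass near the dominant diagonal over mass near the other one."""
    cop = empirical_copula(x, y, n_bins)
    i, j = np.indices(cop.shape)
    on_diag = cop[np.abs(i - j) < band].sum()                 # positive dependence
    on_anti = cop[np.abs(i + j - (n_bins - 1)) < band].sum()  # negative dependence
    dominant, opposite = max(on_diag, on_anti), min(on_diag, on_anti)
    return dominant / (opposite + 1e-12)


def correlated_reactions_sketch(steady_states, pearson_cutoff=0.9,
                                indicator_cutoff=10.0, n_bins=6, band=1,
                                lower_triangle=True):
    """Pearson matrix, zeroed below the cutoff, then filtered by the copula indicator."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(steady_states)  # rows are reactions
    corr = np.nan_to_num(corr, nan=0.0)    # constant fluxes give NaN correlations
    corr[np.abs(corr) < pearson_cutoff] = 0.0
    n = corr.shape[0]
    for a in range(n):
        for b in range(a):
            if corr[a, b] != 0.0:
                ind = copula_indicator(steady_states[a], steady_states[b], n_bins, band)
                if ind < indicator_cutoff:  # likely a false positive
                    corr[a, b] = corr[b, a] = 0.0
    return np.tril(corr) if lower_triangle else corr
```

An even `n_bins` is used by default so that, with `band=1`, the diagonal and anti-diagonal cells never overlap; a wider band makes the indicator more tolerant of noise near the diagonal.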

The plot_corr_matrix function, which creates a heatmap of a correlation matrix, is added to dingo/illustrations.py. Its parameters specify the format of the saved image and, for reduced models, allow labeling the plot with only the remaining reactions.
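A heatmap with a configurable output format can be sketched with matplotlib. This is not the PR's plot_corr_matrix: the plotting library, the function name `plot_corr_heatmap`, and its parameters are assumptions. For a reduced model, the caller would pass only the names of the reactions that remain as `labels`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_corr_heatmap(corr, labels, path, fmt="png"):
    """Hypothetical helper: save a correlation heatmap as <path>.<fmt>."""
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)  # fixed scale for correlations
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    fig.colorbar(im, ax=ax, label="correlation")
    fig.tight_layout()
    out = f"{path}.{fmt}"
    fig.savefig(out, format=fmt)
    plt.close(fig)
    return out


# Usage: label the axes with (hypothetical) names of the reactions kept after reduction
example_dir = tempfile.mkdtemp()
example_file = plot_corr_heatmap(np.eye(3), ["R1", "R2", "R3"],
                                 os.path.join(example_dir, "corr"), fmt="svg")
```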

When the extend parameter is set to True, the reduce function of the PreProcess class has also been changed to identify additional reactions for removal in larger models. It uses the new correlated_reactions function to compute a correlation matrix and orders reactions by the sum of their absolute correlations with all other reactions. Reactions with a smaller overall correlation are removed first. Removal stops at the first reaction whose removal alters the value of the objective function.
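The ordering and stopping rule can be sketched independently of any LP solver. Here `objective_value` is a hypothetical callback (not part of dingo) that returns the optimal objective when a given set of reaction indices is removed; in practice it would re-solve the model. The tolerance `tol` is also an assumption.

```python
import numpy as np


def order_by_total_correlation(corr):
    """Sort reactions by the sum of absolute correlations with all other reactions."""
    abs_corr = np.abs(np.asarray(corr, dtype=float))
    full = np.maximum(abs_corr, abs_corr.T)  # also handles a lower-triangular matrix
    np.fill_diagonal(full, 0.0)              # ignore self-correlation
    totals = full.sum(axis=1)
    return np.argsort(totals, kind="stable"), totals


def greedy_removal(corr, objective_value, tol=1e-6):
    """Remove least-correlated reactions first; stop when the objective changes."""
    order, _ = order_by_total_correlation(corr)
    baseline = objective_value(frozenset())
    removed = []
    for r in order:
        candidate = frozenset(removed) | {int(r)}
        if abs(objective_value(candidate) - baseline) > tol:
            break  # first reaction that alters the objective ends the removal
        removed.append(int(r))
    return removed


# Toy example: the objective drops to 0 only if reaction 2 is removed
corr = np.array([[1.0, 0.9, 0.0, 0.0],
                 [0.9, 1.0, 0.95, 0.0],
                 [0.0, 0.95, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]])
removable = greedy_removal(corr, lambda removed: 0.0 if 2 in removed else 10.0)
```

In this toy case the order is 3, 0, 2, 1, so reactions 3 and 0 are removed and the loop stops at reaction 2.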

A correlation.py unit test has also been added to the tests directory, and the preprocess.py unit test has been updated.