
GNU General Public License v3.0

MDiGest

MDiGest Public repository.

Best practices made easy for analysis of correlated motions from molecular dynamics simulations.

MDiGest is a comprehensive and user-friendly toolbox designed to facilitate the analysis of molecular dynamics (MD) simulations. It offers methods ranging from standard to less common approaches for investigating features extracted from MD trajectories, including correlated atomic motions, dihedrals, coupled electrostatic interactions, and more, which can be used to explore conformational changes of proteins. The tools in the package are organized so that users can easily integrate different metrics into their analysis. Because the choice of method can have a major influence on the results of an MD analysis, MDiGest lets users easily compare multiple approaches within a single versatile and adaptable platform. The package also provides a number of visualization tools for further exploring the features extracted from the MD trajectories.

Installation

Requirements

Before installing mdigest through pip we recommend creating a clean environment with all required packages as specified by the environment.yml file,

conda env create --name <env> --file environment.yml

or

conda install -c conda-forge mamba

mamba env create --name <env> --file environment.yml

Once the environment is created,

conda activate <env>

will activate it.

pip installation

Next, running

pip install mdigest

will install mdigest and all its dependencies in the newly created environment.

To run in a Jupyter Notebook, you will have to add this new environment to the list of kernels:

python -m ipykernel install --user --name=<env>

Getting started

Documentation

Full documentation for the software is available on Read the Docs.

Hands on minimal example

Load modules

    import mdigest

    from mdigest.core.parsetrajectory import *
    from mdigest.core.correlation import *
    from mdigest.core.dcorrelation import *
    from mdigest.core.networkcanvas import *
    from mdigest.core.auxiliary import *

load a trajectory and topology

    parent = '/path/to/trajectory/'
    topology   = parent + 'a_topology.psf'
    trajectory = parent + 'a_trajectory.dcd' 

parse the trajectory by calling the MDS class in mdigest

    mds = MDS()

    # set number of replicas
    mds.set_num_replicas(1) # use 2 if you have 2 replicas.

    #load topology and trajectory files into MDS class
    mds.load_system(topology, trajectory)

    #align trajectory
    mds.align_traj(inMemory=True, selection='name CA')

    #set selections for MDS class
    mds.set_selection('protein and name CA', 'protein')

    #stride trajectory
    mds.stride_trajectory(initial=0, final=-1, step=5)

compute correlation from CA displacements

    dyncorr = DynCorr(mds)
    dyncorr.parse_dynamics(scale=True, normalize=True, LMI='gaussian', MI='None', DCC=True, PCC=True, VERBOSE=True, COV_DISP=True)
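
The `DCC=True` option above requests dynamical cross-correlation. As a rough illustration of what that quantity measures, here is a minimal NumPy sketch of the textbook definition, in which the frame-averaged dot product of two atoms' displacements is normalized by their root-mean-square fluctuations. It is not MDiGest's implementation: the function name `dcc_matrix` and the synthetic coordinates are assumptions made only for this example.

```python
import numpy as np

def dcc_matrix(coords):
    """Textbook dynamical cross-correlation (hypothetical helper, not MDiGest API).

    coords: array of shape (n_frames, n_atoms, 3) holding aligned coordinates.
    Returns an (n_atoms, n_atoms) matrix with values in [-1, 1].
    """
    # Displacement of every atom from its mean position
    disp = coords - coords.mean(axis=0)
    # Frame average of the dot product <dr_i . dr_j>
    cov = np.einsum('fix,fjx->ij', disp, disp) / coords.shape[0]
    # Normalize by the root-mean-square fluctuation of each atom
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

# Synthetic data standing in for a real trajectory
rng = np.random.default_rng(0)
coords = rng.normal(size=(200, 4, 3))
coords[:, 1] = coords[:, 0] + 1.0   # atom 1 moves rigidly with atom 0
coords[:, 2] = -coords[:, 0]        # atom 2 moves opposite to atom 0
dcc = dcc_matrix(coords)
```

In this toy system the pair (0, 1) is fully correlated and the pair (0, 2) is fully anti-correlated, while atom 3 moves independently of the others.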

compute correlation from dihedrals fluctuations

    dihdyncorr = DynCorr(mds)
    dihdyncorr.parse_dih_dynamics(mean_center=True, LMI='gaussian', MI='knn_5_2', DCC=True, PCC=True, COV_DISP=True)
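
For readers unfamiliar with the `LMI='gaussian'` setting used above, the sketch below shows one common way to obtain a generalized correlation coefficient from linearized (Gaussian) mutual information: the mutual information of a pair is estimated from the log-determinants of the marginal and joint covariance matrices, then mapped to the range [0, 1]. This assumes that the option refers to that standard estimator. The helper `gcc_lmi` and the synthetic data are hypothetical and are not taken from MDiGest.

```python
import numpy as np

def gcc_lmi(coords):
    """Generalized correlation from Gaussian mutual information
    (hypothetical helper, not MDiGest API).

    coords: array of shape (n_frames, n_atoms, d).
    """
    n_frames, n_atoms, d = coords.shape
    disp = coords - coords.mean(axis=0)
    gcc = np.eye(n_atoms)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            xi, xj = disp[:, i], disp[:, j]
            # Marginal (d x d) and joint (2d x 2d) covariance matrices
            logdet_i = np.linalg.slogdet(np.cov(xi, rowvar=False))[1]
            logdet_j = np.linalg.slogdet(np.cov(xj, rowvar=False))[1]
            logdet_ij = np.linalg.slogdet(np.cov(np.hstack([xi, xj]), rowvar=False))[1]
            # Mutual information of two jointly Gaussian variables
            mi = max(0.5 * (logdet_i + logdet_j - logdet_ij), 0.0)
            # Map mutual information to a coefficient in [0, 1]
            gcc[i, j] = gcc[j, i] = np.sqrt(1.0 - np.exp(-2.0 * mi / d))
    return gcc

# Synthetic data standing in for a real trajectory
rng = np.random.default_rng(1)
coords = rng.normal(size=(2000, 3, 3))
coords[:, 1] = coords[:, 0] + 0.1 * rng.normal(size=(2000, 3))  # strongly coupled to atom 0
gcc = gcc_lmi(coords)
```

Unlike dynamical cross-correlation, this coefficient is non-negative and does not depend on the relative orientation of the two displacement vectors.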

save for later use

    savedir =  '/save/directory/'
    dyncorr.save_class(file_name_root=savedir + 'dyncorr')
    dihdyncorr.save_class(file_name_root=savedir + 'dihdyncorr')

load

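    # 'sd' is not defined by the imports above; assuming it is the savedata module
    import mdigest.core.savedata as sd
    dyncorr_load = sd.MDSdata()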
    dyncorr_load.load_from_file(file_name_root=savedir + 'dyncorr')
    dyncorr_load.load_from_file(file_name_root=savedir + 'dihdyncorr')

prepare correlation network for visualization

    dist   = dyncorr_load.distances_allreplicas['rep_0'].copy() 

load different correlation matrices, here the linearized mutual-information based generalized correlation coefficient (gcc_lmi)

    viznetdir = '/directory/where/to/save/networks/'
    gcc    = dyncorr_load.gcc_allreplicas['rep_0']['gcc_lmi'].copy()
    dgcc   = dyncorr_load.dih_gcc_allreplicas['rep_0']['gcc_lmi'].copy()
    matrix_dictionary = {'gcc': gcc, 'dgcc':dgcc}

    vizcorr = ProcCorr()
    vizcorr.source_universe(mds.mda_u)
    vizcorr.writePDBforframe(0, viznetdir + 'frame0')
    vizcorr.set_outputparams({'outdir': viznetdir })
    vizcorr.load_matrix_dictionary(matrix_dictionary.copy())
    vizcorr.populate_attributes(matrix_dictionary.copy())
    vizcorr.set_thresholds(prune_upon=np.asarray(dist.copy()), lower_thr=0, upper_thr=5.)
    vizcorr.filter_by_distance(matrixtype='gcc', distmat=True)
    vizcorr.filter_by_distance(matrixtype='dgcc', distmat=True)
    df = vizcorr.df

    to_pickle(df, output=viznetdir + 'network_filter_d_{}_{}.pkl'.format(0, 5))
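
The distance-based pruning above keeps only correlations between residue pairs whose distance falls within the chosen thresholds. The idea can be sketched with plain NumPy as follows. This is an illustration under stated assumptions, not MDiGest's code: the helper `mask_by_distance`, the inclusive bounds, and the choice to zero out the diagonal are all assumptions.

```python
import numpy as np

def mask_by_distance(corr, dist, lower_thr=0.0, upper_thr=5.0):
    """Zero out correlations for pairs outside [lower_thr, upper_thr]
    (hypothetical helper, not MDiGest API)."""
    keep = (dist >= lower_thr) & (dist <= upper_thr)
    np.fill_diagonal(keep, False)   # drop self-correlations
    return np.where(keep, corr, 0.0)

# Toy 3-residue system: correlation values and pairwise distances
corr = np.array([[1.0, 0.8, 0.6],
                 [0.8, 1.0, 0.4],
                 [0.6, 0.4, 1.0]])
dist = np.array([[0.0, 4.0, 8.0],
                 [4.0, 0.0, 3.0],
                 [8.0, 3.0, 0.0]])
short_range = mask_by_distance(corr, dist, lower_thr=0.0, upper_thr=5.0)
```

Here the pair (0, 2) lies 8 distance units apart, so its correlation is removed, while the pairs (0, 1) and (1, 2) are kept.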

Open Pymol in the visualize_networks folder

    cd ./mdigest/visualize_networks/

Run PyMOL locally, launching it from inside this directory, and load a PDB of one frame of the system. It is best to use a frame extracted from the trajectory to ensure consistency with residue numbers.

    from pymol import cmd, util
    import seaborn as sns

    cmd.delete('all')
    viznetdir = '/directory/where/to/save/networks/'
    # 'path' must point to the directory containing prot.pdb
    cmd.load(path + 'prot.pdb', 'prot')
    cmd.color('grey80', 'prot')
    cmd.remove('!(polymer)') 
    cmd.run('draw_network_pymol.py')
    cmd.hide('lines', '*')

visualize short-range correlations from CA displacements on the protein

    draw_network_from_df(viznetdir +'network_filter_d_0_5.pkl', which='gcc', color_by='gcc', sns_palette=sns.color_palette("tab20"), label='gcc', edge_norm=1)

interactively compare with short-range correlations computed from dihedrals

    draw_network_from_df(viznetdir +'network_filter_d_0_5.pkl', which='dgcc', color_by='dgcc', sns_palette=sns.color_palette("tab20"), label='dgcc', edge_norm=1)

easily inspect different metrics, such as dynamical cross-correlation, mutual-information based correlation... at the desired threshold!

Many more examples are illustrated in the mdigest-tutorial-notebook (in the notebooks/ folder), which contains four case studies on the analysis of MD trajectories. Notebooks are best run in Google Colab. If run locally, add a Jupyter kernel to the environment:

    conda install -c anaconda ipykernel
    python -m ipykernel install --user --name=<env>

The molecular trajectories required for the notebook are available for download at the following links

Citation

Federica Maschietto, Brandon Allen, Gregory W. Kyro, Victor S. Batista, Journal of Chemical Physics, (2023), in press; MDiGest: A Python Package for Describing Allostery from Molecular Dynamics Simulations. preprint to be updated

A Note to the Users

MDiGest is not the first (nor will it be the last) package that allows such analysis, and therefore some of its contents were implemented before in other packages. Some packages, such as MDAnalysis and NetworkX, are imported directly; others are not directly imported but were used to some extent in building MDiGest.

Among these, a notable, recently released predecessor is dynetan, a graph-oriented Python package to compute and analyze mutual-information based generalized correlation from MD trajectories. Some of the modules of MDiGest, namely processtrajectory.py and savedata.py, are reminiscent of the structure of modules performing similar tasks in dynetan. Moreover, as specifically mentioned in the documentation, some accessory functions were adapted from it, the list of which is stated below:

Another notable package is correlationplus, which also focuses on analysis of correlated motions from molecular dynamics simulations. As mentioned in the documentation, the compute_DCC_matrix and compute_DCC functions used to compute dynamical cross-correlation coefficients in MDiGest were adapted from a related function in correlationplus.

dynetan and correlationplus are released under the GPL-v3 and LGPL licenses; hence, MDiGest was released under the GPL-v3 license. In the future, we plan to change these functions so that we will be able to release MDiGest under a more permissive license.

Please remember to cite the latter when using these functionalities in MDiGest!

Another package that deserves a mention here is pmdlearn. Although its main capabilities are very different from what is implemented in MDiGest, it provides a comprehensive module for network analysis, some parts of which we adapted in MDiGest.