This repository contains the code, raw data and parameter files used to reproduce the results in "A coarse-grained model for disordered and multi-domain proteins".
Note that the current version of CALVADOS 3 (src/residues_pub.csv), as described in https://www.biorxiv.org/content/10.1101/2024.02.03.578735v3, differs from the version described in our original preprint (https://www.biorxiv.org/content/10.1101/2024.02.03.578735v2). We recommend that users use the current CALVADOS 3 parameters and refer to the original version (src/residues_pub_beta.csv) as CALVADOS 3beta.
A Colab demo of CALVADOS 3 is now available. We look forward to your feedback.
data/
: raw data used for making figures in the paper;
paper_multidomainCALVADOS/
: figures (pdf format) in the paper;
src/
: all code, starting conformations of multi-domain proteins and experimental data used to reproduce the results in the paper;
src/BLOCKING/
: tools for assessing convergence of Molecular Dynamics simulations (https://github.com/fpesceKU/BLOCKING);
src/expPREs/
: paramagnetic relaxation enhancement (PRE) NMR data for optimization;
src/extract_relax
: initial conformations of multi-domain proteins for simulations;
src/domains.yaml
: domain boundaries of the multi-domain proteins used in this study;
src/residues_pub.csv
: parameter file containing the CALVADOS 1, CALVADOS 2 and CALVADOS 3 parameters;
figures.ipynb
: Jupyter notebook for plotting the figures;
To use this repository:
download and unzip it;
create a conda environment with the following command (alternatively, install the packages separately with pip or conda; you can skip the deerpredict package if you only need simulations, not optimization):
conda env create -f environment.yaml
then activate the environment:
conda activate CALVADOS3
(optional, only for optimization) install pulchra (https://www.pirx.com/pulchra/) and assign its absolute path to path2pulchra in src/submit_ray.py;
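Because path2pulchra must be an absolute path to the pulchra executable, a quick check before starting an optimization can catch a wrong path early. The following is a minimal sketch, not part of the repository: the function name check_executable_path and the example path are hypothetical.

```python
import os


def check_executable_path(path):
    """Return True if `path` is an absolute path to an existing executable file."""
    return os.path.isabs(path) and os.path.isfile(path) and os.access(path, os.X_OK)


# Hypothetical location; replace it with the path to your own pulchra binary.
path2pulchra = "/home/user/software/pulchra"

if not check_executable_path(path2pulchra):
    print(f"pulchra not found or not executable at: {path2pulchra}")
```

A relative path such as "pulchra" fails this check even if the binary is on your PATH, which matches the requirement above that the path be absolute.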
👋 See src/simple_singlechain.py if you want a simpler way to run single-chain simulations with CALVADOS 3 👋.
Follow steps 1-4 if your protein is a multi-domain protein; follow only steps 1 and 4 if it is an intrinsically disordered protein:
1. Choose a name for your protein (pro_name) that does not conflict with existing proteins;
2. Put the structure file (.pdb, the resSeq should start from 1) into the /src/extract_relax directory, named ${pro_name}_rank0_relax.pdb;
3. Add the domain boundaries to src/domains.yaml, following the examples below.
Ubq2 has two domains: one spans residues 11 to 82, the other residues 87 to 158 (counting from 1).
Ubq2:
- [11,82]
- [87,158]
S4FL has two domains: one spans residues 15 to 138;
the other consists of three parts because the long loops between them are excluded. The three parts are still restrained together.
S4FL:
- [15,138]
- [[287,294],[323,466],[492,542]]
4. Add new lines for your protein to rawdata.py in the following format:
fasta_${pro_name} = "${protein_sequence_oneletter}"
proteins.loc['${pro_name}'] = dict(temp=${experimental_temperature}, expRg=${experimental_Rg}, expRgErr=${experimental_RgErr}, pH=${experimental_pH}, fasta=list(fasta_${pro_name}), ionic=${experimental_ionic})
# If you have no experimental data to compare with, set "expRg" and "expRgErr" to None. This does not raise errors in simulations, but it does in optimization.
Place the new lines in the initIDPsRgs function, under the if validate: condition, if the protein is an intrinsically disordered protein;
place them in the initMultiDomainsRgs function, under the if validate: condition, if it is a multi-domain protein.
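To make the format in step 4 concrete, here is a minimal sketch of how such an entry could be assembled and checked before pasting it into rawdata.py. The helper make_protein_entry, the example sequence and all numeric values are hypothetical; the restriction to the 20 standard one-letter codes is an assumption, and units should follow the existing entries in rawdata.py.

```python
# Assumption: sequences use only the 20 standard one-letter amino acid codes.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def make_protein_entry(fasta, temp, pH, ionic, expRg=None, expRgErr=None):
    """Build a dict with the same keys as the proteins.loc[...] template above."""
    unknown = sorted(set(fasta) - STANDARD_RESIDUES)
    if unknown:
        raise ValueError(f"unexpected residue letters: {unknown}")
    return dict(temp=temp, expRg=expRg, expRgErr=expRgErr, pH=pH,
                fasta=list(fasta), ionic=ionic)


# Hypothetical protein with no experimental Rg (fine for simulations only).
fasta_myprotein = "MSEQNNTEMTFQ"
entry = make_protein_entry(fasta_myprotein, temp=298, pH=7.0, ionic=0.15)
print(entry["fasta"][:3], entry["expRg"])
```

In rawdata.py itself, the resulting dict corresponds to the right-hand side of the proteins.loc['${pro_name}'] assignment.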
Edit /src/submit_ray.py as needed and submit jobs with:
python3 submit_ray.py
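Before submitting, it can help to sanity-check the domain definitions from step 3. The sketch below expands both forms used in src/domains.yaml (a single [first, last] range, or a list of such ranges for a discontinuous domain) into residue numbers. It is an illustration only: the function domain_residues is hypothetical, the boundaries are assumed to be inclusive, and the dictionary literal stands in for the parsed YAML file.

```python
def domain_residues(domain):
    """Expand one domain definition into a list of 1-based residue numbers.

    `domain` is either [first, last] or a list of [first, last] segments.
    Boundaries are treated as inclusive (an assumption of this sketch).
    """
    segments = [domain] if isinstance(domain[0], int) else domain
    residues = []
    for first, last in segments:
        if first < 1 or last < first:
            raise ValueError(f"invalid segment: [{first}, {last}]")
        residues.extend(range(first, last + 1))
    return residues


# Same content as the Ubq2 and S4FL examples above, written as a Python dict.
domains = {
    "Ubq2": [[11, 82], [87, 158]],
    "S4FL": [[15, 138], [[287, 294], [323, 466], [492, 542]]],
}

for name, doms in domains.items():
    sizes = [len(domain_residues(d)) for d in doms]
    print(name, sizes)
```

Under the inclusive-boundary assumption, both Ubq2 domains contain 72 residues, and the three-part S4FL domain contains 203 residues in total.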
Place the initial structure in src/extract_relax as ${pro_name}_${CG}_ini.pdb, where CG is the coarse-grained method (CA, COM or SCCOM);
add the protein to src/csat_calvados2_test.csv (if IDP) or src/csat_MDPs_test.csv (if MDP);
edit /src/submit_slab.py as needed and submit jobs with:
python3 submit_slab.py
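The file naming convention above can be expressed as a small helper, which also guards against a mistyped coarse-graining method. This is a sketch under stated assumptions: the function initial_structure_path is hypothetical and is not taken from submit_slab.py.

```python
from pathlib import Path

# The three coarse-graining methods named in this README.
CG_METHODS = ("CA", "COM", "SCCOM")


def initial_structure_path(pro_name, cg, directory="src/extract_relax"):
    """Return the expected path of ${pro_name}_${CG}_ini.pdb."""
    if cg not in CG_METHODS:
        raise ValueError(f"CG must be one of {CG_METHODS}, got {cg!r}")
    return Path(directory) / f"{pro_name}_{cg}_ini.pdb"


path = initial_structure_path("Ubq2", "COM")
print(path.name, path.exists())
```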
Please check the details in src/submit_ray.py.