Protein-Ligand Molecular Dynamics (MD) Simulations using GROMACS

Background on Molecular Dynamics simulations

MD simulations use principles of classical mechanics to predict the motion of particle-based systems. MD simulations can be carried out in software packages such as GROMACS (Lindahl et al., 2001), AMBER (Case et al., 2005), NAMD (Phillips et al., 2005), CHARMM (Brooks et al., 2009), LAMMPS (Plimpton, 1995) and DESMON (Bowers et al.,2006).

GROMACS is a fast, free, popular, and user-friendly package to perform molecular dynamics. It has been continually upgraded and maintained since its initial release in 1991. There are detailed documents and tutorials for assisting in learning to use GROMACS on the website http://www.gromacs.org/. Here, the GROMACS 2023 is used to investigate the dynamics of the SARS-CoV-2 helicase (Nsp13) at the temperature of 300K.

Resources for Molecular Dynamics Simulations:

https://www.ebi.ac.uk/training/materials/biomolecular-simulations-materials/molecular-dynamics-simulations/

https://doi.org/10.1016/bs.mie.2020.04.020

Molecular Dynamics Simulation Basic Steps:

Preparation of protein and ligand structure
Preparation of protein and ligand topologies
Merger of .gro files of protein and ligand
Solvation and Ionisation of the complex
Ligand restrain
Equilibration and production phase

Force Fields in Molecular Dynamics

The GROMACS software only recognizes the atoms defined by the force field, generally including proteins, nucleic acids, and a very finite number of cofactors, like NAD(H) and ATP. If the ligand is not defined by the force field, it won’t be recognized by GROMACS. The ligand can be removed if the ligand binding is not interesting to the researchers. However, if the ligand is required during the simulation, the ligand topology needs to be prepared separately using external tools and then added to the main topology file. It is also advised to remove the external water molecules that are present in the PDB file to de-complicate the topology. However, if the water molecule serves a structural or catalytic role, they should be retained in the PDB file. To remove the unwanted ligands and water molecules, the HETATM lines corresponding to them in the .pdb file can be directly deleted using plain text editors. This can also be achieved visually using tools like PyMOL (see example below), Discovery Studio or Chimera.

What is a force field?

A force field is a special case of energy function describing the dependence of the system energy on the coordinates of its particles. It consists of an analytical form of the interatomic potential energy and a set of parameters entering into this form.

There are several widely applicable fields including CHARMM (MacKerell et al., 1998), AMBER (Cornell et al., 1996), GROMOS (Oostenbrink, Villa, Mark, & van Gunsteren, 2004), OPLS ( Jorgensen, Maxwell, & TiradoRives, 1996) and COMPASS (Sun, 1998). One of the most critical considerations when selecting a force-field for large molecule simulations is whether to employ an all-atom or a united-atom variant.

In all-atom (or explicit hydrogen) force-fields, every atom in the molecule is explicitly treated and parameterized. By the contrary, united-atom force-fields do not explicitly treat non-polar hydrogen atoms, but rather group them with carbon atoms and assign parameters to the resultant CH, CH2, or CH3 “pseudo-atoms.” However, it is difficult to compare the performance of the existing general force fields, as the result will depend strongly on the system and properties simulated. OPLS (Optimized Potentials for Liquid Simulations) was originally developed to simulate small molecules ( Jorgensen, 1981) and an all-atom version (OPLS-AA) was developed later for simulating proteins, with bond stretching and angle bending parameters adopted from AMBER and CHARMM ( Jorgensen et al., 1996).

OPLS-AA is one of the most widely used biomolecular force fields (Guvench & MacKerell, 2008) and could perform better than other force fields when ligands are to be included (Robertson, Tirado-Rives, & Jorgensen, 2016). For simpler folded protein systems, CHARMM force fields including CHARM27 and CHARM22* can provide more reasonably accurate description of the native states of proteins (Lindorff-Larsen et al., 2012).

NOTE - Users can also add new force field files in the GROMACS directory, and then choose to use them in the pdb2gmx step.

Previous Methods Used to Model the SARS-CoV-2 Helicase

5RMM contains three Zinc ion residue clusters; a Zn-Cys4 cluster, a Zn-Cys2-His2 cluster, and a Zn-Cys3-His1 cluster (shown below) which we will treat with the Zinc AMBER force field (ZAFF) parameters. Zinc AMBER force field (ZAFF) was developed by Peters et al. in 2011. It was designed for 4-coordinated zinc metal centers (not suitable for 5- or 6- coordinated zinc centers).For further details see ZAFF

Modelling the active SARS-CoV-2 helicase complex as a basis for structure-based inhibitor design, Berta et al. Chem. Sci., 2021, 12, 13492

To compare simulation results obtained with MD, Bert and colleagues carried out MD simulations using GROMACS 2018 with the Amber14SB10 for the helicase.
To maintain the coordination of the Zn2+ ions, the ZAFF model was used.
The molecular systems were placed in a cubic box and solvated with TIP3P water molecules. The distance between the solute and the box was set to at least 14 Å.
The solute was neutralized with potassium cations and then K+ Cl- ion pairs were added to reach the salt concentration of 0.15 M.
The parameters for Mg2+ are taken from Ref. 14. Long-range electrostatic interactions were treated using the particle mesh Ewald method15,16 with a real-space cut-off of 10 Å. The hydrogen bond lengths were restrained using P-LINCS,5,17 allowing a time step of 2 fs.18
The translational movement of the solute was removed every 1000 steps to avoid any kinetic energy build-up.19 After energy minimization of the solvent and equilibration of the solvated system for 10 ns using a Berendsen thermostat (τT = 1 ps) and Berendsen pressure coupling (τP = 1 ps),18 simulations were carried out in an NTP ensemble at a temperature of 300 K and a pressure of 1 bar using a Bussi velocity-rescaling thermostat20 (τP = 1 ps) and a Parrinello-Rahman barostat (τP = 1 ps).21
During minimization and heating, all the heavy atoms of the solute were kept fixed using positional restraints. The restraints on the RNA and the protein backbone were relaxed slowly during the equilibration from 1000 kJmol-1 nm2 to 10 kJmol-1.nm.

Molecular dynamics simulations of the flexibility and inhibition of SARS-CoV-2 NSP 13 helicase

Methods Examining the conformational flexibility of helicase presents a sampling challenge, as the time scales for the structural changes can be longer than typical simulation times, and a force field challenge, as the models must correctly weigh different, nearly energetically equal conformations [41]. Considerable effort has gone into the development of force fields for proteins [42,43]. The recently developed a99SB-disp model [44] has been shown to accurately reproduce the structure of a range of folded and disordered proteins [44], as well as a protein that contains both folded and disordered domains [45]. The a99SB-disp model is the result of many revisions of the Amber 99SB model [46], which was a revision of the Amber 94 force field [47]. The SB model revised the backbone φ/ψ dihedral terms, which corrected errors in the backbone secondary structure, particularly for glycine [46]. Best and Hummer revised the ψ dihedral terms to give the correct helix/coil populations, creating Amber 99SB [48]. The side chain torsion potentials for four amino acids (isoleucine, leucine, aspartic acid, and asparagine) that showed disagreement with experimental data were reoptimized against quantum calculations, resulting in the a99SB-ILDN model, also referred to as ff99SB-ILDN [49]. To fix the helix propensities of charged residues, new charges from a restrained electrostatic potential (RESP) fit with constraints on the backbone charges to match neutral side chains, were assigned to charges for charged side chains, to give the a99SB-ILDN-Q model [50]. The a99SB-disp model makes a number of changes to the a99SB*-ILDN-Q model, starting with enhanced dispersion interactions, through the Lennard-Jones potential, between the amide hydrogen and carbonyl oxygen backbone atoms. Side chain torsional terms were optimized against ab initio and experimental data. Side chain charges for the charged residues aspartate, glutamate, and arginine were re-parameterized, as was backbone torsional terms for glycine. Lastly, a new water model, a99SB-disp water with slightly stronger Lennard-Jones dispersion was developed to be used with the a99SB-disp model [44].

For the three zinc ions in the ZBD and the complexing cysteine and histidine residues, we used the Zinc Amber Force Field (ZAFF) [51]. The ZAFF force field gave different charges for the backbone atoms for cysteine residues. To make the ZAFF force field compatible with a99SB-disp and have the same charges for the N,H, C, and O backbone atoms, we modified the charges of the Cα and Hα atoms for the cysteine residues complexing a zinc ion, by adding 0.0077e to both atoms, so that charge is conserved. The rest of the side chain charges were from the ZAFF force field. The backbone charges for the histidine residues from ZAFF were the same as a99SB-disp.

Simulations were carried out both for the apo enzyme and with a potential inhibitor in the ATP binding site. This inhibitor was identified through a combination of docking using the FRED and HYBRID programs [52], and molecular dynamics simulations, as described in the Supplementary Information. The Generalized Amber Force Field (GAFF) [53] parameters were used for the potential inhibitor, using AM1-BCC charges [54]. The apo simulation used 36002 and the complex simulation used 35993 water molecules, using the a99SB-disp model, in a rhombic dodecahedron box. Charge neutrality was maintained by adding 9 chloride ions for the apo and 10 chloride ions for the complex, using the a99SB-disp force field for ions.

The simulations were setup using the tleap and antechamber programs of AMBER 16 [55,56]. Those coordinate and topology files were converted to GROMACS input files using the ACPYPE program [57]. Parameter sets for the a99SB-disp model were downloaded from GitHub at https://github.com/paulrobustelli/Force-Fields.

Configurational space sampling is enhanced using the REDS method [39,40], which combines conventional replicas with replicas that have a modified potential energy. The modified, or scaled replicas, can span a large range of energy and are placed between two conventional replicas, so that fewer replicas are required. Replica exchange works by having a replica at a temperature high enough to overcome energetic barriers. Choosing the high temperature is important to the efficiency of replica exchange [[58], [59], [60], [61]]. An optimal high temperature of 366 K was found by Rosta and Hummer for a small protein in water (the l-repressor fragment l6-85) [61] and Heo and Feig used a temperature of 360 K for the prediction of SARS-CoV-2 protein structures (but not including helicase) [62], suggesting that 360 K is a reasonable high temperature. Replica exchange simulations for eight SARS-CoV-2 proteins, but not helicase, used a temperature range of 310 K–350 K, with a number of replicas from 20 to 60 [63]. Our REDS simulations had conventional replicas, using the plain, unscaled replicas, at temperatures of 310, 319, 331, 345, and 360 K, with scaled replicas at 314, 325, 338, and 352 K, for a total of 9 replicas. These temperatures were determined using the method of Patriksson and van der Spoel [64]. RE would require 42 replicas for this system. The simulations used the REDS method as implemented in the GROMACS simulation package [65]. Details of this implementation are given elsewhere [40] and the additions to the GROMACS source code were downloaded from Gitlab at https://gitlab.com/csumma_uno/gromacs.

ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained against Quantum Mechanics Energy Surfaces in Solution

The MD simulation will be run using poses 1 and 3, corresponding to Vina calculated binding affinities of -8.826 and -8.73 kcal/mol, respectively, and which maintain the key interactions as the fragment (pdb 5rmm), namely ASN516 - carboxylate hydrogen bond.

Preparation of ligand structures

Hydrogens were added to each ligand pose in Biovia Discovery Studio 2021 and each conformer was saved in .mol2 format.

Preparation of the Protein Structure

The same structure that was used for molecular docking with Vina will be used for the GROMACS molecular dynamics simulations. Firstly, the chain structure is incomplete, with residues missing at the C- and N- terminus as well as at two loops in the structure.

First, pull up the structure of the protein chain and display the protein sequence:

`File > Get PDB > '5rmm' and select ' Chain B' > Display > Sequence

Next, clean the structure by selecting and removing the water molecules and ligand 'VGX' from the protein structure. Do not delete the Phosphate or Zinc ions

Identify any gaps in the structure by highlighting missing (grey) residues in the protein sequence and colouring a different colour (I've chosen yellow) to visualise the gaps more easily. Also include a couple of residues on either side of the gap in your selection - this will be important for the loop minimisation process.

On the right side menu where it reads 5rmmB 1/1 select A > Delete selection. Reselect your loop sequence residues and select 'S> Liquorice` to visually the residues.

Go to Builder > Protein and select the N-terminus residue, which in this case is GLY203, then select the missing residues in the builder that correspond to those missing from the sequence. Do this sequentially to add all of the residues that are missing - note after the first residue you will not need to select the N-term residue each time.

Once you've added the residues, connect the N-C termini by selecting the carbonyl Carbon of the C-terminus and Nitrogen of the N-terminus and creating a new single bond. Note this bond will look long and have incorrect geometry, but this will be fixed in the following minimisation step.

Once this is complete for all the missing residues, save the file as a new PDB structure.

Next, go to https://modbase.compbio.ucsf.edu/modloop/ and enter your new 5rmm.pdb model. Specify the loop segments plus the additional residues on either side to focus the loop energy minimisation to that region.

1:B:3:B: 201:B:210:B: 333:B:342:B: 592:B:601:B:

The output file will contain the minimised structure with the phosphate and zinc ions missing. These can be re-added by extracting these ions from the original 5rmm structure and then saving it as a new .pdb file.

Here is the final protein structure, 5rmmB_mod.pdb https://github.com/TomkUCL/SARS-CoV-2-Helicase-nsp13-Public-Antivirals-Virtual-Screening-Project/blob/main/5rmmB_mod.pdb

Next, we need to clean up the protein structure; this involves removing waters, adding any missing atoms, adding Kolman charges, and adding hydrogen atoms. This can all be done in UCSF Chimera from the Actions > DockPrep menu.

Once this is done, we will save this new model as _5rmm_modplusH.pdb , shown in Chimera below.

GROMACS Installation:

Please copy the following into a notepad file and save as a shell (gromacs_install.sh) file to download the GROMACS 2023 installer file. Once you will download the file then please use the following commands to run it.

`#!/bin/sh
pwd
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install cmake
cmake --version
sudo apt-get install build-essential
mkdir gromacs/
cd gromacs/
wget https://ftp.gromacs.org/regressiontests/regressiontests-2023.2.tar.gz
tar xvzf regressiontests-2023.2.tar.gz
sudo apt-get install libfftw3-dev
wget https://ftp.gromacs.org/gromacs/gromacs-2023.2.tar.gz
tar xvzf gromacs-2023.2.tar.gz
cd gromacs-2023.2/
mkdir build
cd build
sudo cmake .. -DGMX_BUILD_OWN_FFTW=OFF -DREGRESSIONTEST_DOWNLOAD=OFF -DCMAKE_C_COMPILER=gcc -DREGRESSIONTEST_PATH=./../../regressiontests-2023.2
make check
sudo make install

source /usr/local/gromacs/bin/GMXRC

echo "source /usr/local/gromacs/bin/GMXRC" >> ~/.bashrc
gmx pdb2gmx --version
;`

Once the file is saved then please move to folder where the file is located. Lets assume that you are using Windows WSL and placed the file in the D drive then please use the following command

cd /mnt/D

Once you will be in the folder where your file is located then please use the following command to install the gromacs

chmod +x gromacs_install.sh
./gromacs_install.sh

GROMACS 2023 will be installed in your system

Basic Modules of GROMACS:

pdb2gmx: This module will prepare the topology file by using the force field.
solvate: This solvate module will carry out the process of solvation i.e. put the protein in a water system.
grompp: This groupp will be used to create one of the very important files during a molecular dynamics simulation; this is the atomic structure input file, which will will have .tpr extension.
genion: This module will be used for the ionization and the neutralization of the system.
mdrun: the module will be used in the energy minimization step, the equilibration, and finally to run the molecular dynamics simulation.

Basic Flags or Arguments in GROMACS:

-f = specify input file name -o = specify output file name -p = process or update the existing file -c = specify input of co-ordinate file

Commands to Run Molecular Dynamics

Simulations of Protein in Water

Step-1: Clean your structure from water grep -v HOH your_structure.pdb > str_clean.pdb

Note: You need to perform this step if your structure is downloaded from protein databank

Step-2: Creation of topology file gmx pdb2gmx -f str_clean.pdb -o str_processed.gro -water spce

Step-3: Solvation gmx editconf -f str_processed.gro -o str_newbox.gro -c -d 1.0 -bt cubic Once the above command is successfully executed then type the following command in the terminal gmx solvate -cp str_newbox.gro -cs spc216.gro -o str_solv.gro -p topol.top

Step-4: Ionization gmx grompp -f ions.mdp -c str_solv.gro -p topol.top -o ions.tpr

Once the above command is successfully executed then type the following command in the terminal gmx genion -s ions.tpr -o str_solv_ions.gro -p topol.top pname NA -nname CL -neutral

Step-5: Energy Minimization gmx grompp -f minim.mdp -c str_solv_ions.gro -p topol.top -o em.tpr Once the above command is successfully executed then type the following command in the terminal gmx mdrun -v -deffnm em

Step-6: Equilibration For temperature equilibration gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr Once the above command is successfully executed then type the following command in terminal gmx mdrun -deffnm nvt -v

If you want to plot the energy then use the following command gmx energy -f nvt.edr -o temperature.xvg For Pressure equilibration

gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o 
npt.tpr

Once the above command is successfully executed then type the following command in the terminal gmx mdrun -deffnm npt -v If you want to plot the energy then use the following command gmx energy -f npt.edr -o pressure.xvg

Step-6: Production gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr Once the above command is successfully executed then type the following command in the terminal gmx mdrun -deffnm md_0_1 -v

If you have gpu installed in your system use the following command instead of above-mentioned gmx mdrun -deffnm md_0_1 -v -nb gpu

2.1 Preparation of the system

2.1.1 Obtaining a PDB structure

The basic ingredient to start MD simulations is a protein structure coordinate file in PDB format that can be downloaded from the RCSB website (https://www.rcsb.org/). When the protein structure of interest is not available, a homology model as an initial starting structure may be built by a variety of software tools including Modeller (Sali & Blundell, 1993), I-Tasser (Roy, Kucukural, & Zhang, 2010), Discovery Studio (Dassault Syste`mes BIOVIA, 2016), or the Pepbuild web server (Singh, 2004). The Protein Databank may also contain several structures for the same protein. In this case, all potential structures should be checked for quality and completeness to select the most suitable one for MD simulation.

Once a PDB structure is downloaded, it could be checked and re-formatted using a structure viewing programs such as VMD, Chimera, or PyMOL, and plain text editors such as nano, vi, emacs, Sublime Text2 (Linux/Mac) or Notepad (Windows).

The PDB structures should be checked for whether the structure resolution is high enough (Note 1), whether there are any missing atoms or residues in the structure (Note 2), and whether the ligands or water molecules are present (Note 3). We used the 3-D structure with PDB code 5RMM for the MD simulation of Nsp13 from E. coli (Fig. 1). Nsp13 is a homodimer and each subunit is 70–74 kDa. PDB structure (5RMM) has a resolution of 1.9Ang and no missing atoms/residues are found in this structure. The ligands and crystal waters are removed before starting the MD simulations.

2.1.2 Creating a topology from the PDB file

After getting the input PDB structure ready, we can execute the first GROMACS module, pdb2gmx to convert the PDB file into a GROMACS-specific molecular geometry file format (.gro), the topology file (.top) and position restraint file (.itp). For the simulation of Nsp13, we'll use the OPLS-AA force field, the SPC/E water model, and accepted the default choices for all residue protonation states (Note -GROMACS takes the protonation state of an amino-acid free in solvent at pH 7 as the default. Lys and Arg are protonated, while the Asp and Glu are unprotonated. For His, the proton could be either on ND1, on NE2 or on both and the selections are done automatically based on optimal hydrogen-bonding conformations.), termini, disulfide bridges, etc. The command to use is then:

gmx pdb2gmx -f 5rmm_mod_plusH.pdb -o 5rmm.gro -water spce

pdb2gmx will process the PDB structure and prompt the user to select a force field:

Select the Force Field: From '/usr/local/gromacs/share/gromacs/top': 1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003) 2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995) 3: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996) 4: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000) 5: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006) 6: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010) 7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002) 8: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins) 9: GROMOS96 43a1 force field 10: GROMOS96 43a2 force field (improved alkane dihedrals) 11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205) 12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656) 13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656) 14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9) 15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)

Since we use the all-atom OPLS-AA force field, type 15 at the command prompt, followed by “Enter.” This command will produce three files: 5RMM.gro, topol.top and posre.itp.

5RMM.gro is a GROMACS-formatted structure file that contains the coordinates of all the atoms defined within the force field. The missing hydrogen atoms are added by the pdb2gmx by default. Note - The .gro file is not mandatory for running simulations with GROMACS. GROMACS can handle many different file formats. If you prefer to use, for instance, the PDB format, all you need to do is to specify an appropriate file name with .pdb extension as your output.

The topol.top file is the system topology that contains the molecular description such as molecular parameters, bonding, force field, and charges.

The posre.itp file contains information used to restrain the positions of heavy atoms, which will be used in the step of position-restraint equilibration.

2.1.3 Creating a simulation box

In a box with rigid walls, the atoms on the protein surface are closer to the box walls and will experience different forces than the other atoms. Therefore, periodic boundary conditions (PBC) are applied to avoid this edge effect on the surface atoms. The existence of PBC means that any atom that leaves a simulation box through one boundary of the cell, will reappear on the opposite side. For this purpose, a box (unit cell) is defined, and that unit cell is surrounded by infinitely replicated, periodic images of itself. The module to define the box dimensions is editconf and the command to generate a box is then:

gmx editconf -f 5RMM.gro -o 5RMM_box.gro -c -d 1.0 -bt cubic

The above command centers the protein in the box (-c), and places it at least 1.0 nm from the box edge (-d 1.0) (Note 8). The box type is defined as a cube (-bt cubic) (Note 9). The new conformation is written to the file 5RMM_box.gro.

2.1.4 Adding solvent water

To mimic the physiological environment for a protein, the box around the protein needs to be solvated. This is performed by using a small pre-equilibrated system of water coordinates that is repeated over the box, with overlapping water molecules removed. To solvate the protein, the solvate module will be used, which adds the required number of water molecules around the protein based on the box type that is specified in the editconf step.

gmx solvate -cp 5RMM_box.gro -cs spc216.gro -o 5RMM_solv.gro -p topol.top

The solvate module solvates a protein configuration in a bath of solvent molecules. The box specified in the protein coordinate file (-cp) is used, which is the output of the previous editconf step, and the configuration of the solvent (-cs) is part of the standard GROMACS installation.

The spc216.gro is a generic equilibrated three-point solvent model that can be used as the solvent configuration for SPC, SPC/E, or TIP3P water models. Water coordinates are taken from an SPC water system and the –p flag adds the new water topology to the topology (.top) file. The generated configuration will be written to 5RMM_solv.gro containing the coordinates of both the protein and the water molecules.

2.1.5 Neutralizing the system

In the step of the pdb2gmx, GROMACS took the protonation state of an amino acid free in solvent at pH 7 as the default. For most of the cases, a charged protein is now in the solvated system. The output of pdb2gmx showed the charge of the protein processed, which was -36 e for 5RMM, and this should be neutralized before simulations (Note 10). For that, counter ions should be added according to the total charge of the system by using the genion module. What genion does is read through the topology and replace water molecules with the ions that the user specifies.

The input file for genion module has an extension of .tpr that is also called a gromacs pre-processor file. This file is produced by the GROMACS grompp module, GROMACS pre-processor. What grompp does is to assemble simulation parameters (.mdp), topology (.top) and coordinates (.gro) into a single run input (.tpr) file. The file with extension .mdp is a molecular dynamics parameter file that contains a series of settings for running control, energy minimization, output control, neighbour searching, electrostatics, VdW, temperature coupling, pressure coupling and so forth. The .mdp file used at this step can contain any legitimate combination of parameters. The em1.mdp used for energy-minimization of section 2.2 will be used here for generating .tpr file, the required input for module genion, as .mdp file for energy minimization is very basic and does not involve any complicated parameter combinations.

em1.mdp

cpp=/usr/bin/cpp define=-DFLEX_SPC constraints=none integrator=steep nsteps=5000 emtol=1000 emstep=0.01 nstcomm=1 ns_type=grid rlist=1 coulomb_type=PME rcoulomb=1.0 rvdw=1.0 Tcoupl=no Pcoupl=no gen_vel=no Once the .mdp file is prepared, then execute the command:

gmx grompp -f em1.mdp -c 1QGD_solv.gro -p topol.top -o ions.tpr

The output is the atomic-level description of the simulation system in the binary file ions.tpr that contains all the parameters for all of the atoms in the system. This file is then passed into the genion module by the command:

gmx genion -s ions.tpr -o 5RMM_solv_ions.gro -p topol.top -pname NA -np 36

When prompted, choose group 13 “SOL” for embedding ions. In the genion command, the .tpr file is provided as the input (-s), a .gro file is generated as output (-o), and the topology (-p) is updated to reflect the removal of water molecules and addition of ions. The positive and negative ion names are defined by flags -pname and -nname, respectively. For the structure of NSP13, sodium ions were added to neutralize the negative charges and the number of positive ions added was 36 defined by the flag -np.

2.2 Energy minimization

During the setup of the simulation system, an unnatural stress might be introduced, for example by placing two atoms accidentally too close to each other. This might result in large forces acting on the particles that would blow up the system in the simulation. Energy minimization (EM) is commonly used to remove these steric clashes or inappropriate geometry and refine low-resolution experimental structures.

GROMACS supports different minimization algorithms and the most commonly used are steepest descent and conjugate gradient.

The steepest descent algorithm is the quickest in removing the largest strains in the system but converges slowly when close to a minimum. The conjugate gradient is slower than the steepest descent in the early stages of the minimization, but becomes more efficient closer to the energy minimum.

It is common to do an initial energy minimization using the efficient steepest descent method and a further minimization with a more sophisticated method such as the conjugate gradient algorithm.

The process for EM is much like the addition of ions. We are once again going to use grompp to assemble the structure, topology, and simulation parameters into a binary input file (.tpr). Instead of passing the .tpr file to genion, we will run the energy minimization through the GROMACS MD engine, mdrun (Note 11). The grompp command is first used to process and prepare the input file necessary for steepest descent energy minimization run.

gmx grompp -f em1.mdp -c 5RMM_solv_ions.gro -p topol.top -o em1.tpr

Once the .tpr file is ready, submit the job for energy minimization by following command:

gmx mdrun -v -deffnm em1

Here, the -v flag makes mdrun create verbose output and print the computation progress to the screen at every step. The -deffnm flag defines a default filename prefix that is used for all input and output files. The corresponding file endings will be appended for the different input and output files, so they do not each need to be specified individually. In this step, the following files will be output.

The file em1.log contains detailed information on several energy terms throughout the simulation and gives a performance analysis at the end. The file em1.gro contains the coordinates of the final configuration of the system. The file em1.trr contains coordinates and forces for each output frame. The em1.edr file contains information on different energy terms and, if applicable for the simulation, information on the temperature, pressure, volume, and some other quantities during the simulation. EM with conjugate gradient method is carried out after the steepest descent method by the following commands: gmx grompp -f em2.mdp -c em1.gro -p topol.top -o em2.tpr gmx mdrun_d -v -deffnm em2

For a minimization using the conjugate gradient method, GROMACS compiled in double precision should be used with the module of mdrun_d.

The .mdp file used at this step is:

em2.mdp

cpp = /usr/bin/cpp define=-DFLEX_SPC constraints=none integrator=cg nsteps=5000 emtol=100 emstep=0.01 nstcgsteep=1000 nstcomm=1 ns_type=grid rlist=1 coulomb_type=PME rcoulomb=1.0 rvdw=1.0 Tcoupl=no. Pcoupl=no gen_vel=no

2.3 Position-restraint equilibration

After energy minimization, a reasonable starting structure is obtained for simulations. To begin real dynamics simulations, the solvent and ions around the protein should also be equilibrated and brought to the temperature at which we wish to simulate. In this step, the positions of protein atoms are restrained and the water molecules are allowed to flow.

Equilibration is often conducted in two steps. The system is first brought to the correct temperature based on kinetic energies, and then the correct pressure is applied to the system until it reaches the proper density. The first step is conducted under an NVT ensemble where the number of particles, volume, and temperature are constant. In NVT, the temperature of the system should reach a plateau at the desired value.

The second step is conducted under an NPT ensemble where the number of particles, pressure, and temperature are constant, which most closely resembles experimental conditions. In this step, the pressure is stabilized and thus also the density of the system.

2.3.1 NVT equilibration

The posre.itp file generated by pdb2gmx is used for position-restraint equilibration, which applies a position restraining force on the heavy atoms of the protein. This allows to equilibrate the solvent around the protein, without the added variable of structural changes in the protein. Typically, 50–100 ps is sufficient (Note 12) for position-restraint equilibration. For the NVT ensemble, the temperature is controlled by V-rescale, the reference temperature is 300K and the pressure coupling is turned off. The detailed settings used are:

nvt.mdp

title=OPLS TK NVT equilibration define=-DPOSRES integrator=md nsteps=25000 dt=0.002 nstxout=100 nstvout=100 nstenergy=100 nstlog=100 continuation=no constraint_algorithm=lincs constraints=all-bonds lincs_iter=1 lincs_order=4 ns_type=grid nstlist=5 rlist=1.0 rcoulomb=1.0 rvdw=1.0 coulombtype=PME pme_order=4 fourierspacing=0.16 tcoupl=V-rescale tc-grps=Protein Non-Protein tau_t=0.1 0.1 ref_t=300 300 pcoupl=no pbc=xyz DispCorr=EnerPres gen_vel=yes gen_temp=300 gen_seed=-1

The grompp and mdrun modules are used just as we did at the EM step:

gmx grompp -f nvt.mdp -c em2.gro -p topol.top -o nvt.tpr
gmx mdrun -deffnm nvt

2.3.2 NPT equilibration

The .mdp file for 50-ps NPT equilibration is not significantly different from the parameter file used for NVT equilibration. The pressure coupling should be added and controlled using the Parrinello-Rahman barostat. Since this is continuing the simulation from the NVT equilibration phase, the continuation=yes should be set. Velocities are also read from the previous trajectory, so the velocity generation should be set off by gen_vel=no. The detailed .mdp file for NPT equilibration is as follows:

npt.mdp

title=TK NPT equilibration define=-DPOSRES integrator=md nsteps=25000 dt=0.002 nstxout=100 nstvout=100 nstenergy=100 nstlog=100 continuation=yes constraint_algorithm=lincs constraints=all-bonds lincs_iter=1 lincs_order=4 ns_type=grid nstlist=5 rlist=1.0 rcoulomb=1.0 rvdw=1.0 coulombtype=PME pme_order=4 fourierspacing=0.16 tcoupl=V-rescale tc-grps=Protein Non-Protein tau_t=0.1 0.1 ref_t=300 300 pcoupl=Parrinello-Rahman pcoupltype=isotropic tau_p=2.0 ref_p=1.0 compressibility=4.5e-5 refcoord_scaling=com pbc=xyz DispCorr=EnerPres gen_vel = no

The modules grompp and mdrun will be called just as we did for NVT equilibration.

gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -o npt.tpr gmx mdrun -deffnm npt

2.4 Production simulation

After completion of the two equilibration phases, the system is well-equilibrated at the desired temperature and pressure, and ready for running production simulation for data collection. The parameter .mdp file needs to be adjusted accordingly.

For the production run, the position restraints are turned off. Simulations are also carried out under an NPT ensemble but the simulation time is significantly longer than the NPT equilibration process (Note 13). For the study of Nsp13, we run a 30-ns MD simulation at 300K with the following parameter file.

md.mdp

title=OPLS TK MD integrator=md nsteps=15000000 dt=0.002 nstxout=1000 nstvout=1000 nstxtcout=1000 nstenergy=1000 nstlog=1000 continuation=yes constraint_algorithm=lincs constraints=all-bonds lincs_iter=1 lincs_order=4 ns_type=grid nstlist=5 rlist=1.0 rcoulomb=1.0 rvdw=1.0 coulombtype=PME. pme_order=4 fourierspacing=0.16 tcoupl=V-rescale tc-grps=Protein Non-Protein tau_t=0.1 0.1 ref_t=300 300 pcoupl=Parrinello-Rahman pcoupltype=isotropic tau_p=2.0 ref_p=1.0 compressibility=4.5e-5 pbc=xyz DispCorr=EnerPres gen_vel=no

The modules grompp and mdrun will be applied just as we did before (note14).

gmx grompp –f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr gmx mdrun -deffnm md

In general, the final mdrun is performed on a computer cluster or supercomputer (Note 14). GROMACS can run in parallel on multiple cores of a single workstation using its built-in thread-MPI. Assuming a standard MPI installation with mpirun tool, launch a simulation with 32 processes using the command:

mpirun -np 32 gmx mdrun_mpi -deffnm md

Considering the huge computer resources needed in the production simulation, the final mdrun is generally performed on a computer cluster or super computer. The remote servers can be connected via ssh clients such as putty from windows and the file transfers can be performed by using tools such as FileZilla and WinSCP tool.

TomkUCL / SARS-CoV-2-Helicase-nsp13-Public-Antivirals-Virtual-Screening-Project