PROQ SCRIPTS Contact: bjornw@ifm.liu.se git: https://github.com/bjornwallner/ProQ_scripts.git
ProQ is Model Quality Assessment Program that predictis the quality of individual models. To do this ProQ uses both information that can be calculated from the 3D coordinates of the model as well as information that can be predicted from the sequence. Of course the information that comes from the sequence is the same for all models of the same sequence. Thus, before scoring models the sequence specific features needs to be calculated. The ProQ_scripts folder contains the scripts and programs needed to generate the sequence specific input features to the ProQ2 and ProQM scoring functions in Rosetta.
INSTALLATION of ProQ2/ProQM scoring functions
1) git clone https://github.com/bjornwallner/ProQ_scripts 2) Download the latest weekly release of Rosetta from https://www.rosettacommons.org/software/, you need a license but it's free for academia. 3) compile rosetta according to the instructions.
INSTALLATION of ProQ_scripts to calculate sequence specific features
To install the programs to calculate all sequence specific features you need to do the following steps
1) Set the $DB variable in bin/run_all_external.pl to a formated sequence database e.g. uniref90. (e.g. download the fasta sequences and run formatdb from the blast package) 2) run ./configure.pl
TEST To check that everything is set up run ./run_test.sh This should create all the files needed without any error messages.
RUNNING Once set up, the main wrapper program 'bin/run_all_external.pl' will run everything using either a fasta file or extracting the sequence from a pdb
bash$ bin/run_all_external.pl Usage: run_all_external.pl -pdb [pdbfile] -fasta [fastafile] -membrane 1 (if membrane protein) -overwrite 1 (if overwrite)
For water soluable proteins run:
bin/run_all_external.pl -pdb
For membrane proteins run:
bin/run_all_external.pl -pdb
To use the sequence specific files in rosetta you need to use the flag
-ProQ:basename
Currently there are no functionality in Rosetta to account for models that have missing residues. These have to be handled outside, by first creating the sequence features for the full length sequence and than use the script 'copy_features_from_master.pl'. For the script to work you need to install the global alignment program 'needle' in the EMBOSS package (http://emboss.sourceforge.net/, or sudo apt-get install emboss)
The following example will create all sequence specifici features for the full.seq sequence, and than copy the relevant features to the model1-5.pdb, by aligning the sequences of model1-5.pdb to full.seq.
./run_all_external.pl -fasta full.seq ./copy_features_from_master.pl model1.pdb full.seq ./copy_features_from_master.pl model2.pdb full.seq ./copy_features_from_master.pl model3.pdb full.seq ./copy_features_from_master.pl model4.pdb full.seq ./copy_features_from_master.pl model5.pdb full.seq
RUNNING ProQ or ProQM on a pdb with the name 1abc.pdb
1) Create the sequence specific features as described above. You do this once for a given sequence. i.e. bin/run_all_external.pl -pdb 1abc.pdb, will create files with the basename 1abc.pdb For membrane proteins bin/run_all_external.pl -pdb 1abc.pdb -membrane 1
2a) to score all .pdb run:
score.linuxgccrelease -database
This will produce a scorefile with the a global score corresponding to
summed local scores. To get the usual ProQ/ProQM score divide by
length or run the scoring with -ProQ:normalize
To get the local scores run the scoring with
-ProQ:output_local_prediction, this will generate a local quality
estimates with the name ProQM.
2b) Side-chain sampling before score
USING PROQ2 (NON-MEMBRANE)
Generate 10 different side-chain conformations for each pdb:
relax.linuxgccrelease -database
Score each of these with ProQ2:
score.linuxgccrelease -database
Select the model with the highest ProQ2 score.
USING PROQM (MEMBRANE)
Generate 10 different side-chain conformations for each pdb:
relax.linuxgccrelease -database
Score each of these with ProQM:
score.linuxgccrelease -database
Select the model with the highest ProQM score.
2c) ProQM-resample Follow the instruction above to install ProQ_scripts
The command: resampling/resampling.pl <any pdb in the folder (this is only used to get the sequence)>
will do 1+2b (calculate sequence specific features+resampling+scoring) on all .pdb in the folder from where it is executed. All .pdb needs to be for the same sequence.