Open jchodera opened 9 years ago
Sure, shouldn't be too much work. My plan is to implement a "project_settings.yaml" file to contain this sort of info. This would go in the project top-level directory, and would be created by ensembler init. It would also be a good way to specify pH, since the default is 7.0 and we will want to use 8.0 for many of our projects. I'm thinking of the following format:
all_targets:
    pH: 8.0
per_target:
    DDR1_HUMAN_D0_PROTONATED:
        custom_residue_variants:
            # keyed by 0-based residue index
            35: ASH
    CSK21_HUMAN_D0:
        # this value would take priority over the value in "all_targets" (just for illustration)
        pH: 6.5
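To make the intended precedence concrete, here is a minimal sketch of how per-target values could override "all_targets", which in turn override the built-in defaults. The function name resolve_target_settings and the DEFAULTS table are hypothetical and not part of Ensembler. The dict stands in for what PyYAML's yaml.safe_load would return for the file above.

```python
import copy

# Stand-in for the parsed project_settings.yaml (assumption: the file would be
# read with yaml.safe_load, which yields plain dicts and an int key for 35).
settings = {
    "all_targets": {"pH": 8.0},
    "per_target": {
        "DDR1_HUMAN_D0_PROTONATED": {"custom_residue_variants": {35: "ASH"}},
        "CSK21_HUMAN_D0": {"pH": 6.5},
    },
}

# Hypothetical built-in defaults; only the pH default of 7.0 comes from the thread.
DEFAULTS = {"pH": 7.0, "custom_residue_variants": {}}


def resolve_target_settings(settings, targetid):
    """Merge defaults, then all_targets, then the per-target section."""
    resolved = copy.deepcopy(DEFAULTS)
    resolved.update(settings.get("all_targets") or {})
    per_target = settings.get("per_target") or {}
    resolved.update(per_target.get(targetid) or {})
    return resolved
```

With this layering, CSK21_HUMAN_D0 would get pH 6.5, DDR1_HUMAN_D0_PROTONATED would get pH 8.0 plus its residue variant, and any target not listed would fall back to pH 8.0.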
Nice!
Any chance we could use UniProt residue numbering instead? Or is that just a pain?
What if the target is not a standard UniProt sequence?
How are nonstandard sequences specified now? FASTA file? Is there some provision for numbering in those? Or must they be zero indexed?
Yes currently nonstandard sequences would have to be added manually in a FASTA file, which does not allow for residue numbering.
There is a script for outputting topologies with residue numbers according to UniProt (the targetid must match a UniProt entry name) - it would be simple to modify this to accept a custom numbering scheme. If you just wanted to ensure these residue numbers are used for F@h projects, this might be the quickest/simplest approach.
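As a rough illustration of what a custom numbering scheme could look like, the sketch below converts between the 0-based indices used in the settings file and a user-facing residue number. It assumes the target sequence is contiguous with no gaps or insertions, and every function name here is hypothetical rather than taken from the existing script.

```python
def index_to_residue_number(index, first_residue_number):
    """Map a 0-based sequence index to a residue number (contiguous numbering assumed)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return first_residue_number + index


def residue_number_to_index(residue_number, first_residue_number):
    """Inverse mapping: residue number back to a 0-based sequence index."""
    index = residue_number - first_residue_number
    if index < 0:
        raise ValueError("residue number precedes the start of the target sequence")
    return index


def variants_to_zero_based(variants_by_number, first_residue_number):
    """Re-key a {residue_number: variant} dict by 0-based index."""
    return {
        residue_number_to_index(number, first_residue_number): variant
        for number, variant in variants_by_number.items()
    }
```

For example, if a target domain started at residue 100 (a made-up value), a variant specified for residue 135 would be applied at 0-based index 35.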
Is there a standard besides FASTA?
I wonder if a SEQRES block (with the additional DBREF information to get numbering) would be more general.
Both the UniProt and FASTA sequence ingestion schemes could in principle generate this format, and it would be easier for users to modify both the numbering and three letter residue codes.
Ok, could you write this as a separate issue?
Done! #39
@sonyahanson would like to select the protonation states of some residues (such as the DFG Asp in kinases).
Currently, the code uses the highest sequence identity PDB file model.pdb.gz to extract reference residue variants that are used to assign the protonation state variants of all residues in all models. Making alterations by hand would mean editing the model.pdb.gz of the highest sequence identity model(s) after the modelling stage in order to change residue names appropriately (e.g. change ASP to ASH to select the protonated aspartic acid). In future, it would be useful to allow these alterations to be specified either via the command-line or a YAML file or something.
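For reference, the manual edit described above amounts to rewriting the residue name field (columns 18-20) of the relevant ATOM records. The following is only a sketch of that step, not the code Ensembler uses. The function apply_residue_variants is hypothetical and takes the decompressed PDB lines plus a dict keyed by 0-based residue index, counting residues in file order.

```python
def apply_residue_variants(pdb_lines, variants):
    """Return PDB lines with residue names replaced per {0-based index: name}."""
    out = []
    index = -1
    last_key = None
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Chain ID (column 22) plus residue number and insertion code
            # (columns 23-27) identify a residue; a change starts a new one.
            key = (line[21], line[22:27])
            if key != last_key:
                index += 1
                last_key = key
            new_name = variants.get(index)
            if new_name is not None:
                if len(new_name) > 3:
                    raise ValueError("residue names are at most 3 characters")
                # Residue name occupies columns 18-20, right-justified.
                line = line[:17] + new_name.rjust(3) + line[20:]
        out.append(line)
    return out
```

Calling apply_residue_variants(lines, {35: "ASH"}) would rename every atom of the 36th residue to ASH and leave all other records untouched. The result would still need to be written back to model.pdb.gz.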