Closed rafwiewiora closed 8 years ago
Can you reformat this as a short, succinct statement of the suggested improvement(s)?
Of course, sorry!
Action I wanted: Look at Uniprot code: Q9NQR1 and model the SET domain using all PDB structures available under that code.
Required input:
ensembler quickmodel --target_uniprot_entry_name "SETD8_HUMAN" --template_uniprot_query "accession:q9nqr1" --no-loopmodel --uniprot_domain_regex "SET"
Redundancies: --target_uniprot_entry_name "SETD8_HUMAN"
and --template_uniprot_query "accession:q9nqr1"
- same thing specified twice, in two different forms - also no availability to just pass the Q9NQR1 code - why have to generate the accession:q9nqr1
string (I never remember the word is accession...)
Proposal: replace the redundancy with a single, just Uniprot code containing, argument:
--target_template_uniprot_code Q9NQR1
so the full call would be for me:
ensembler quickmodel --target_template_uniprot_code Q9NQR1--no-loopmodel --uniprot_domain_regex "SET"
Further proposal would be to get rid of having to mention the domain, and just do 'as much modeling on the given sequence from Uniprot code as possible' - i.e. just cut the sequence to the biggest coverage from PDB available. This must be done somewhere internally already or Modeller does it - but perhaps this might lead to problems because I've only tried one use case here.
Quickmodel is designed to model a single target protein with a small number of template structures. But the template protein doesn't have to be the same as the target protein. Hence there are separate flags to define target and template. If you want to use the same protein as target and template, then there will be redundancy in the command. To avoid this, you could implement the following in the CLI (< https://github.com/choderalab/ensembler/blob/master/ensembler/cli_commands/quickmodel.py
):
- If no template-defining flags are entered (e.g.
--template_uniprot_query
), have quickmodel default to using the protein defined with--target_uniprot_entry_name
Regarding accession codes, I was trying to allow the user to use UniProt
mnemonics (e.g. "SETD8_HUMAN") where possible, rather than accession code
(e.g. Q9NQR1), since it is more human-readable. Both are unique identifiers
in UniProt. If you definitely want to be able to use accession code, you
could perhaps add a --target_uniprot_ac
flag (where ac
means accession
code).
The docstring for this command (ensembler quickmodel -h
) does contain
examples for each of the flags. But there are a lot of available flags, so
could be good to add further documentation, e.g. commented examples for
each use case.
On Mon, Apr 4, 2016 at 2:19 PM, Rafal Wiewiora notifications@github.com wrote:
Of course, sorry!
Action I wanted: Look at Uniprot code: Q9NQR1 and model the SET domain using all PDB structures available under that code.
Required input:
ensembler quickmodel --target_uniprot_entry_name "SETD8_HUMAN" --template_uniprot_query "accession:q9nqr1" --no-loopmodel --uniprot_domain_regex "SET"
Redundancies: --target_uniprot_entry_name "SETD8_HUMAN" and --template_uniprot_query "accession:q9nqr1" - same thing specified twice, in two different forms - also no availability to just pass the Q9NQR1 code - why have to generate the accession:q9nqr1 string (I never remember the word is accession...)
Proposal: replace the redundancy with a single, just Uniprot code containing, argument:
--target_template_uniprot_code Q9NQR1
so the full call would be for me:
ensembler quickmodel --target_template_uniprot_code Q9NQR1--no-loopmodel --uniprot_domain_regex "SET"
Further proposal would be to get rid of having to mention the domain, and just do 'as much modeling on the given sequence from Uniprot code as possible' - i.e. just cut the sequence to the biggest coverage from PDB available. This must be done somewhere internally already or Modeller does it - but perhaps this might lead to problems because I've only tried one use case here.
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/choderalab/ensembler/issues/70#issuecomment-205430855
Thanks @danielparton !
I think what actually bothered me more was that the target specification and template specifications are done in different ways (name vs search query) - so I think a great solution might be to just add a --template_uniprot_entry_name' argument so you can call:
ensembler quickmodel --target_uniprot_entry_name SETD8_HUMAN --template_uniprot_entry_name SETD8_HUMAN`
But aha! I can just pass SETD8_HUMAN as --template_uniprot_query
.
Same with the code - I can see the reasoning now!
I think the only thing missing is just better docs - because you only wrote one comment as it's much clearer to me now! So I'll write more docstrings as I explore the code more!
Sure thing. Yeah, documentation could definitely be improved. Probably good for someone else to give that try - I probably know the code too well... But feel free to ask if you have questions
On Mon, Apr 4, 2016 at 4:23 PM, Rafal Wiewiora notifications@github.com wrote:
Thanks @danielparton https://github.com/danielparton !
I think what actually bothered me more was that the target specification and template specifications are done in different ways (name vs search query) - so I think a great solution might be to just add a --template_uniprot_entry_name' argument so you can call: ensembler quickmodel --target_uniprot_entry_name SETD8_HUMAN --template_uniprot_entry_name SETD8_HUMAN`
But aha! I can just pass SETD8_HUMAN as --template_uniprot_query.
Same with the code - I can see the reasoning now!
I think the only thing missing is just better docs - because you only wrote one comment as it's much clearer to me now! So I'll write more docstrings as I explore the code more!
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/choderalab/ensembler/issues/70#issuecomment-205478726
Ok cool, will do! Thanks Danny!
@jchodera
Pasting from Slack: