choderalab / ensembler

Automated omics-scale protein modeling and simulation setup.
http://ensembler.readthedocs.io/
GNU General Public License v2.0

A note on reproducibility - changes in Uniprot IDs #84

Open rafwiewiora opened 8 years ago

rafwiewiora commented 8 years ago

A word of caution about using Uniprot IDs.

I have come back to my Ensembler set-up from a few months ago, to find that:

ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN

no longer work. It's already not reproducible!

UniProt appears to have changed the ID (entry name) from SETD8_HUMAN to KMT5A_HUMAN. I've now learnt that I should be using the accession number (AC), not the ID: ACs are meant to be stable identifiers, whereas IDs can be renamed (http://www.uniprot.org/help/difference_accession_entryname).

jchodera commented 8 years ago

Yikes!

Does ensembler offer a way to retrieve by accession number?

For reproducibility, we could still let users search by UniProt ID, but have ensembler generate an input file in which those IDs are translated into accession numbers.
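
As a rough illustration of the first half of that idea, a setup script could at least detect when a non-persistent ID is being used. The sketch below is not part of ensembler: `classify_identifier` is a hypothetical helper, the accession pattern is the one published in the UniProt help pages, and the entry-name pattern is only a heuristic assumption (PROTEIN_SPECIES).

```python
import re

# Accession format as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Heuristic assumption: entry names look like PROTEIN_SPECIES.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_identifier(identifier: str) -> str:
    """Return 'accession', 'entry_name', or 'unknown' (hypothetical helper)."""
    if ACCESSION_RE.fullmatch(identifier):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(identifier):
        return "entry_name"
    return "unknown"


# Entry names such as SETD8_HUMAN can be renamed; accessions are stable.
for name in ("SETD8_HUMAN", "KMT5A_HUMAN", "Q9NQR1"):
    print(name, classify_identifier(name))
```

Translating an entry name into its accession would still need a lookup against UniProt, which is deliberately left out of this sketch.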

danielparton commented 8 years ago

Yep, definitely use accession numbers if you need to use persistent IDs. The --query argument is passed directly to the UniProt search API, which allows you to search by accession number as follows: --query 'accession:Q9NQR1'

See also: http://ensembler.readthedocs.io/en/latest/examples.html#example-using-the-main-pipeline-functions
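
For anyone scripting their setup, the two commands from the original report can be rebuilt around the `accession:` query. This is a minimal sketch, assuming the same flags shown earlier in the thread still apply; `accession_query` and `gather_commands` are hypothetical helper names, not ensembler functions.

```python
import shlex


def accession_query(accession: str) -> str:
    """Build the UniProt search query for a single accession number."""
    return f"accession:{accession}"


def gather_commands(accession: str, domain_regex: str = "") -> list:
    """Return the gather_targets and gather_templates command lines.

    Hypothetical helper: it only assembles the flags quoted in this thread.
    """
    query = accession_query(accession)
    targets = ["ensembler", "gather_targets",
               "--gather_from", "uniprot", "--query", query]
    if domain_regex:
        targets += ["--uniprot_domain_regex", domain_regex]
    templates = ["ensembler", "gather_templates",
                 "--gather_from", "uniprot", "--query", query]
    # shlex.join (Python 3.8+) quotes arguments only where the shell needs it.
    return [shlex.join(targets), shlex.join(templates)]


for command in gather_commands("Q9NQR1", domain_regex="SET"):
    print(command)
```

Keeping the accession in a script like this also records which UniProt entry was actually used, even if its entry name changes again.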

rafwiewiora commented 8 years ago

Thanks Danny, exactly what I needed!