UCLOrengoGroup / cath-tools-seqscan

CATH: scan/align protein sequences against functional families

Biomap sequences #7

Closed dudimarcus closed 7 years ago

dudimarcus commented 8 years ago

Hi Ian, is there a way to decipher the headers in the biomap sequence results? I understand they might be UniProt sequences, but there is only a long hash key in each header.

for example:

query/1-235
slalsltaDQMVSALLDAEPPILYSE------FSEASMMGLLTNLADRELVHMINWAKRV
PGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEI
FDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTF---TLKSLEEKDHIHRVLDKITD
TLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEML
DAHR---
biomap|4.1.0|48b22b698179926749cc88efdcd1cd83/67-305
........DQMVSALLEAEPPVVYSEYDPSRPFNEASVMTLLTNLADRELVHMINWAKRV
PGFVDLALHDQVHLLECAWLEILMVGLVWRSMEHPGKLLFAPNLLLDRSHGKVVEGFVEI
FDMLLAASSRFRMMNVRGEEFVCLKSIILLNPGIYTYLSSTLKSVEERDHIHRVLDKITD
TLMHLMAKSGLSLQQQHRRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEML
DAHRLHA

sillitoe commented 8 years ago

I'll add in some meta tags in the description for each sequence header - if we can map the sequence domain to a UniProtKB accession then it will appear in the headers.

Worth noting that there is not a one-to-one relationship between sequence domain id ($md5/$start-$stop) and UniProtKB accession - each sequence domain id can have 0, 1 or more UniProtKB accessions.

Each entry corresponds to a predicted structural domain (i.e. region) of a unique protein sequence in Gene3D. Some of these ids will map to a protein sequence in UniProtKB. Some of them will map to protein sequences from Ensembl. Some of them may map to more than one UniProtKB entry (e.g. two proteins with the same sequence coming from different organisms).
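
To make the id structure concrete, here is a minimal Python sketch that splits a header such as the one in the example above into its parts and looks up accessions in a one-to-many table. The function names, the `EXAMPLE_TABLE` dictionary and the `ACC_A`/`ACC_B` placeholders are hypothetical and only for illustration; they are not part of cath-tools-seqscan, and the sketch assumes a single `$start-$stop` segment per id.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class DomainId(NamedTuple):
    source: str   # e.g. "biomap"
    version: str  # e.g. "4.1.0"
    md5: str      # the $md5 part of the sequence domain id
    start: int    # first residue of the domain region
    stop: int     # last residue of the domain region


# Assumed layout: source|version|$md5/$start-$stop (single segment only).
HEADER_RE = re.compile(
    r"^(?P<source>[^|]+)\|(?P<version>[^|]+)\|"
    r"(?P<md5>[0-9a-f]{32})/(?P<start>\d+)-(?P<stop>\d+)$"
)


def parse_domain_header(header: str) -> DomainId:
    """Split an alignment header into its sequence domain id components."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return DomainId(
        m["source"], m["version"], m["md5"], int(m["start"]), int(m["stop"])
    )


# Hypothetical lookup table. The accessions are placeholders, not real
# mappings: one domain id may have zero, one or several accessions.
EXAMPLE_TABLE: Dict[Tuple[str, int, int], List[str]] = {
    ("48b22b698179926749cc88efdcd1cd83", 67, 305): ["ACC_A", "ACC_B"],
}


def accessions_for(
    dom: DomainId, table: Dict[Tuple[str, int, int], List[str]]
) -> List[str]:
    """Return every accession mapped to this domain id (possibly none)."""
    return table.get((dom.md5, dom.start, dom.stop), [])


dom = parse_domain_header("biomap|4.1.0|48b22b698179926749cc88efdcd1cd83/67-305")
print(dom.md5, dom.start, dom.stop)
print(accessions_for(dom, EXAMPLE_TABLE))
```

Returning a list rather than a single value keeps the zero, one and many cases uniform, so downstream code does not need a special case for ids that have no UniProtKB entry.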

dudimarcus commented 7 years ago

Many thanks!

If there’s an easy way to add ids including those from Ensembl it would be great!


sillitoe commented 7 years ago

Should be solved by 0ad762567b7e08b503dc3d30d20144c75348aae1

Resulting alignments have: