CATH-summer-2018 / SFAM-naming

Repository for materials that are related to superfamily naming

Mapping residues/proteins from Pfam data #1

Open nataliedawson opened 6 years ago

nataliedawson commented 6 years ago

I currently have the % of shared residues and % of shared proteins between CATH superfamilies and Pfam clans, and between CATH superfamilies and Pfam families.

One example of these results is http://dawson_8080.cathdb.info/sfam/1.10.8.890, which shows that 100% of the residues in CATH superfamily 1.10.8.890 are shared with Pfam family PF09618 (https://pfam.xfam.org/family/PF09618). However, only 12.8% of proteins are shared, which prompted a further look at the data to see whether the mapping is sufficient for the family name to be inherited into CATH.

Looking at the PDBe structural analysis page (https://www.ebi.ac.uk/pdbe/entry/pdb/2xlj/analysis) for the CATH superfamily domain rep, it is clear that 100% of the CATH domain overlaps the Pfam domain. However, an overlap score would be very poor, as the Pfam domain is much larger than the CATH domain.

@jonglees Would it be possible to map in the reverse direction, therefore providing scores with respect to a Pfam family/clan to CATH superfamily mapping?
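
To make the asymmetry concrete, here is a minimal sketch of a directional residue-coverage score. The function name `coverage` and the residue ranges are illustrative assumptions, not the real domain boundaries for 1.10.8.890 or PF09618, and this is not necessarily how the existing mapping scores are computed.

```python
def coverage(query_residues, target_residues):
    """Percentage of residues in `query_residues` that also occur in `target_residues`."""
    query, target = set(query_residues), set(target_residues)
    if not query:
        raise ValueError("query domain has no residues")
    return 100.0 * len(query & target) / len(query)


# Hypothetical residue numbering on one protein chain (assumed values):
cath_domain = range(10, 60)   # 50 residues
pfam_domain = range(1, 201)   # 200 residues, fully containing the CATH domain

cath_to_pfam = coverage(cath_domain, pfam_domain)  # share of the CATH domain covered by Pfam
pfam_to_cath = coverage(pfam_domain, cath_domain)  # share of the Pfam domain covered by CATH

print(f"CATH -> Pfam: {cath_to_pfam:.1f}%")
print(f"Pfam -> CATH: {pfam_to_cath:.1f}%")
```

With these made-up ranges the CATH domain is fully covered while the Pfam domain is only a quarter covered, which is the kind of gap a reverse-direction score would expose.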

jonglees commented 6 years ago

First of all, great website; the 3D view with domains selected is really useful.

If it's OK, I will have a go at annotating a few of the families.

Like you say, when only 12.8% are shared at the sequence level you can't inherit the name from Pfam, i.e. the CATH family is not equivalent, so we can exclude the Pfam family for direct inheritance.

Having said that, it could still be useful as part of a manual approach, as a starting point for thinking about a name.

I'll calculate the scores in the reverse direction and send them in case they're useful.

have a good weekend
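
As a rough illustration of what scoring shared proteins in both directions could look like, the sketch below takes two membership tables and reports, for every overlapping pair, the percentage shared relative to each side. The function name, the table layout, and all identifiers (`sfam_A`, `pfam_X`, `P1`, ...) are assumptions made for the example; they are not taken from the actual spreadsheet or pipeline.

```python
from itertools import product


def shared_protein_scores(cath_members, pfam_members):
    """Return {(superfamily, pfam_family): (pct_of_cath, pct_of_pfam)} for overlapping pairs."""
    scores = {}
    for (sfam, cath_set), (fam, pfam_set) in product(cath_members.items(),
                                                     pfam_members.items()):
        shared = cath_set & pfam_set
        if not shared:
            continue  # skip pairs with no proteins in common
        scores[(sfam, fam)] = (
            100.0 * len(shared) / len(cath_set),   # forward: relative to the CATH superfamily
            100.0 * len(shared) / len(pfam_set),   # reverse: relative to the Pfam family
        )
    return scores


# Invented membership data, for illustration only:
cath_members = {"sfam_A": {"P1", "P2", "P3", "P4"}}
pfam_members = {"pfam_X": {"P1", "Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"}}

scores = shared_protein_scores(cath_members, pfam_members)
for (sfam, fam), (fwd, rev) in scores.items():
    print(f"{sfam} vs {fam}: {fwd:.1f}% of CATH proteins, {rev:.1f}% of Pfam proteins")
```

Reporting both numbers side by side makes it easy to spot pairs where the mapping looks strong in one direction only.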


nataliedawson commented 6 years ago

Thanks Jon, that would be really useful to have the mapping scores in the reverse direction too. It's definitely ok for you to have a go at annotation too. Please do let me know if you have any issues.

Thanks!

nataliedawson commented 6 years ago

@jonglees Hi Jon, how easy would it be to regenerate all of the mapping data in your spreadsheet, with, for example, the latest version of InterPro?

jonglees commented 6 years ago

Yes, will do that, but I'm away until next week, so I'll do it at the start of next week.


nataliedawson commented 6 years ago

@jonglees Thanks very much!