UCLOrengoGroup / cath-todo

Issues relating to the CATH protein structure classification web pages

cath domain: error when superfamily exists in 'current', but not 'latest' #37

Closed sillitoe closed 5 years ago

sillitoe commented 5 years ago

http://www.cathdb.info/version/latest/domain/5hk7B00

sillitoe commented 5 years ago

The plot thickens...

This domain was assigned to superfamily '1.10.287.3890'. However, this superfamily was made inactive during the v4.2 release because it was labelled as one of the 'bin' superfamilies (the PostgreSQL database and the release files both reflect this change and are consistent with each other).

However, the Oracle database (which was built from v4.2) still has domain '5hk7B00' assigned to '1.10.287.3890' in both gene3d_16 and cath_v4_2_0. My best guess is that the 'bin' superfamilies were pulled during the release process and the Gene3D domain table wasn't refreshed afterwards.

These 'bin' superfamilies and domains do not appear in any of the official release data files for CATH v4.2, so I think it's reasonable to correct this simply by deleting these domain assignments from the Oracle database:

TODO:

cath_v4_2_0.domain
gene3d_16.cath_list

sillitoe commented 5 years ago

bin_domains_to_unassign.txt
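
For illustration only, here is a minimal sketch of how a file like the one attached above could be turned into parameterised DELETE statements for the two tables in the TODO. This is not the script that was actually run. It assumes the file holds one domain ID per line, that both tables key on a column called `domain_id` (a hypothetical name; the thread does not give the real schema), and that IDs follow the 7-character pattern seen in '5hk7B00'. The helper names are also made up for this example.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Tables named in the TODO list of this ticket.
TABLES = ("cath_v4_2_0.domain", "gene3d_16.cath_list")

# ASSUMPTION: the real column name is not given in the thread.
DOMAIN_ID_COLUMN = "domain_id"

# ASSUMPTION: 4-character PDB code + chain character + 2-digit domain number,
# matching the example ID '5hk7B00'.
DOMAIN_ID_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}[A-Za-z0-9][0-9]{2}$")


def read_domain_ids(lines: Iterable[str]) -> List[str]:
    """Return unique domain IDs in file order, skipping blanks and '#' comments."""
    seen = set()
    ids = []
    for line in lines:
        value = line.strip()
        if not value or value.startswith("#"):
            continue
        if not DOMAIN_ID_RE.match(value):
            raise ValueError(f"unexpected domain ID format: {value!r}")
        if value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


def build_deletes(
    domain_ids: Iterable[str],
    tables: Iterable[str] = TABLES,
    column: str = DOMAIN_ID_COLUMN,
) -> List[Tuple[str, Dict[str, str]]]:
    """Build (sql, bind_params) pairs, one per table and domain ID.

    The domain ID is passed as a named bind variable rather than being
    interpolated into the SQL text.
    """
    domain_ids = list(domain_ids)
    statements = []
    for table in tables:
        sql = f"DELETE FROM {table} WHERE {column} = :domain_id"
        for domain_id in domain_ids:
            statements.append((sql, {"domain_id": domain_id}))
    return statements
```

Each pair could then be passed to a database cursor's `execute(sql, params)` inside a single transaction. Running the equivalent `SELECT COUNT(*)` first and checking it against the length of the ID list would be a sensible guard before committing.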

sillitoe commented 5 years ago

These changes have been made to the Oracle database.

I have also added a notice to the domain page to explain the difference between CATH-B and CATH+.

http://www.cathdb.info/version/latest/domain/5hk7B00

Closing ticket.