PHI-base / PHI5_web_display

PHI5_web_display will allow PHI-Canto data to be displayed

Reassigning PHIG IDs when a UniProtKB accession changes #75

Open jseager7 opened 1 year ago

jseager7 commented 1 year ago

(References https://github.com/PHI-base/curation/issues/33)

In PHI-Canto, we recently had to change two UniProtKB accession numbers in an approved curation session. This will affect the PHI-base 5 gene pages since the old accession numbers had already been mapped to PHIG IDs.

| PHIG ID  | Old AC | New AC |
|----------|--------|--------|
| PHIG:297 | L7JC49 | G4N1S3 |
| PHIG:299 | L7JGN0 | G4MZY8 |

Since neither of the new accessions currently exists in PHI-base 5, presumably both will be given new PHIG IDs when the next export is loaded. For example:

| PHIG ID  | New AC |
|----------|--------|
| PHIG:400 | G4N1S3 |
| PHIG:401 | G4MZY8 |
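A minimal sketch of the loading behaviour presumed above, assuming PHI-base 5 holds a mapping from UniProtKB accession to PHIG number before the load and issues new numbers sequentially. The function name `load_export` and the sequential counter are hypothetical, not the actual PHI-base 5 loader. Besides the new assignments, it reports which PHIG IDs are left without any accession in the export, which is the problem discussed below.

```python
def load_export(
    current: dict[str, int], export_accessions: list[str], next_id: int
) -> tuple[dict[str, int], set[int]]:
    """Hypothetical loader step for one PHI-Canto export.

    current: accession -> PHIG number, as known before this load.
    Returns (new assignments, PHIG numbers orphaned by this export).
    """
    new_ids: dict[str, int] = {}
    for accession in export_accessions:
        if accession not in current and accession not in new_ids:
            # Unseen accession: issue the next sequential PHIG number.
            new_ids[accession] = next_id
            next_id += 1
    present = set(export_accessions)
    # PHIG numbers whose accession no longer appears in the export
    # are left with no annotations.
    orphaned = {phig for acc, phig in current.items() if acc not in present}
    return new_ids, orphaned
```

With the two changed accessions, the sketch would issue PHIG:400 and PHIG:401 and leave PHIG:297 and PHIG:299 orphaned.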

The problem is what should happen to the PHIG IDs linked to the old accessions, since they will no longer have any annotations. There are a few options here:

  1. keep the gene page with only the Entry Summary section visible,
  2. automatically redirect from the old gene page to the new gene page, or
  3. show a notice on the old gene page indicating that the current PHIG ID has been replaced with a newer one.
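The three options differ only in what an orphaned gene page does once it has no annotations left. The sketch below assumes the site can look up which PHIG ID replaced the old one (the `replaced_by` argument); how that link would be recorded is not settled here, and `OrphanPolicy`, `PageView` and `resolve_gene_page` are illustrative names, not part of PHI-base 5.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OrphanPolicy(Enum):
    SUMMARY_ONLY = 1  # option 1: keep only the Entry Summary section
    REDIRECT = 2      # option 2: redirect to the replacement gene page
    NOTICE = 3        # option 3: show a "replaced by" notice


@dataclass
class PageView:
    phig_id: str
    show_annotations: bool
    redirect_to: Optional[str] = None
    notice: Optional[str] = None


def resolve_gene_page(
    phig_id: str, has_annotations: bool, replaced_by: Optional[str], policy: OrphanPolicy
) -> PageView:
    """Decide how to render a gene page (hypothetical helper)."""
    if has_annotations:
        return PageView(phig_id, show_annotations=True)
    if replaced_by is not None and policy is OrphanPolicy.REDIRECT:
        return PageView(phig_id, show_annotations=False, redirect_to=replaced_by)
    if replaced_by is not None and policy is OrphanPolicy.NOTICE:
        return PageView(
            phig_id,
            show_annotations=False,
            notice=f"{phig_id} has been replaced by {replaced_by}.",
        )
    # Option 1, or no known replacement: show the Entry Summary only.
    return PageView(phig_id, show_annotations=False)
```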

Options 2 and 3 help to explain why the information is no longer visible. However, if a later curation session adds information for L7JC49 or L7JGN0 that we do want to keep, we may have to populate the gene pages for PHIG:297 and PHIG:299 again. I can think of two solutions for this:

  1. Add the newly curated data to PHIG:297 or PHIG:299, but keep a notice on the page stating that previous annotations were moved to another PHIG ID. Ideally, this notice would link to the new PHIG ID and include a date of when the move occurred (in case we have to do this more than once for one PHIG ID).

  2. Leave PHIG:297 and PHIG:299 as obsolete PHIG IDs, and add any newly curated data for L7JC49 or L7JGN0 to new PHIG IDs. The obsolete gene pages will need to be marked as obsolete, and we may need to keep their Entry Summary section visible so that users can find the current PHIG ID for the UniProtKB accession by using the search (obsolete PHIG IDs should be hidden from the search or given lower priority in the results).
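To make the two solutions more concrete, here is a sketch of the data each one might need: the first keeps a dated history of moves on the PHIG ID so the notice can be shown more than once, and the second flags the PHIG ID as obsolete and ranks it below current entries in search results. The `GeneEntry` and `Move` classes and the `search` function are hypothetical; they are not PHI-base 5's actual data model or search.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Move:
    to_phig_id: str
    moved_on: date


@dataclass
class GeneEntry:
    phig_id: str
    accession: str
    obsolete: bool = False  # second solution: page kept only as obsolete
    moves: list[Move] = field(default_factory=list)  # first solution: dated move history


def move_notices(entry: GeneEntry) -> list[str]:
    """One notice per move, linking to the new PHIG ID with the move date."""
    return [
        f"Previous annotations were moved to {m.to_phig_id} on {m.moved_on.isoformat()}."
        for m in entry.moves
    ]


def search(entries: list[GeneEntry], accession: str, hide_obsolete: bool = False) -> list[GeneEntry]:
    """Find entries for an accession; obsolete ones are hidden or sorted last."""
    hits = [e for e in entries if e.accession == accession]
    if hide_obsolete:
        hits = [e for e in hits if not e.obsolete]
    # sorted() is stable and False < True, so current entries come first.
    return sorted(hits, key=lambda e: e.obsolete)
```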

I think solution 1 is better, but it will probably be much harder to achieve in practice. Unfortunately, solution 2 seems to defeat the purpose of having PHIG IDs: we intended to keep the PHIG ID stable while allowing the UniProtKB accession number to change over time. Obsoleting a PHIG ID because of a change in UniProtKB accession number suggests there is no benefit in maintaining our own ID scheme, besides being able to track mistakes in curation.

We will also need to decide on a technical implementation of this process, because PHI-base 5 currently has no way to tell from the export that a UniProtKB accession number which has vanished from the export should, when it reappears, be assigned to its former PHIG ID rather than given a new one. The only solutions I can think of are:

  1. manually maintaining a mapping file that maps from a UniProtKB accession number to a PHIG ID, or
  2. having PHI-base 5 retain a mapping of the first UniProtKB accession number that was assigned to a PHIG ID.
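Both solutions amount to a lookup consulted before a new PHIG ID is issued. A minimal sketch, assuming the manually maintained mapping file (already parsed into a dict) takes precedence over a mapping that PHI-base 5 retains across exports, and that retained entries are never deleted when an accession vanishes from an export. It remembers every accession ever assigned rather than only the first, a slightly broader variant of the second solution. `PhigRegistry` and its methods are hypothetical.

```python
from typing import Optional


class PhigRegistry:
    """Hypothetical accession -> PHIG number registry that survives across exports."""

    def __init__(self, manual: Optional[dict[str, int]] = None, next_id: int = 1):
        # Manually maintained mapping file, already parsed into a dict.
        self.manual = dict(manual or {})
        # Every accession -> PHIG number assignment ever made; never pruned,
        # so an accession that vanishes from an export is still remembered.
        self.history: dict[str, int] = {}
        # Never issue a number that the mapping file already uses.
        self.next_id = max(next_id, max(self.manual.values(), default=0) + 1)

    def phig_for(self, accession: str) -> int:
        if accession in self.manual:
            phig = self.manual[accession]
        elif accession in self.history:
            phig = self.history[accession]  # reappearing accession: reuse former ID
        else:
            phig = self.next_id  # genuinely new accession
            self.next_id += 1
        self.history[accession] = phig
        return phig

    def assign_export(self, accessions: list[str]) -> dict[str, int]:
        return {acc: self.phig_for(acc) for acc in accessions}
```

In this sketch, a mapping-file entry such as G4N1S3 → 297 would let the new accession inherit the old PHIG ID, which is closer to the original intent of keeping PHIG IDs stable.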

The implementation of this mapping may depend on which of the options above we choose.