PHI-base / curation


Obsolete UniProt identifiers in existing PHI-base curation #52

Open · jseager7 opened this issue 5 years ago

jseager7 commented 5 years ago

We currently have many obsolete UniProt accessions in PHI-base, and we have no effective way to locate the corresponding extant entry for these accessions.

Currently, I think the best we can do is either to BLAST the sequence of the obsolete accession (which is slow and probably unreliable), or to find an active accession with the same gene name as the obsoleted one (which seems even less reliable, although a sequence alignment tool could support the comparison).
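A minimal sketch of the gene-name fallback, assuming we already have the obsolete entry's gene name and sequence plus a list of active entries as (accession, gene name, sequence) tuples (a hypothetical input format). It ranks same-name candidates with Python's `difflib.SequenceMatcher.ratio()`, which is swapped in here only as a crude similarity proxy; a real check should still use a proper alignment or BLAST. The accessions in the example are made-up placeholders.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical input format: (accession, gene_name, sequence) for each active entry.
ActiveEntry = Tuple[str, str, str]


def candidates_by_gene_name(
    gene_name: str,
    obsolete_sequence: str,
    active_entries: List[ActiveEntry],
) -> List[Tuple[str, float]]:
    """Return active accessions with the same gene name as the obsolete entry,
    ranked by a crude sequence-similarity score (highest first)."""
    wanted = gene_name.casefold()
    scored: List[Tuple[str, float]] = []
    for accession, name, sequence in active_entries:
        if name.casefold() != wanted:
            continue  # gene name differs, so not a candidate
        # ratio() = 2*M/T over matching blocks; autojunk=False stops frequent
        # residues in long protein sequences from being ignored as "junk".
        matcher = SequenceMatcher(None, obsolete_sequence, sequence, autojunk=False)
        scored.append((accession, matcher.ratio()))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Made-up placeholder entries, not real UniProtKB records.
    active = [
        ("ACC_A", "PKS1", "MKTAYIAKQR"),
        ("ACC_B", "PKS1", "MKTAYIAKQW"),
        ("ACC_C", "TOX2", "MKTAYIAKQR"),
    ]
    print(candidates_by_gene_name("pks1", "MKTAYIAKQR", active))
    # → [('ACC_A', 1.0), ('ACC_B', 0.9)]
```

Matching on gene name first keeps the comparison cheap; the score only orders candidates for a curator to review and should not be treated as proof that two accessions describe the same protein.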

@ValWood contacted UniProt about their policy for obsoleting accessions, and we were directed to a mapping that "maps old to new accession numbers via their protein_ids". My preliminary analysis found no evidence that this mapping links any of our obsolete accessions to new ones, but I might have done the analysis wrong, so I'm planning to redo it to make sure.
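One way to redo the check is to load the mapping into a dictionary and look up each obsolete accession after normalising both sides. This is a sketch under assumptions: the mapping is treated as a tab-separated file whose old and new accession columns are set by the hypothetical `old_col`/`new_col` parameters (the real file's layout, including where the protein_ids sit, needs checking), and the normalisation (trimming whitespace, uppercasing, dropping an isoform suffix such as `-2`) is a guess at the kinds of mismatch that could make a join silently miss matches. The accessions in the example are placeholders.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, TextIO, Tuple


def normalise(accession: str) -> str:
    """Trim whitespace, uppercase, and drop an isoform suffix like '-2'."""
    return re.sub(r"-\d+$", "", accession.strip().upper())


def load_old_to_new(handle: TextIO, old_col: int = 0, new_col: int = 1) -> Dict[str, Set[str]]:
    """Build {old accession: {new accessions}} from a tab-separated mapping.

    old_col/new_col are assumed column positions; adjust them to the real file.
    Blank lines, '#' comment lines and rows that are too short are skipped.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or len(row) <= max(old_col, new_col):
            continue
        mapping[normalise(row[old_col])].add(normalise(row[new_col]))
    return dict(mapping)


def resolve(
    obsolete: Iterable[str], mapping: Dict[str, Set[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split obsolete accessions into (resolved -> new accessions, unresolved)."""
    resolved: Dict[str, List[str]] = {}
    unresolved: List[str] = []
    for accession in obsolete:
        targets = mapping.get(normalise(accession))
        if targets:
            resolved[accession] = sorted(targets)
        else:
            unresolved.append(accession)
    return resolved, unresolved


if __name__ == "__main__":
    # Placeholder mapping rows and accessions, not real UniProt data.
    mapping_text = "# old\tnew\nOLD001\tNEW001\nOLD002\tNEW002\nOLD002\tNEW003\n"
    mapping = load_old_to_new(io.StringIO(mapping_text))
    print(resolve(["old001", "OLD002-2", "OLD999"], mapping))
    # → ({'old001': ['NEW001'], 'OLD002-2': ['NEW002', 'NEW003']}, ['OLD999'])
```

Keeping the original spelling of each PHI-base accession as the key in `resolved` makes it easy to trace every hit back to the curated record it came from.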

CuzickA commented 2 years ago

Hi @jseager7, can this ticket be closed now?

jseager7 commented 2 years ago

@CuzickA It should stay open if we still have obsolete UniProtKB accession numbers in PHI-base 4, unless you want this tracker to be only for the new PHI-Canto curation.

If all of the obsolete identifiers in PHI-base 4 have been replaced or removed, then it's fine to close this issue.

CuzickA commented 2 years ago

Ah, I see: this query is about the PHI-base 4 data migration. I thought this tracker was just for new PHI-Canto curation.

We can keep it open and I'll add a new label 'PHI4 to PHI5 data migration'.

@martin2urban, do you know whether we still have obsolete UniProtKB accession numbers in PHI-base 4? I remember that you did some work on this with a colleague.

CuzickA commented 2 months ago

Hi @jseager7, has this been resolved in the data migration? Can we close the ticket now?

jseager7 commented 2 months ago

I'll check whether there are still obsolete UniProt IDs in PHI-base 4.18 before closing this issue. From memory, I think there are.
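A rough sketch of how that check could be scripted against the UniProt REST API. It assumes the `rest.uniprot.org/uniprotkb/{accession}.json` endpoint, and that inactive entries come back with `entryType` set to `"Inactive"` and an `inactiveReason` object (`inactiveReasonType`, plus `mergeDemergeTo` for merges); these field names are assumptions to verify against the current UniProt documentation. It also treats a response whose `primaryAccession` differs from the requested accession as a secondary accession with a known replacement. The demo classifies made-up records offline; only `fetch_entry` touches the network.

```python
import json
import urllib.request
from typing import Dict, List

# Assumed endpoint pattern; verify against the UniProt REST documentation.
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{}.json"


def fetch_entry(accession: str, timeout: float = 30.0) -> dict:
    """Download one UniProtKB entry as JSON (needs network access).

    urllib follows redirects and raises HTTPError for 4xx/5xx responses.
    """
    url = UNIPROT_ENTRY_URL.format(accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def classify(accession: str, entry: dict) -> Dict[str, object]:
    """Label an entry 'obsolete', 'secondary' or 'active' (assumed JSON fields)."""
    if entry.get("entryType") == "Inactive":
        reason = entry.get("inactiveReason", {})
        return {
            "accession": accession,
            "status": "obsolete",
            "reason": reason.get("inactiveReasonType"),
            "replacements": list(reason.get("mergeDemergeTo", [])),
        }
    primary = entry.get("primaryAccession")
    if primary and primary != accession:
        # Request resolved to another primary entry: likely a secondary accession.
        return {"accession": accession, "status": "secondary",
                "reason": None, "replacements": [primary]}
    return {"accession": accession, "status": "active", "reason": None, "replacements": []}


def needs_attention(entries: Dict[str, dict]) -> List[Dict[str, object]]:
    """Return classifications for every accession that is not active."""
    results = [classify(acc, entry) for acc, entry in entries.items()]
    return [r for r in results if r["status"] != "active"]


if __name__ == "__main__":
    # Made-up records in the assumed format; swap in fetch_entry() for real data.
    sample = {
        "P00001": {"entryType": "UniProtKB reviewed (Swiss-Prot)", "primaryAccession": "P00001"},
        "Q00002": {"entryType": "Inactive", "primaryAccession": "Q00002",
                   "inactiveReason": {"inactiveReasonType": "MERGED",
                                      "mergeDemergeTo": ["P00001"]}},
        "Q00003": {"entryType": "Inactive", "primaryAccession": "Q00003",
                   "inactiveReason": {"inactiveReasonType": "DELETED"}},
        "Q00004": {"entryType": "UniProtKB unreviewed (TrEMBL)", "primaryAccession": "P00001"},
    }
    for row in needs_attention(sample):
        print(row)
```

For the real check, the accessions from the PHI-base 4.18 export would be de-duplicated and passed through `fetch_entry` (with some rate limiting) before calling `needs_attention`; merged entries then come with candidate replacements, while deleted ones still need manual curation.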