Open jseager7 opened 5 years ago
Hi @jseager7, can this ticket be closed now?
@CuzickA It should stay open if we still have obsolete UniProtKB accession numbers in PHI-base 4. That is, unless you want this tracker to only be for the new PHI-Canto curation.
If all of the obsolete identifiers in PHI-base 4 have been replaced or removed, then it's fine to close this issue.
Ahh I see, this query is for PHI-base 4 data migration. I thought this tracker was just for new PHI-Canto curation.
We can keep it open and I'll add a new label 'PHI4 to PHI5 data migration'.
@martin2urban do you know whether we still have obsolete UniProtKB accession numbers in PHI-base 4? I remember that you did some work on this with a colleague.
Hi @jseager7, has this been resolved in the data migration? Can we close the ticket now?
I'll check whether there are still obsolete UniProt IDs in PHI-base 4.18 before closing this issue. From memory I think that there are.
We currently have many obsolete UniProt accessions in PHI-base, and we have no effective way to locate the corresponding extant entry for these accessions.
Currently, I think the best we can do is either BLAST the sequence of the obsolete accession (which is slow and probably unreliable), or try to find an active accession with the same gene name as the accession that was obsoleted (which seems even less reliable, but you could use a sequence alignment tool to support the comparison).
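A minimal sketch of the gene-name approach, with a sequence comparison as a sanity check. Everything here is an assumption for illustration: the record layout (`acc`, `gene`, `seq`), the placeholder accessions and sequences, the `rank_candidates` helper, and the 0.9 threshold. `difflib.SequenceMatcher` is only a crude stand-in for a proper alignment tool, not a replacement for BLAST or a pairwise aligner.

```python
from difflib import SequenceMatcher


def rank_candidates(obsolete, candidates, min_ratio=0.9):
    """Return (accession, similarity) pairs for active entries that share the
    obsolete entry's gene name, most similar sequence first.

    The dict keys 'acc', 'gene' and 'seq' are a hypothetical record layout.
    """
    hits = []
    for cand in candidates:
        # Step 1: keep only candidates with the same gene name (case-insensitive).
        if cand["gene"].lower() != obsolete["gene"].lower():
            continue
        # Step 2: crude similarity score in [0, 1]; autojunk=False disables
        # difflib's "popular element" heuristic, which distorts long sequences.
        ratio = SequenceMatcher(
            None, obsolete["seq"], cand["seq"], autojunk=False
        ).ratio()
        if ratio >= min_ratio:
            hits.append((cand["acc"], ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Placeholder data, not real UniProtKB entries.
obsolete_entry = {"acc": "OLD_0001", "gene": "geneA", "seq": "MKTAYIAKQR"}
active_entries = [
    {"acc": "NEW_0001", "gene": "GENEA", "seq": "MKTAYIAKQR"},        # identical sequence
    {"acc": "NEW_0002", "gene": "geneA", "seq": "MKTAYIAKQRQISFVK"},  # same gene, longer sequence
    {"acc": "NEW_0003", "gene": "geneB", "seq": "MKTAYIAKQR"},        # different gene name
]

matches = rank_candidates(obsolete_entry, active_entries)
```

Filtering on gene name first keeps the number of sequence comparisons small, but it will miss entries whose gene name changed, so any hit would still need manual review.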
@ValWood contacted UniProt about their policies for obsoleting accessions, and we were directed to a mapping that "maps old to new accession numbers via their protein_ids". From my preliminary analysis, I found no evidence that these new accessions mapped to any of our obsolete accessions, but I might have done the analysis wrong, so I'm planning to redo it to make sure.
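For the re-check, the join through protein_ids could look roughly like the sketch below. The format of the UniProt mapping is not described in this thread, so the two input dictionaries (accession to list of protein_ids), the `map_via_protein_ids` helper, and all identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def map_via_protein_ids(obsolete_to_pids, active_to_pids):
    """Map each obsolete accession to the active accessions that share at
    least one protein_id with it. Both inputs are assumed to be dicts of
    accession -> list of protein_ids (a hypothetical layout)."""
    # Invert the active table: protein_id -> set of active accessions.
    pid_to_active = defaultdict(set)
    for acc, pids in active_to_pids.items():
        for pid in pids:
            pid_to_active[pid].add(acc)

    mapping = {}
    for old_acc, pids in obsolete_to_pids.items():
        found = set()
        for pid in pids:
            found |= pid_to_active.get(pid, set())
        # An empty list means no active accession shares a protein_id.
        mapping[old_acc] = sorted(found)
    return mapping


# Placeholder identifiers, not real accessions or protein_ids.
obsolete_to_pids = {"OLD_A": ["PID1", "PID2"], "OLD_B": ["PID9"]}
active_to_pids = {"NEW_X": ["PID1"], "NEW_Y": ["PID2", "PID3"]}

result = map_via_protein_ids(obsolete_to_pids, active_to_pids)
unmapped = [acc for acc, new in result.items() if not new]
```

Keeping the unmapped accessions in the result (with an empty list) makes it easy to count how many obsolete entries the mapping fails to cover, which is the question the re-analysis needs to answer.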