The original regex only recognised the SwissProt accession format, and
using string truncation to identify variants could cause unwanted
behavior: TrEMBL accessions could be converted into legitimate (but
different) SwissProt accessions.
This commit therefore updates the regex to recognise TrEMBL
accessions and adds a new function that correctly parses the isoform
variant number from any UniProt accession, replacing the old
truncation approach (i.e., no more seqid[:6]).
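A minimal sketch of the failure mode, assuming Python (the commit does
not state the language). It uses the accession pattern documented by
UniProt; looks_like_accession() is a hypothetical helper for
illustration, not part of this commit.

```python
import re

# Accession pattern documented by UniProt: covers both the 6-character
# and the newer 10-character accession formats.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def looks_like_accession(s: str) -> bool:
    """Hypothetical helper: True if s is exactly one UniProt accession."""
    return ACCESSION_RE.fullmatch(s) is not None

trembl = "A0A022YWF9"      # a 10-character accession
truncated = trembl[:6]     # the old approach: seqid[:6]

print(looks_like_accession(trembl))                # True
print(truncated, looks_like_accession(truncated))  # A0A022 True
```

The truncated string is itself a well-formed 6-character accession,
so the error passes silently instead of failing loudly.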
Added the function clean_uniprot() to this end.
Also added clean_uniprot_list(), which takes a list of seqids and
returns the corresponding list of accessions.
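A minimal Python sketch of what clean_uniprot() and
clean_uniprot_list() might look like. The signatures and return types
here (an (accession, isoform) tuple, and a list of bare accessions)
are assumptions; the commit only names the functions and describes
their purpose.

```python
import re
from typing import List, Optional, Tuple

# UniProt accession (6 or 10 characters) with an optional "-N" isoform
# suffix, e.g. "Q9Y6K9-2" or "A0A022YWF9".
_UNIPROT_RE = re.compile(
    r"(?P<acc>[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?:-(?P<isoform>[0-9]+))?"
)

def clean_uniprot(seqid: str) -> Tuple[str, Optional[int]]:
    """Split a seqid into (accession, isoform number or None).

    Assumed behavior: raises ValueError for non-UniProt strings rather
    than guessing by truncation.
    """
    m = _UNIPROT_RE.fullmatch(seqid.strip())
    if m is None:
        raise ValueError(f"not a UniProt accession: {seqid!r}")
    iso = m.group("isoform")
    return m.group("acc"), (int(iso) if iso is not None else None)

def clean_uniprot_list(seqids: List[str]) -> List[str]:
    """Map each seqid to its accession, dropping any isoform suffix."""
    return [clean_uniprot(s)[0] for s in seqids]
```

For example, clean_uniprot_list(["P12345", "Q9Y6K9-2", "A0A022YWF9"])
returns ["P12345", "Q9Y6K9", "A0A022YWF9"]: the 10-character accession
is kept whole instead of being cut to a different, valid-looking ID.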