cokelaer / bioservices

Access to Biological Web Services from Python.
http://bioservices.readthedocs.io

Include Uniprot secondary accessions in DataFrame columns #237

Closed joaquinabian closed 1 year ago

joaquinabian commented 1 year ago

Currently, only the primary accession field of a protein query in UniProt is included in the 110 columns of the DataFrame returned by get_df(). The new UniProt API now lists the secondary accessions under the key sec_acc.

This would be very helpful to align and compare proteomics results from different sources or experiments.

cokelaer commented 1 year ago

@joaquinabian it took me a while to understand what was going on here. Indeed, sec_acc is listed on the UniProt documentation page, but it is not one of the 110-ish columns. Instead, it can (and should) be used in the query. I noticed a few cases like that. I thought it was an error (missing fields), but it looks like this is a design choice from EBI/UniProt. So, you have to type:

u.get_df('sec_acc:P62988')

to match P62988 as a secondary accession, instead of just:

u.get_df('P62988')

I have updated the UniProt documentation in bioservices.
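When aligning results from several sources, it can help to build the sec_acc query from a list of identifiers rather than typing it by hand. The sketch below is a hypothetical helper: the name sec_acc_query and the include_primary option are not part of bioservices, and combining terms with OR and the accession field name are assumptions about UniProt's query syntax, not something stated in this thread.

```python
from typing import Iterable, Union


def sec_acc_query(accessions: Union[str, Iterable[str]],
                  include_primary: bool = False) -> str:
    """Build a UniProt query string that matches entries by secondary accession.

    Hypothetical helper (not a bioservices function). Each accession becomes a
    ``sec_acc:<acc>`` term. With include_primary=True, each term also matches
    the ``accession`` field, so an identifier is found whether it is currently
    primary or secondary (assumed UniProt query syntax).
    """
    if isinstance(accessions, str):
        accessions = [accessions]
    # Drop surrounding whitespace and empty entries (common in copied ID lists).
    cleaned = [acc.strip() for acc in accessions if acc and acc.strip()]
    if not cleaned:
        raise ValueError("at least one accession is required")

    terms = []
    for acc in cleaned:
        if include_primary:
            # Match the identifier as either a primary or a secondary accession.
            terms.append(f"(accession:{acc} OR sec_acc:{acc})")
        else:
            terms.append(f"sec_acc:{acc}")
    # Any matching term is enough, so combine the terms with OR.
    return " OR ".join(terms)
```

Assuming bioservices is installed and network access is available, the resulting string can be passed to get_df as in the thread, e.g. u = UniProt() followed by u.get_df(sec_acc_query("P62988")). Whether get_df handles a multi-term OR query in a single string has not been verified here.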