NaegleLab / CoDIAC

GNU General Public License v3.0

Lower Order Species UniProt ID Issue #50

Closed: adshimpi closed this issue 1 month ago

adshimpi commented 1 month ago

Description

For some lower-order species, there is a new issue with fetching UniProt IDs for proteins containing a specific InterPro ID. It appears to arise in special cases where the species is recorded under a specific strain. An example is the filasterean Capsaspora owczarzaki. UniProt/InterPro uses the scientific name to find records, but the UniProt entries themselves use the strain name, so no entries are fetched. In previous versions of the InterPro module, the lack of entries could be circumvented by using the second returned value, uniprot_dict. However, it is now returned with empty, default values (second screenshot).
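The name mismatch described above can be illustrated with a minimal sketch. It assumes the organism field of an entry has the form "Scientific name (strain ...)". The helper name `organism_matches_species`, the record layout, and the accessions are hypothetical and are not part of CoDIAC or the UniProt API. The idea is to accept an entry when its organism string equals the queried species or begins with it, rather than requiring an exact match.

```python
def organism_matches_species(organism: str, species: str) -> bool:
    """Return True if `organism` is `species` itself or a strain of it.

    Hypothetical helper for illustration only; not a CoDIAC function.
    """
    # Normalize case and whitespace before comparing.
    org = " ".join(organism.lower().split())
    sp = " ".join(species.lower().split())
    if not sp:
        return False
    # Accept an exact match, or the species name followed by a space,
    # as in "capsaspora owczarzaki (strain ...)".
    return org == sp or org.startswith(sp + " ")


# Made-up records; the accessions and the strain label are placeholders.
records = [
    {"accession": "ID_1", "organism": "Capsaspora owczarzaki (strain ATCC 30864)"},
    {"accession": "ID_2", "organism": "Homo sapiens"},
]
kept = [
    r["accession"]
    for r in records
    if organism_matches_species(r["organism"], "Capsaspora owczarzaki")
]
```

Requiring a trailing space after the species name avoids accepting a different species whose name merely starts with the same characters.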

Screenshots

[Screenshot: message regarding how many records were fetched]

[Screenshot: the returned uniprot_dict value for this specific species]

Files

This affects only the fetch_uniprotids function in the Uniprot.py file.

To Reproduce

Steps to reproduce the behavior:

    species = 'Capsaspora owczarzaki'
    uniprot_IDs, uniprot_dict = CoDIAC.InterPro.fetch_uniprotids(Interpro_ID, REVIEWED=False, species=species)

Expected behavior

The uniprot_dict value should still contain strain information.
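As a rough sketch of what "strain information" could look like once recovered, the snippet below splits an organism string into a species part and a strain part. It assumes the strain appears in a trailing "(strain ...)" suffix. The function name `split_strain` is hypothetical, and the actual structure of uniprot_dict is not shown in this issue, so nothing here reflects how CoDIAC stores these values.

```python
import re

# Matches "<species> (strain <label>)"; assumed format, for illustration only.
_STRAIN_PATTERN = re.compile(
    r"^(?P<species>.*?)\s*\(strain\s+(?P<strain>[^)]+)\)\s*$"
)


def split_strain(organism: str):
    """Split an organism string into (species, strain).

    Returns (organism, None) when no "(strain ...)" suffix is present.
    Hypothetical helper; not part of CoDIAC.
    """
    match = _STRAIN_PATTERN.match(organism)
    if match:
        return match.group("species"), match.group("strain").strip()
    return organism.strip(), None


species, strain = split_strain("Capsaspora owczarzaki (strain ATCC 30864)")
```

Entries without a strain suffix pass through unchanged, so the same call can be applied to every fetched record.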
