NaegleLab / CoDIAC


Interpro updates #53

Closed knaegle closed 4 months ago

knaegle commented 4 months ago

Newest functionality and behavior for InterPro. This version no longer uses the hierarchy and queries only InterPro. It uses the metadata field to pull entries only from the domain listing (i.e., not superfamilies), and it relies on InterPro consistently returning the top families of interest first. Domains are taken in the order listed. Each one is added as long as it has no overlap (or, by default, less than 50% overlap) with the boundaries of domains already accepted. This mimics the top-track view of InterPro.

Behavior:

domain_dict, domain_string_dict, domain_arch_dict = get_domains(uniprot_IDs)

domain_dict is a dictionary whose keys are UniProt IDs and whose values are lists of dictionaries holding domain information. domain_string_dict condenses the key domain information into a list of strings per protein, and domain_arch_dict holds just the domain short names separated by |.
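The selection rule can be illustrated with a minimal offline sketch. It does not query InterPro and is not CoDIAC's implementation. The record keys ('short_name', 'type', 'start', 'end'), the helper names (overlap_fraction, select_domains, summarize, build_outputs), the "name:start:end" string format, and the choice to measure overlap against the shorter of the two domains are all hypothetical assumptions. It keeps only entries of type "domain", accepts them greedily in the order given, and builds the three per-protein outputs described above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout (an assumption, not CoDIAC's schema):
# {"short_name": str, "type": str, "start": int, "end": int}, 1-based inclusive.
Hit = Dict[str, Any]


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Shared residues as a fraction of the shorter range (assumed metric)."""
    shared = max(0, min(a["end"], b["end"]) - max(a["start"], b["start"]) + 1)
    shorter = min(a["end"] - a["start"] + 1, b["end"] - b["start"] + 1)
    return shared / shorter


def select_domains(hits: List[Hit], max_overlap: float = 0.5) -> List[Hit]:
    """Greedily accept hits in listed order, keeping only 'domain' entries
    whose overlap with every accepted domain is below max_overlap."""
    accepted: List[Hit] = []
    for hit in hits:
        if hit["type"] != "domain":  # e.g. skip homologous superfamilies
            continue
        if all(overlap_fraction(hit, kept) < max_overlap for kept in accepted):
            accepted.append(hit)
    return accepted


def summarize(accepted: List[Hit]) -> Tuple[List[str], str]:
    """Hypothetical string list and '|'-joined architecture, N- to C-terminal."""
    ordered = sorted(accepted, key=lambda h: h["start"])
    strings = [f"{h['short_name']}:{h['start']}:{h['end']}" for h in ordered]
    return strings, "|".join(h["short_name"] for h in ordered)


def build_outputs(hits_by_id: Dict[str, List[Hit]]):
    """Mimic the shape of get_domains' three return values from local data."""
    domain_dict, string_dict, arch_dict = {}, {}, {}
    for uid, hits in hits_by_id.items():
        accepted = select_domains(hits)
        domain_dict[uid] = accepted
        string_dict[uid], arch_dict[uid] = summarize(accepted)
    return domain_dict, string_dict, arch_dict


if __name__ == "__main__":
    # Made-up hits in the order a search might return them.
    example = [
        {"short_name": "SH3", "type": "domain", "start": 120, "end": 180},
        {"short_name": "SH2", "type": "domain", "start": 10, "end": 100},
        {"short_name": "SH2_like", "type": "domain", "start": 20, "end": 95},
        {"short_name": "SH2_sf", "type": "homologous_superfamily", "start": 5, "end": 110},
    ]
    _, strings, arch = build_outputs({"EXAMPLE1": example})
    print(arch["EXAMPLE1"])     # SH2|SH3
    print(strings["EXAMPLE1"])  # ['SH2:10:100', 'SH3:120:180']
```

In this example SH2_like is dropped because it lies almost entirely inside the already accepted SH2, and SH2_sf is skipped by the type filter. Accepting strictly in listed order is what makes the result depend on InterPro returning the most relevant entries first.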