biothings / mygeneset.info

Apache License 2.0

Improve msigdb #52

Closed ravila4 closed 1 year ago

ravila4 commented 1 year ago

Fix outstanding issues with the parser and geneset utilities. geneset_utils.py has a new function, get_results(), that removes a lot of boilerplate code from the parser. This module also adds new top-level fields to genesets (duplicates, not_found, and count), as well as source_id under the genes field.
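To illustrate the new fields, here is a minimal sketch of how a geneset document carrying them might be assembled. It is not the actual get_results() implementation: the function name build_geneset, the lookup dictionary, the mygene_id key, and the exact meaning given to duplicates and not_found are all assumptions made for the example.

```python
def build_geneset(lookup, source_ids):
    """Assemble a geneset dict from source ids (hypothetical helper).

    `lookup` maps a source id to a resolved gene id. This stands in for
    the real query step and is an assumption of this sketch.
    """
    genes = []
    duplicates = []  # assumed: source ids that resolved to an already-seen gene
    not_found = []   # assumed: source ids with no match at all
    seen = set()
    for source_id in source_ids:
        gene_id = lookup.get(source_id)
        if gene_id is None:
            not_found.append(source_id)
        elif gene_id in seen:
            duplicates.append(source_id)
        else:
            seen.add(gene_id)
            # source_id is kept under each entry of the genes field
            genes.append({"mygene_id": gene_id, "source_id": source_id})
    return {
        "genes": genes,
        "count": len(genes),
        "duplicates": duplicates,
        "not_found": not_found,
    }


# Illustrative input only; the ids are placeholders, not parser output.
example_lookup = {"TP53": "7157", "P53": "7157", "BRCA1": "672"}
example = build_geneset(example_lookup, ["TP53", "P53", "BRCA1", "XYZ"])
```

Under these assumptions, the two ids that resolve to the same gene yield one entry in genes and one in duplicates, and the unmatched id ends up in not_found.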

Queries were sped up by reversing the order of the search/retry lists. By preferring gene symbols over the original ids, we can retrieve most genes with a single query. This PR also fixes bugs in the parser that were causing some genes to be missed or queried incorrectly.
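The effect of the search order can be sketched as follows. This is a simplified illustration, not the parser's code: resolve_gene, fake_query, the field names in search_order, and the sample record are hypothetical, and the real parser queries a remote service rather than a local dictionary.

```python
def resolve_gene(record, query, search_order=("symbol", "source_id")):
    """Try each field in search_order until one query returns a hit.

    Returns (gene_id, attempts), so the cost of a given order is visible.
    """
    attempts = 0
    for field in search_order:
        value = record.get(field)
        if not value:
            continue  # nothing to query for this field
        attempts += 1
        hit = query(value, field)
        if hit is not None:
            return hit, attempts
    return None, attempts


# Stand-in for the remote lookup: only the symbol is indexed here.
FAKE_INDEX = {("symbol", "TP53"): "7157"}


def fake_query(value, field):
    return FAKE_INDEX.get((field, value))


record = {"symbol": "TP53", "source_id": "probe_0001"}

symbol_first = resolve_gene(record, fake_query)
id_first = resolve_gene(record, fake_query, search_order=("source_id", "symbol"))
```

With the symbol tried first, this record resolves in one attempt. With the original id tried first, the same record needs a failed query and then a retry.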