cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License

Significant speed up in pandasGEXpress #42

Closed TyberiusPrime closed 6 years ago

TyberiusPrime commented 6 years ago

Calling pandasGEXpress with any number of cids is very slow; on my machine it takes about 24 s for 1,000 cids.

This is due to an unfortunate lookup method in get_ordered_idx(id_type, id_list, meta_df). In essence, for each id, meta_df.index gets converted into a list and then .index is called on it, so every single lookup is a linear scan over all ids.

This PR replaces it with a dictionary-based lookup, and I can now load 1,000 cids in 1 s and 10,000 in 2 s.
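
For illustration, here is a minimal sketch of the two lookup strategies described above. It is not the actual cmapPy code: the function names `ordered_idx_slow` and `ordered_idx_fast` are hypothetical, a plain list `all_ids` stands in for `meta_df.index`, the ids are assumed to be unique, and the final `sorted` call is an assumption about what "ordered" means here.

```python
def ordered_idx_slow(id_list, all_ids):
    """Per-id list conversion plus list.index: one linear scan per id."""
    # list(all_ids) is rebuilt for every requested id, and .index() then
    # scans it from the start, so the cost is O(len(id_list) * len(all_ids)).
    return sorted(list(all_ids).index(i) for i in id_list)


def ordered_idx_fast(id_list, all_ids):
    """Dictionary-based lookup: build the mapping once, then O(1) per id."""
    # Assumes ids are unique; with duplicates the dict would keep the last
    # position, whereas list.index returns the first.
    position = {label: pos for pos, label in enumerate(all_ids)}
    return sorted(position[i] for i in id_list)


if __name__ == "__main__":
    all_ids = ["cid_%d" % n for n in range(10000)]  # stand-in for meta_df.index
    wanted = ["cid_5", "cid_0", "cid_9999"]
    print(ordered_idx_fast(wanted, all_ids))
```

Both functions return the same positions; only the second avoids rescanning the full id list for every requested cid, which is where the reported slowdown comes from.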

oena commented 6 years ago

Thanks!