cytoscape / RCy3

New version of RCy3, redesigned and collaboratively maintained by Cytoscape developer community
MIT License
48 stars 20 forks source link

Optimization of RCy3 pipelines - avoid using network name or network suid in pipelines when creating many networks #189

Closed risserlin closed 1 year ago

risserlin commented 1 year ago

If your pipeline creates many networks and performs analyses on each of the networks it creates, avoid specifying the network name parameter to the RCy3 function used as that will slow down the code significantly.

I had preferred to specify the network name to make sure that the command was being executed on the correct network but as the number of networks grew the time to process each network also grew. Swapping out network SUID for network name didn't improve performance.

Not sure if there is a way to improve this from RCy3 but just wanted to put this out there is anyone else has encountered this issue.

AlexanderPico commented 1 year ago

The analyzeNetwork function does not have a network parameter. Or maybe you were calling the CyCommand directly using, commandsRun? Hmm... I wonder if I left this param out on purpose because of this :)

https://github.com/cytoscape/RCy3/blob/master/R/Tools.R#L5

In any case, thanks for reporting! I'll file this as a CyCommands bug in the Jira issue tracker.

risserlin commented 1 year ago

I was using if for a bunch of functions - selectEdges, getSelectedNodes, getSelectedEdges, getTableColumns. Sorry, should have specified.

It was the difference of doing < 100 networks in hours and doing all 300 networks in less than 30 minutes.

AlexanderPico commented 1 year ago

This was an interesting one. In all of these functions (all RCy3 functions that take a network param, in fact), if you do NOT provide a network name or SUID, then the function simply uses the keyword "current" to look up an SUID. If you provide a name or SUID, then it checks it against all the network in your current session to verify that it exists (to avoid downstream NULL errors). The check for network names was using some old old logic, predating our overhaul of RCy3! It was getting a list of SUIDs and then looking each one up, one-by-one, to generate a list of names to check against. Very slow for hundreds of networks :)

I've replaced this old logic with a simple, single lookup. Do you mind trying the github version of the package to see if it's as fast WITH network names as without?

AlexanderPico commented 1 year ago

@yihangx FYI: I checked py4cytoscape and it is using more efficient logic already. No need to make changes there.

risserlin commented 1 year ago

Wow! So much faster! I tried out the pipeline with the latest release using network names and it is moving so so so much faster. Thanks.